# Arak M. Mathai · Serge B. Provost · Hans J. Haubold

# Multivariate Statistical Analysis in the Real and Complex Domains


Arak M. Mathai Mathematics and Statistics McGill University Montreal, Canada

Hans J. Haubold Vienna International Centre UN Office for Outer Space Affairs Vienna, Austria

Serge B. Provost Statistical and Actuarial Sciences The University of Western Ontario London, ON, Canada

ISBN 978-3-030-95863-3 ISBN 978-3-030-95864-0 (eBook) https://doi.org/10.1007/978-3-030-95864-0

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# **Preface**

Multivariate statistical analysis often proves to be a challenging subject for students. The difficulty arises in part from the reliance on several types of symbols such as subscripts, superscripts, bars, tildes, bold-face characters, lower- and uppercase Roman and Greek letters, and so on. However, resorting to such notations is necessary in order to refer to the various quantities involved such as scalars and matrices either in the real or complex domain. When the first author began to teach courses in advanced mathematical statistics and multivariate analysis at McGill University, Canada, and other academic institutions around the world, he was seeking means of making the study of multivariate analysis more accessible and enjoyable. He determined that the subject could be made simpler by treating mathematical and random variables alike, thus avoiding the distinct notation that is generally utilized to represent random and non-random quantities. Accordingly, all scalar variables, whether mathematical or random, are denoted by lowercase letters and all vector/matrix variables are denoted by capital letters, with vectors and matrices being identically denoted since vectors can be viewed as matrices having a single row or column. As well, variables belonging to the complex domain are readily identified as such by placing a tilde over the corresponding lowercase and capital letters. Moreover, he noticed that numerous formulas expressed in terms of summations, subscripts, and superscripts could be more efficiently represented by appealing to matrix methods. 
He further observed that the study of multivariate analysis could be simplified by initially delivering a few lectures on Jacobians of matrix transformations and elementary special functions of matrix argument, and by subsequently deriving the statistical density functions as special cases of these elementary functions as is done for instance in the present book for the real and complex matrix-variate gamma and beta density functions. Basic notes in these directions were prepared and utilized by the first author for his lectures over the past decades. The second and third authors then joined him and added their contributions to flesh out this material to full-fledged book form. Many of the notable features that distinguish this monograph from other books on the subject are listed next.

### **Special Features**

**1.** As the title of the book suggests, its most distinctive feature is its development of a parallel theory of multivariate analysis in the complex domain side by side with the corresponding treatment of the real cases. Various quantities involving complex random variables, such as Hermitian forms, are widely used in many areas of application such as light scattering, quantum physics, and communication theory, to name a few. A wide readership is expected as, to our knowledge, this is the first book in the area that systematically combines in a single source the real results and their complex counterparts. Students will be able to better grasp the results that hold in the complex domain by relating them to those existing in the real domain.

**2.** In order to avoid resorting to an excessive number of symbols to denote scalar, vector, and matrix variables in the real and complex domains, the following consistent notations are employed throughout the book: All real scalar variables, whether mathematical or random, are denoted by lowercase letters and all real vector/matrix variables are denoted by capital letters, a tilde being placed on the corresponding variables in the complex domain.

**3.** Mathematical variables and random variables are treated the same way and denoted by the same type of letters in order to avoid the double notation often utilized to represent random and mathematical variables as well as the potentially resulting confusion. If probabilities are to be attached to every value that a variable takes, then mathematical variables can be construed as degenerate random variables. This simplified notation will enable students from mathematics, physics, and other disciplines to easily understand the subject matter without being perplexed. Although statistics students may initially find this notation somewhat unsettling, the adjustment ought to prove rapid.

**4.** Matrix methods are utilized throughout the book so as to limit the number of summations, subscripts, superscripts, and so on. This makes the representations of the various results simpler and more elegant.

**5.** A connection is established between statistical distribution theory of scalar, vector, and matrix variables in the real and complex domains and fractional calculus. This should foster further growth in both of these fields, which may borrow results and techniques from each other.

**6.** Connections of concepts encountered in multivariate analysis to concepts occurring in geometrical probabilities are pointed out so that each area can be enriched by further work in the other one. Geometrical probability problems of random lengths, random areas, and random volumes in the complex domain may not have been developed yet. They may now be tackled by making use of the results presented in this book.


**7.** The book is written in a classroom lecture style, so that the reader has the impression of listening to a lecture while going through the material.

**8.** The central concepts and major results are followed by illustrative worked examples so that students may easily comprehend the meaning and significance of the stated results. Additional problems are provided as exercises for the students to work out so that any questions they may still have can be clarified.

**9.** Throughout the book, the majority of the derivations of known or original results are innovative and rather straightforward as they rest on simple applications of results from matrix algebra, vector/matrix derivatives, and elementary special functions.

**10.** Useful results on vector/matrix differential operators are included in the mathematical preliminaries for the real case, and the corresponding operators in the complex domain are developed in Chap. 3. They are utilized to derive maximum likelihood estimators of vector/matrix parameters in the real and complex domains in a more straightforward manner than is otherwise the case with the usual lengthy procedures. The vector/matrix differential operators in the complex domain may actually be new, whereas their real counterparts may be found in Mathai (1997) [see Chapter 1, reference list].

**11.** The simplified and consistent notation d$X$ is used to denote the wedge product of the differentials of all functionally independent real scalar variables in $X$, whether $X$ is a scalar, a vector, or a square or rectangular matrix, with d$\tilde{X}$ being utilized for the corresponding wedge product of differentials in the complex domain.

**12.** Equation numbering is done sequentially chapter/section-wise; for example, (3.5.4) indicates the fourth equation appearing in Sect. 5 of Chap. 3. To make the numbering scheme more concise and descriptive, the section titles, lemmas, theorems, exercises, and equations pertaining to the complex domain are identified by appending the letter 'a' to the respective section numbers, as in (3.5a.4). The notation (i), (ii), ..., is employed for neighboring equation numbers related to a given derivation.

**13.** References to previous materials or equation numbers as well as references to subsequent results appearing in the book are kept to a minimum. In order to enhance readability, the main notations utilized in each chapter are repeated at the beginning of each one of them. As well, the reader may notice certain redundancies in the statements. These are intentional and meant to make the material easier to follow.

**14.** Due to the presence of numerous parameters, students generally find the subject of factor analysis quite difficult to grasp and apply effectively. Their understanding of the topic should be significantly enhanced by the explicit derivations that are provided, which incidentally are believed to be original.

**15.** Only the basic material in each topic is covered. The subject matter is clearly discussed and several worked examples are provided so that the students can acquire a clear understanding of this primary material. Only the materials used in each chapter are given as reference—mostly the authors' own works. Additional reading materials are listed at the very end of the book. After acquainting themselves with the introductory material presented in each chapter, the readers ought to be capable of mastering more advanced related topics on their own.

Multivariate analysis encompasses a vast array of topics. Even if the very primary materials pertaining to most of these topics were included in a basic book such as the present one, the length of the resulting monograph would be excessive. Hence, certain topics had to be chosen in order to produce a manuscript of a manageable size. The selection of the topics to be included or excluded is the authors' own choice, and it is by no means claimed that those included in the book are the most important ones or that those being omitted are not relevant. Certain pertinent topics, such as confidence regions, multiple confidence intervals, multivariate scaling, tests based on arbitrary statistics, and logistic and ridge regressions, are omitted so as to limit the size of the book. For instance, only some likelihood ratio statistics or *λ*-criteria based tests on normal populations are treated in Chap. 6 on tests of hypotheses, whereas the authors could have discussed various tests of hypotheses on parameters associated with the exponential, multinomial, or other populations, as they also have worked on such problems. As well, since results related to elliptically contoured distributions, including the spherically symmetric case, might be of somewhat limited interest, this topic is not pursued further subsequent to its introduction in Chap. 3. Nevertheless, standard applications such as principal component analysis, canonical correlation analysis, factor analysis, classification problems, multivariate analysis of variance, profile analysis, growth curves, cluster analysis, and correspondence analysis are properly covered.

Tables of percentage points are provided for the normal, chisquare, Student-*t*, and *F* distributions, as well as for the null distributions of the statistics for testing independence and for testing the equality of the diagonal elements, given that the population covariance matrix is diagonal, as these are frequently required in applied areas. Numerical tables for other relevant tests encountered in multivariate analysis are readily available in the literature.

This work may be used as a reference book or as a textbook for a full course on multivariate analysis. Potential readership includes mathematicians, statisticians, physicists, engineers, as well as researchers and graduate students in related fields. Chapters 1–8 or sections thereof could be covered in a one- to two-semester course on mathematical statistics or multivariate analysis, while a full course on applied multivariate analysis might focus on Chaps. 9–15. Readers with little interest in complex analysis may omit the sections whose numbers are followed by an 'a' without any loss of continuity. With this book and its numerous new derivations, those who are already familiar with multivariate analysis in the real domain will have an opportunity to further their knowledge of the subject and to delve into the complex counterparts of the results.

The authors wish to thank the following former students of the Centre for Mathematical and Statistical Sciences, India, for making use of a preliminary draft of portions of the book for their courses and communicating their comments: Dr. T. Princy, Cochin University of Science and Technology, Kochi, Kerala, India; Dr. Nicy Sebastian, St. Thomas College, Calicut University, Thrissur, Kerala, India; and Dr. Dilip Kumar, Kerala University, Trivandrum, India. The authors also wish to express their thanks to Dr. C. Satheesh Kumar, Professor of Statistics, University of Kerala, and Dr. Joby K. Jose, Professor of Statistics, Kannur University, for their pertinent comments on the second drafts of the chapters. The authors have no conflict of interest to declare. The second author would like to acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada.

Arak M. Mathai, Montreal, QC, Canada

Serge B. Provost, London, ON, Canada

Hans J. Haubold, Vienna, Austria

July 1, 2022















# **List of Symbols**



- $W\_p(m, \Sigma)$ : real Wishart density, Sect. 5.5
- $\tilde{W}\_p(m, \Sigma)$ : complex Wishart density, Sect. 5.5a
- $\mathbf{X}$ : sample matrix, Sect. 5.5
- $\bar{\mathbf{X}}$ : matrix of sample means, Sect. 5.5
- $S$ : sample sum of products matrix, Sect. 5.5
- $r, \rho$ : sample and population correlation coefficients, Sect. 5.6
- $\rho\_{1.(2\ldots p)}$ : multiple correlation coefficient, Sect. 5.6
- $r\_{1.(2\ldots p)}$ : sample multiple correlation coefficient, Sect. 5.6
- $\rho\_{12.(3\ldots p)}, r\_{12.(3\ldots p)}$ : partial correlations, Sect. 5.6
- $K\_{2,U,\gamma}^{-\alpha} f$ : real Erdélyi-Kober fractional integral, second kind, Sect. 5.7
- $K\_{1,U,\gamma}^{-\alpha} f$ : real Erdélyi-Kober fractional integral, first kind, Sect. 5.7
- $\tilde{K}\_{2,U,\gamma}^{-\alpha} \tilde{f}$ : complex Erdélyi-Kober fractional integral, second kind, Sect. 5.7a
- $L$ : likelihood function, Sect. 6.1
- $H\_o$ : null hypothesis, Sect. 6.1
- $H\_1$ : alternate hypothesis, Sect. 6.1
- $\chi^2\_{m,\alpha}$ : chisquare tail value, Sect. 6.2
- $F\_{m,n,\alpha}$ : F-tail value, Sect. 6.3
- $t\_{m,\alpha}$ : Student-t tail value, Sect. 6.10
- $\sum\_{K}$ : summation over $K = (k\_1, \ldots, k\_p)$, Sect. 9.6
- $\sum\_{r\_1, \ldots, r\_p}$ : summation over $r\_1, \ldots, r\_p$, Sect. 9.6
- $C(i|j)$ : cost function, Sect. 12.1
- $Pr\{i|j, A\}$ : probability of classification, Sect. 12.2
- MANOVA : multivariate analysis of variance, Sect. 13.1
- $x\_{i.}, x\_{.j}, x\_{..}$ : summation with respect to subscripts, Sect. 13.1
- $d(X\_i, X\_j)$ : distance measure, Sect. 15.1.2


#### **Chapter 1 Mathematical Preliminaries**

### **1.1. Introduction**

It is assumed that the reader has had adequate exposure to basic concepts in Probability, Statistics, Calculus and Linear Algebra. This chapter provides a brief review of the results that will be needed in the remainder of this book; no detailed discussion of these topics will be attempted. For essential materials in these areas, the reader is, for instance, referred to Mathai and Haubold (2017a, 2017b). Some properties of vectors, matrices, determinants, Jacobians and wedge products of differentials that will be utilized later on are included in the present chapter. For the sake of completeness, we initially provide some elementary definitions. First, the concepts of vectors, matrices and determinants are introduced.

Consider the consumption profile of a family in terms of the quantities of certain food items consumed every week. The following table gives this family's consumption profile for three weeks:

**Table 1.1** Weekly consumption profile (in kg)

| | Rice | Lentils | Carrots | Beans |
|---|---|---|---|---|
| Week 1 | 2.00 | 0.50 | 1.00 | 2.00 |
| Week 2 | 1.50 | 0.50 | 0.75 | 1.50 |
| Week 3 | 2.00 | 0.50 | 0.50 | 1.25 |


All the numbers appearing in this table are in kilograms (kg). In Week 1 the family consumed 2 kg of rice, 0.5 kg of lentils, 1 kg of carrots and 2 kg of beans. Looking at the consumption over three weeks, we have an arrangement of 12 numbers into 3 rows and 4 columns. If this consumption profile is expressed in symbols, we have the following representation:

$$A = (a\_{ij}) = \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} & a\_{14} \\ a\_{21} & a\_{22} & a\_{23} & a\_{24} \\ a\_{31} & a\_{32} & a\_{33} & a\_{34} \end{bmatrix} = \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00 \\ 1.50 & 0.50 & 0.75 & 1.50 \\ 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix}$$

where, for example, $a\_{11} = 2.00$, $a\_{13} = 1.00$, $a\_{22} = 0.50$, $a\_{23} = 0.75$, $a\_{32} = 0.50$, $a\_{34} = 1.25$.

**Definition 1.1.1. A matrix** An arrangement of $mn$ items into $m$ rows and $n$ columns is called an $m$ by $n$ (written as $m \times n$) matrix.

Accordingly, the above consumption profile matrix is 3 × 4 (3 by 4), that is, it has 3 rows and 4 columns. The standard notation consists in enclosing the $mn$ items within round ( ) or square [ ] brackets as in the above representation. The above 3 × 4 matrix may thus be represented in different ways: as $A$, as $(a\_{ij})$, or as its items enclosed by square brackets. The $mn$ items in the $m \times n$ matrix are called *elements of the matrix*. Then, in the above matrix $A$, $a\_{ij}$ is the $i$-th row, $j$-th column element or the $(i,j)$-th element. In the above illustration, $i = 1, 2, 3$ (3 rows) and $j = 1, 2, 3, 4$ (4 columns). A general $m \times n$ matrix $A$ can be written as follows:

$$A = \begin{bmatrix} a\_{11} & a\_{12} & \dots & a\_{1n} \\ a\_{21} & a\_{22} & \dots & a\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a\_{m1} & a\_{m2} & \dots & a\_{mn} \end{bmatrix} \tag{1.1.1}$$

The elements are separated by spaces in order to avoid any confusion. Should there be any possibility of confusion, the elements will be separated by commas. Note that the plural of "matrix" is "matrices". Observe that the position of each element in Table 1.1 has a meaning. The elements cannot be permuted, as rearranged elements will give different matrices. In other words, *two $m \times n$ matrices $A = (a\_{ij})$ and $B = (b\_{ij})$ are equal if and only if $a\_{ij} = b\_{ij}$ for all $i$ and $j$, that is, they must be element-wise equal*.

In Table 1.1, the first row, which is also a $1 \times 4$ matrix, represents this family's first week's consumption. The fourth column represents the consumption of beans over the three-week period. Thus, each row and each column in an $m \times n$ matrix has a meaning and represents different aspects. In Eq. (1.1.1), all rows are $1 \times n$ matrices and all columns are $m \times 1$ matrices. *A $1 \times n$ matrix is called a row vector and an $m \times 1$ matrix is called a column vector*. For example, in Table 1.1, there are 3 row vectors and 4 column vectors. If the row vectors are denoted by $R\_1, R\_2, R\_3$ and the column vectors by $C\_1, C\_2, C\_3, C\_4$, then we have

$$R\_1 = \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00 \end{bmatrix}, \; R\_2 = \begin{bmatrix} 1.50 & 0.50 & 0.75 & 1.50 \end{bmatrix}, \; R\_3 = \begin{bmatrix} 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix}$$

and

$$C\_1 = \begin{bmatrix} 2.00 \\ 1.50 \\ 2.00 \end{bmatrix},\ C\_2 = \begin{bmatrix} 0.50 \\ 0.50 \\ 0.50 \end{bmatrix},\ C\_3 = \begin{bmatrix} 1.00 \\ 0.75 \\ 0.50 \end{bmatrix},\ C\_4 = \begin{bmatrix} 2.00 \\ 1.50 \\ 1.25 \end{bmatrix}.$$

If the total consumption in Week 1 and Week 2 is needed, it is obtained by adding the row vectors element-wise:

$$R\_1 + R\_2 = [2.00 + 1.50 \ 0.50 + 0.50 \ 1.00 + 0.75 \ 2.00 + 1.50] = [3.50 \ 1.00 \ 1.75 \ 3.50].$$

We will define the addition of two matrices in the same fashion as in the above illustration. For the addition to be defined, both matrices must be of the same order $m \times n$. *Let $A = (a\_{ij})$ and $B = (b\_{ij})$ be two $m \times n$ matrices. Then the sum, denoted by $A + B$, is defined as*

$$A + B = (a\_{ij} + b\_{ij})$$

*or equivalently as the matrix obtained by adding the corresponding elements.* For example,

$$C\_1 + C\_3 = \begin{bmatrix} 2.00 \\ 1.50 \\ 2.00 \end{bmatrix} + \begin{bmatrix} 1.00 \\ 0.75 \\ 0.50 \end{bmatrix} = \begin{bmatrix} 3.00 \\ 2.25 \\ 2.50 \end{bmatrix}.$$

Repeating the addition, we have

$$C\_1 + C\_3 + C\_4 = (C\_1 + C\_3) + \begin{bmatrix} 2.00 \\ 1.50 \\ 1.25 \end{bmatrix} = \begin{bmatrix} 3.00 \\ 2.25 \\ 2.50 \end{bmatrix} + \begin{bmatrix} 2.00 \\ 1.50 \\ 1.25 \end{bmatrix} = \begin{bmatrix} 5.00 \\ 3.75 \\ 3.75 \end{bmatrix}.$$

In general, if $A = (a\_{ij})$, $B = (b\_{ij})$, $C = (c\_{ij})$, $D = (d\_{ij})$ are $m \times n$ matrices, then $A + B + C + D = (a\_{ij} + b\_{ij} + c\_{ij} + d\_{ij})$, that is, it is the matrix obtained by adding the corresponding elements.
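As a quick numerical sketch (not from the book), the column-vector sums computed above can be reproduced with NumPy; the names `C1`, `C3`, `C4` mirror the text:

```python
import numpy as np

# Column vectors C1, C3, C4 from Table 1.1 (consumption in kg)
C1 = np.array([2.00, 1.50, 2.00])
C3 = np.array([1.00, 0.75, 0.50])
C4 = np.array([2.00, 1.50, 1.25])

# Matrix (vector) addition is element-wise and associative:
# (C1 + C3) + C4 = C1 + (C3 + C4)
total = C1 + C3 + C4
print(total)  # the values 5.00, 3.75, 3.75, as in the text
```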

Suppose that in Table 1.1, we wish to express the elements in terms of grams instead of kilograms; then, each and every element therein must be multiplied by 1000. Thus, if *A* is the matrix corresponding to Table 1.1 and *B* is the matrix in terms of grams, we have

$$\begin{aligned} A &= \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00 \\ 1.50 & 0.50 & 0.75 & 1.50 \\ 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix}, \\ B &= \begin{bmatrix} 1000 \times 2.00 & 1000 \times 0.50 & 1000 \times 1.00 & 1000 \times 2.00 \\ 1000 \times 1.50 & 1000 \times 0.50 & 1000 \times 0.75 & 1000 \times 1.50 \\ 1000 \times 2.00 & 1000 \times 0.50 & 1000 \times 0.50 & 1000 \times 1.25 \end{bmatrix}. \end{aligned}$$

We may write this symbolically as $B = 1000 \times A = 1000\,A$. Note that 1000 is a $1 \times 1$ matrix or a scalar quantity. *Any $1 \times 1$ matrix is called a scalar quantity*. We may then define the *scalar multiplication* of a matrix $A$ by the scalar quantity $c$ as $c\,A$, that is, $A = (a\_{ij}) \Rightarrow c\,A = (c\,a\_{ij})$, so that $c\,A$ is obtained by multiplying each and every element of $A$ by the scalar quantity $c$. As a convention, $c$ is written on the left of $A$ as $c\,A$ and not as $A\,c$. If $c = -1$, then $c\,A = (-1)A = -A$ and $A + (-1)A = A - A = O$, where the capital $O$ denotes a matrix whose elements are all equal to zero. A general $m \times n$ matrix wherein every element is zero is referred to as a *null matrix*, and it is written as $O$ (not zero). We may also note that if $A, B, C$ are $m \times n$ matrices, then $A + (B + C) = (A + B) + C$. Moreover, $A + O = O + A = A$.

If $m = n$, in which case the number of rows is equal to the number of columns, the resulting matrix is referred to as *a square matrix* because it is a square arrangement of elements; otherwise the matrix is called *a rectangular matrix*. Some special cases of square matrices are the following. For an $n \times n$ matrix, or *a square matrix of order $n$*, suppose that $a\_{ij} = 0$ for all $i \neq j$ (that is, all non-diagonal elements are zeros; here "diagonal" means the diagonal going from top left to bottom right) and that there is at least one nonzero diagonal element; such a matrix is called *a diagonal matrix*, and it is usually written as diag$(d\_1, \ldots, d\_n)$ where $d\_1, \ldots, d\_n$ are the diagonal elements. Here are some examples of $3 \times 3$ diagonal matrices:

$$D\_1 = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 7 \end{bmatrix}, \ D\_2 = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \ D\_3 = \begin{bmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{bmatrix}, \ a \neq 0.$$

If in $D\_3$, $a = 1$, so that all the diagonal elements are unities, the resulting matrix is called *an identity matrix*; a diagonal matrix whose diagonal elements are all equal to some number $a$ that is not equal to 0 or 1 is referred to as *a scalar matrix*. A square non-null matrix $A = (a\_{ij})$ that contains at least one nonzero element below its leading diagonal and whose elements above the leading diagonal are all equal to zero, that is, $a\_{ij} = 0$ for all $i < j$, is called *a lower triangular matrix*. Some examples of $2 \times 2$ lower triangular matrices are the following:

$$T\_1 = \begin{bmatrix} 5 & 0 \\ 2 & 1 \end{bmatrix},\ T\_2 = \begin{bmatrix} 3 & 0 \\ 1 & 0 \end{bmatrix},\ T\_3 = \begin{bmatrix} 0 & 0 \\ -3 & 0 \end{bmatrix}.$$

If, in a square non-null matrix, all elements below the leading diagonal are zeros and there is at least one nonzero element above the leading diagonal, then such a square matrix is referred to as *an upper triangular matrix*. Here are some examples:

$$T\_1 = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 5 \end{bmatrix},\ T\_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix},\ T\_3 = \begin{bmatrix} 0 & 0 & 7 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
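These structural classes are easy to check numerically. The following sketch (assuming NumPy; the helper function names are ours) tests matrices from the displays above by comparing each one with its own triangular or diagonal part:

```python
import numpy as np

D1 = np.array([[5, 0, 0], [0, -2, 0], [0, 0, 7]])     # diagonal
T_low = np.array([[5, 0], [2, 1]])                    # lower triangular
T_up = np.array([[1, 2, -1], [0, 3, 1], [0, 0, 5]])   # upper triangular

def is_lower_triangular(A):
    # all elements above the leading diagonal are zero: a_ij = 0 for i < j
    return np.array_equal(A, np.tril(A))

def is_upper_triangular(A):
    # all elements below the leading diagonal are zero: a_ij = 0 for i > j
    return np.array_equal(A, np.triu(A))

def is_diagonal(A):
    # a square matrix that is both lower and upper triangular
    return np.array_equal(A, np.diag(np.diag(A)))

print(is_diagonal(D1), is_lower_triangular(T_low), is_upper_triangular(T_up))
```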

*Multiplication of Matrices* Once again, consider Table 1.1. Suppose that by consuming 1 kg of rice, the family is getting 700 g (where g represents grams) of starch, 2 g of protein and 1 g of fat; that by eating 1 kg of lentils, the family is getting 200 g of starch, 100 g of protein and 100 g of fat; that by consuming 1 kg of carrots, the family is getting 100 g of starch, 200 g of protein and 150 g of fat; and that by eating 1 kg of beans, the family is getting 50 g of starch, 100 g of protein and 200 g of fat. Then the starch-protein-fat matrix, denoted by $B$, is the following, where the rows correspond to rice, lentils, carrots and beans, respectively:

$$B = \begin{bmatrix} 700 & 2 & 1 \\ 200 & 100 & 100 \\ 100 & 200 & 150 \\ 50 & 100 & 200 \end{bmatrix}.$$

Let $B\_1, B\_2, B\_3$ be the columns of $B$. Then, the first column $B\_1$ of $B$ represents the starch intake per kg of rice, lentils, carrots and beans, respectively. Similarly, the second column $B\_2$ represents the protein intake per kg and the third column $B\_3$ represents the fat intake, that is,

$$\begin{aligned} \mathcal{B}\_1 = \begin{bmatrix} 700 \\ 200 \\ 100 \\ 50 \end{bmatrix}, \; \mathcal{B}\_2 = \begin{bmatrix} 2 \\ 100 \\ 200 \\ 100 \end{bmatrix}, \; \mathcal{B}\_3 = \begin{bmatrix} 1 \\ 100 \\ 150 \\ 200 \end{bmatrix}. \end{aligned} $$

Let the rows of the matrix $A$ in Table 1.1 be denoted by $A\_1, A\_2$ and $A\_3$, respectively, so that

$$A\_1 = \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00 \end{bmatrix}, \; A\_2 = \begin{bmatrix} 1.50 & 0.50 & 0.75 & 1.50 \end{bmatrix}, \; A\_3 = \begin{bmatrix} 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix}.$$

Then, the total intake of starch by the family in Week 1 is available from

$$2.00 \times 700 + 0.50 \times 200 + 1.00 \times 100 + 2.00 \times 50 = 1700 \text{g.}$$

This is the sum of the element-wise products of $A\_1$ with $B\_1$. We will denote this by $A\_1 \cdot B\_1$ ($A\_1$ dot $B\_1$). The total intake of protein by the family in Week 1 is determined as follows:

$$A\_1 \cdot B\_2 = 2.00 \times 2 + 0.50 \times 100 + 1.00 \times 200 + 2.00 \times 100 = 454 \text{ g}$$

and the total intake of fat in Week 1 is given by

$$A\_1 \cdot B\_3 = 2.00 \times 1 + 0.50 \times 100 + 1.00 \times 150 + 2.00 \times 200 = 602 \text{ g}.$$

Thus, the dot products of $A\_1$ with $B\_1, B\_2, B\_3$ provide the intake of starch, protein and fat in Week 1. Similarly, the dot products of $A\_2$ with $B\_1, B\_2, B\_3$ give the intake of starch, protein and fat in Week 2. Thus, the configuration of the starch, protein and fat intake over the three weeks is

$$AB = \begin{bmatrix} A\_1 \cdot B\_1 & A\_1 \cdot B\_2 & A\_1 \cdot B\_3 \\ A\_2 \cdot B\_1 & A\_2 \cdot B\_2 & A\_2 \cdot B\_3 \\ A\_3 \cdot B\_1 & A\_3 \cdot B\_2 & A\_3 \cdot B\_3 \end{bmatrix}.$$
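As a sketch of the computation just described (assuming NumPy), the full nutrient-intake matrix $AB$ can be obtained at once; its first row reproduces the Week 1 totals of 1700 g of starch, 454 g of protein and 602 g of fat derived above:

```python
import numpy as np

# A: weekly consumption in kg (rows: weeks; columns: rice, lentils, carrots, beans)
A = np.array([[2.00, 0.50, 1.00, 2.00],
              [1.50, 0.50, 0.75, 1.50],
              [2.00, 0.50, 0.50, 1.25]])

# B: nutrient content in g per kg of food (rows: foods; columns: starch, protein, fat)
B = np.array([[700, 2, 1],
              [200, 100, 100],
              [100, 200, 150],
              [50, 100, 200]])

# (3 x 4)(4 x 3) = 3 x 3 matrix of weekly nutrient intakes; entry (i, j)
# is the dot product of the i-th row of A with the j-th column of B
intake = A @ B
print(intake[0])  # Week 1 totals: 1700 g starch, 454 g protein, 602 g fat
```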

A matrix having one column and $m$ rows is an $m \times 1$ matrix that is referred to as *a column vector of $m$ elements or a column vector of order $m$*. A matrix having one row and $n$ columns is a $1 \times n$ matrix called *a row vector of $n$ components or a row vector of order $n$*. Let $A$ be a row or column vector of order $n$, which consists of $n$ elements or components, denoted by $a\_1, \ldots, a\_n$. Let $B$ be a row or column vector of order $n$ consisting of the elements $b\_1, \ldots, b\_n$. *Then the dot product of $A$ and $B$, denoted by $A \cdot B = B \cdot A$, is defined as $A \cdot B = a\_1 b\_1 + a\_2 b\_2 + \cdots + a\_n b\_n$, that is, the corresponding elements of $A$ and $B$ are multiplied and added up.* Let $A$ be an $m \times n$ matrix whose $m$ rows are written as $A\_1, \ldots, A\_m$. Let $B$ be another $n \times r$ matrix whose $r$ columns are written as $B\_1, \ldots, B\_r$. Note that the number of columns of $A$ is equal to the number of rows of $B$, which in this case is $n$. When the number of columns of $A$ is equal to the number of rows of $B$, the product $AB$ is defined and equal to

$$AB = \begin{bmatrix} A\_1 \cdot B\_1 & A\_1 \cdot B\_2 & \dots & A\_1 \cdot B\_r \\ A\_2 \cdot B\_1 & A\_2 \cdot B\_2 & \dots & A\_2 \cdot B\_r \\ \vdots & \vdots & \ddots & \vdots \\ A\_m \cdot B\_1 & A\_m \cdot B\_2 & \dots & A\_m \cdot B\_r \end{bmatrix} \text{ with } A = \begin{bmatrix} A\_1 \\ A\_2 \\ \vdots \\ A\_m \end{bmatrix}, \ B = \begin{bmatrix} B\_1 & B\_2 & \cdots & B\_r \end{bmatrix},$$

the resulting matrix $AB$ being of order $m \times r$. When $AB$ is defined, $BA$ need not be defined. However, if $r = m$, then $BA$ is also defined, otherwise not. In other words, if $A = (a\_{ij})$ is $m \times n$, $B = (b\_{ij})$ is $n \times r$, and $C = (c\_{ij}) = AB$, then $c\_{ij} = A\_i \cdot B\_j$ where $A\_i$ is the $i$-th row of $A$ and $B\_j$ is the $j$-th column of $B$, that is, $c\_{ij} = \sum\_{k=1}^{n} a\_{ik}\, b\_{kj}$ for all $i$ and $j$. For example,

$$\begin{aligned} A &= \begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & 5 \end{bmatrix}, \ B = \begin{bmatrix} 2 & -2 \\ 3 & 2 \\ 1 & 0 \end{bmatrix} \Rightarrow \\ AB &= \begin{bmatrix} (1)(2) + (-1)(3) + (0)(1) & (1)(-2) + (-1)(2) + (0)(0) \\ (2)(2) + (3)(3) + (5)(1) & 2(-2) + 3(2) + (5)(0) \end{bmatrix} = \begin{bmatrix} -1 & -4 \\ 18 & 2 \end{bmatrix}. \end{aligned}$$

Note that, in this case, *BA* is defined and equal to

$$\begin{aligned} BA &= \begin{bmatrix} 2 & -2 \\ 3 & 2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & 5 \end{bmatrix} \\ &= \begin{bmatrix} (2)(1) + (-2)(2) & (2)(-1) + (-2)(3) & (2)(0) + (-2)(5) \\ (3)(1) + (2)(2) & (3)(-1) + (2)(3) & (3)(0) + (2)(5) \\ (1)(1) + (0)(2) & (1)(-1) + (0)(3) & (1)(0) + (0)(5) \end{bmatrix} \\ &= \begin{bmatrix} -2 & -8 & -10 \\ 7 & 3 & 10 \\ 1 & -1 & 0 \end{bmatrix} .\end{aligned}$$
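The element-wise rule, namely that the $(i,j)$-th entry of the product is the sum over $k$ of $a\_{ik} b\_{kj}$, can be coded directly with explicit loops. The sketch below (the function name `matmul_loops` is ours) reproduces the product $AB$ just computed and agrees with NumPy's built-in matrix product:

```python
import numpy as np

def matmul_loops(A, B):
    """Matrix product via c_ij = sum_k a_ik * b_kj, for A of order m x n
    and B of order n x r; the result C is of order m x r."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, -1, 0], [2, 3, 5]])
B = np.array([[2, -2], [3, 2], [1, 0]])
print(matmul_loops(A, B))  # the matrix [[-1, -4], [18, 2]], same as A @ B
```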

As another example, let

$$A = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \ B = \begin{bmatrix} 2 & 3 & 5 \end{bmatrix} \Rightarrow \ AB = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 5 \\ -2 & -3 & -5 \\ 4 & 6 & 10 \end{bmatrix}$$

which is 3 × 3, whereas *BA* is 1 × 1:

$$BA = \begin{bmatrix} 2 & 3 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = 9.$$
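As a numerical check of the row-by-column rule, the products above can be reproduced in plain Python; this is a minimal sketch, and the helper names `dot` and `matmul` are ours, not part of the text:

```python
def dot(a, b):
    # dot product: multiply corresponding elements and add them up
    return sum(x * y for x, y in zip(a, b))

def matmul(A, B):
    # the (i, j) entry of AB is the dot product of the i-th row of A
    # with the j-th column of B; A is m x n, B is n x r, AB is m x r
    cols_B = list(zip(*B))
    return [[dot(row, col) for col in cols_B] for row in A]

A = [[1, -1, 0], [2, 3, 5]]
B = [[2, -2], [3, 2], [1, 0]]
assert matmul(A, B) == [[-1, -4], [18, 2]]   # the AB computed above

# AB and BA need not even have the same order:
C = [[1], [-1], [2]]                  # 3 x 1 column vector
D = [[2, 3, 5]]                       # 1 x 3 row vector
assert matmul(C, D) == [[2, 3, 5], [-2, -3, -5], [4, 6, 10]]  # 3 x 3
assert matmul(D, C) == [[9]]          # 1 x 1
```

Here `zip(*B)` simply lists the columns of *B*, so each entry is computed exactly by the row-by-column rule.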

As yet another example, let

$$A = \begin{bmatrix} 2 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}, \ B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & -1 & 0 \end{bmatrix}.$$

Note that here both *A* and *B* are lower triangular matrices. The products *AB* and *BA* are defined since both *A* and *B* are 3 × 3 matrices. For instance,

$$\begin{split} \boldsymbol{AB} &= \begin{bmatrix} (2)(1) + (0)(0) + (0)(1) & (2)(0) + (0)(2) + (0)(-1) & (2)(0) + (0)(0) + (0)(0) \\ (-1)(1) + (1)(0) + (0)(1) & (-1)(0) + (1)(2) + (0)(-1) & (-1)(0) + (1)(0) + (0)(0) \\ (1)(1) + (1)(0) + (1)(1) & (1)(0) + (1)(2) + (1)(-1) & (1)(0) + (1)(0) + (1)(0) \end{bmatrix} \\ &= \begin{bmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix} .\end{split}$$

Observe that since *A* and *B* are lower triangular, *AB* is also lower triangular. Here are some general properties of products of matrices, which hold whenever the products involved are defined, in which case we say that the matrices are conformable for multiplication:

(1): *the product of two lower triangular matrices is lower triangular*;

(2): *the product of two upper triangular matrices is upper triangular*;

(3): *the product of two diagonal matrices is diagonal*;

(4): *if A is m* × *n*, *IA* = *A where I* = *Im is an identity matrix, and A In* = *A*;

(5): *OA* = *O whenever OA is defined, and A O* = *O whenever A O is defined*.
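Properties (1), (4) and (5) can be spot-checked with the triangular matrices of the last example; a short sketch, with helper names of our own choosing:

```python
def matmul(A, B):
    # row-by-column product of conformable matrices
    cols_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_B] for row in A]

def is_lower_triangular(M):
    # every entry strictly above the leading diagonal is zero
    return all(M[i][j] == 0
               for i in range(len(M)) for j in range(len(M[i])) if j > i)

A = [[2, 0, 0], [-1, 1, 0], [1, 1, 1]]
B = [[1, 0, 0], [0, 2, 0], [1, -1, 0]]
AB = matmul(A, B)
assert AB == [[2, 0, 0], [-1, 2, 0], [2, 1, 0]]    # as computed above
assert is_lower_triangular(AB)                     # property (1)

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
O3 = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
assert matmul(I3, A) == A and matmul(A, I3) == A   # property (4)
assert matmul(O3, A) == O3                         # property (5)
```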

*Transpose of a Matrix* A matrix whose rows are the corresponding columns of *A* or, equivalently, a matrix whose columns are the corresponding rows of *A* is called the transpose of *A*, denoted by *A*′ (*A* prime). For example,

$$A\_1 = \begin{bmatrix} 1 & 2 & -1 \end{bmatrix} \Rightarrow A\_1' = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \ A\_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \Rightarrow A\_2' = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix};$$

$$A\_3 = \begin{bmatrix} 1 & 3 \\ 3 & 7 \end{bmatrix} \Rightarrow A\_3' = \begin{bmatrix} 1 & 3 \\ 3 & 7 \end{bmatrix} = A\_3; \ A\_4 = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix} \Rightarrow A\_4' = \begin{bmatrix} 0 & -5 \\ 5 & 0 \end{bmatrix} = -A\_4.$$

Observe that *A*2 is lower triangular and *A*′2 is upper triangular, that *A*′3 = *A*3, and that *A*′4 = −*A*4. Note that if *A* is *m* × *n*, then *A*′ is *n* × *m*. If *A* is 1 × 1, then *A*′ = *A*, the same scalar quantity. If *A* is a square matrix and *A*′ = *A*, then *A is called a symmetric matrix*. If *B* is a square matrix and *B*′ = −*B,* then *B* is called a *skew symmetric matrix*. Within a skew symmetric matrix *B* = *(bij )*, a diagonal element must satisfy the equation *bjj* = −*bjj ,* which necessitates that *bjj* = 0*,* whether *B* be real or complex. Here are some properties of the transpose: The transpose of a lower triangular matrix is upper triangular; the transpose of an upper triangular matrix is lower triangular; the transpose of a diagonal matrix is diagonal; the transpose of an *m* × *n* null matrix is an *n* × *m* null matrix;

$$(A')' = A; \ (AB)' = B'A'; \ (A\_1A\_2\cdots A\_k)' = A'\_k\cdots A'\_2A'\_1; \ (A+B)' = A'+B'$$

whenever *AB, A* + *B,* and *A*<sup>1</sup> *A*<sup>2</sup> ··· *Ak* are defined.
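The reversal rule (*AB*)′ = *B*′*A*′ is easy to confirm numerically; the sketch below (helper names ours) uses the 2 × 3 and 3 × 2 matrices from the multiplication examples:

```python
def transpose(M):
    # the rows of the transpose are the columns of M
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    cols_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_B] for row in A]

A = [[1, -1, 0], [2, 3, 5]]     # 2 x 3
B = [[2, -2], [3, 2], [1, 0]]   # 3 x 2
assert transpose(transpose(A)) == A                                    # (A')' = A
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))   # (AB)' = B'A'
assert transpose([[0, 5], [-5, 0]]) == [[0, -5], [5, 0]]               # A4' = -A4
```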

*Trace of a Square Matrix* The trace is defined only for square matrices. Let *A* = *(aij )* be an *n* × *n* matrix whose leading diagonal elements are *a*11*, a*22*,...,ann*; then the trace of *A*, denoted by tr*(A)*, is defined as tr*(A)* = *a*<sup>11</sup> + *a*<sup>22</sup> +···+ *ann*, that is, the sum of the elements comprising the leading diagonal. The following properties can directly be deduced from the definition. Whenever *AB* and *BA* are defined, tr*(AB)* = tr*(BA)*, even though *AB* need not be equal to *BA*. If *A* is *m* × *n* and *B* is *n* × *m*, then *AB* is *m* × *m* whereas *BA* is *n* × *n*; however, the traces are equal, that is, tr*(AB)* = tr*(BA)*, and repeated application yields tr*(ABC)* = tr*(BCA)* = tr*(CAB)*.
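The identity tr(*AB*) = tr(*BA*) for an *m* × *n* matrix *A* and an *n* × *m* matrix *B* can be checked with the earlier example; a brief sketch (helper names ours):

```python
def matmul(A, B):
    cols_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_B] for row in A]

def trace(M):
    # sum of the leading diagonal elements of a square matrix
    return sum(M[i][i] for i in range(len(M)))

A = [[1, -1, 0], [2, 3, 5]]     # 2 x 3
B = [[2, -2], [3, 2], [1, 0]]   # 3 x 2
AB, BA = matmul(A, B), matmul(B, A)   # 2 x 2 and 3 x 3 respectively
assert trace(AB) == trace(BA) == 1    # equal traces despite different orders
```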

*Length of a Vector* Let *V* be an *n* × 1 column vector or a 1 × *n* row vector; when its elements are real numbers, *V* can be represented as a point in *n*-dimensional Euclidean space. Consider a 2-dimensional vector (or 2-vector) with the elements *(*1*,* 2*)*. Then, this vector corresponds to the point depicted in Fig. 1.1.

**Figure 1.1** The point *P* = *(*1*,* 2*)* in the plane

Let *O* be the origin and *P* be the point. Then, the length of the resulting vector is the Euclidean distance between *O* and *P*, that is, $+\sqrt{(1)^2 + (2)^2} = +\sqrt{5}$. Let *U* = *(u*1*,...,un)* be a real *n*-vector, either written as a row or a column. *Then the length of U*, *denoted by* $\|U\|$, *is defined as follows*:

$$\|U\| = +\sqrt{u\_1^2 + \cdots + u\_n^2}$$

*whenever the elements u*1*,...,un are real. If u*1*,...,un are complex numbers, then* $\|U\| = +\sqrt{|u\_1|^2 + \cdots + |u\_n|^2}$ *where* |*uj* | *denotes the absolute value or modulus of uj* . If *uj* = *aj* + *ibj ,* with $i = \sqrt{-1}$ and *aj , bj* real, then $|u\_j| = +\sqrt{a\_j^2 + b\_j^2}$. If the length of a vector is unity, that vector is called *a unit vector*. For example, *e*1 = *(*1*,* 0*,...,* 0*), e*2 = *(*0*,* 1*,* 0*,...,* 0*), . . . , en* = *(*0*,...,* 0*,* 1*)* are all unit vectors. As well, $V\_1 = (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$ and $V\_2 = (\frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}})$ are unit vectors. If two *n*-vectors, *U*1 and *U*2, are such that *U*1 · *U*2 = 0, that is, their dot product is zero, then the two vectors are said to be *orthogonal to each other*. For example, if *U*1 = *(*1*,* 1*)* and *U*2 = *(*1*,* −1*),* then *U*1 · *U*2 = 0 and *U*1 and *U*2 are orthogonal to each other; similarly, if *U*1 = *(*1*,* 1*,* 1*)* and *U*2 = *(*1*,* −2*,* 1*)*, then *U*1 · *U*2 = 0 and *U*1 and *U*2 are orthogonal to each other. If *U*1*,...,Uk* are *k* vectors, each of order *n*, all being either row vectors or column vectors, and if *Ui* · *Uj* = 0 for all *i* ≠ *j*, that is, all distinct vectors are orthogonal to each other, then we say that *U*1*,...,Uk* form *an orthogonal system of vectors*. In addition, if the length of each vector is unity, $\|U\_j\| = 1$, *j* = 1*,...,k*, then we say that *U*1*,...,Uk* is *an orthonormal system of vectors*. If a matrix *A* is real and its rows and its columns form an orthonormal system, then *A* is called *an orthonormal matrix*.
In this case, *AA*′ = *In* and *A*′*A* = *In*; accordingly, any square matrix *A* of real elements such that *AA*′ = *In* and *A*′*A* = *In is referred to as an orthonormal matrix*. If only one of these equations holds, that is, *B* is a real matrix such that either *BB*′ = *I* with *B*′*B* ≠ *I*, or *B*′*B* = *I* with *BB*′ ≠ *I*, then *B* is called *a semiorthonormal matrix*. For example, consider the matrix

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}; \text{ then } AA' = I\_2, \ A'A = I\_2,$$

and *A* is an orthonormal matrix. As well,

$$A = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{bmatrix} \Rightarrow AA' = I\_3, \ A'A = I\_3,$$

and *A* here is orthonormal. However,

$$B = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{bmatrix} \Rightarrow B B' = I\_2, \ B' B \neq I\_3$$

so that *B* is semiorthonormal. On deleting some rows from an orthonormal matrix, we obtain a semiorthonormal matrix such that *BB*′ = *I* and *B*′*B* ≠ *I*. Similarly, if we delete some of the columns, we end up with a semiorthonormal matrix such that *B*′*B* = *I* and *BB*′ ≠ *I*.
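These statements can be verified numerically for the 3 × 3 orthonormal matrix displayed above; the following sketch (function names ours) also checks that deleting its last row leaves a semiorthonormal *B* with *BB*′ = *I*2 but *B*′*B* ≠ *I*3:

```python
import math

def matmul(A, B):
    cols_B = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_B] for row in A]

def close_to_identity(M, tol=1e-12):
    # True when the square matrix M is numerically an identity matrix
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(M)) for j in range(len(M[0])))

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1/s3,  1/s3,  1/s3],
     [1/s2,  0.0,  -1/s2],
     [1/s6, -2/s6,  1/s6]]
At = [list(c) for c in zip(*A)]
assert close_to_identity(matmul(A, At))    # AA' = I3
assert close_to_identity(matmul(At, A))    # A'A = I3

B = A[:2]                                  # delete the last row
Bt = [list(c) for c in zip(*B)]
assert close_to_identity(matmul(B, Bt))        # BB' = I2: semiorthonormal
assert not close_to_identity(matmul(Bt, B))    # but B'B is not I3
```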

*Linear Independence of Vectors* Consider the vectors *U*1 = *(*1*,* 1*,* 1*), U*2 = *(*1*,* −2*,* 1*), U*3 = *(*3*,* 0*,* 3*)*. Then, we can easily see that *U*3 = 2*U*1 + *U*2 = 2*(*1*,* 1*,* 1*)* + *(*1*,* −2*,* 1*)* = *(*3*,* 0*,* 3*)* or *U*3 − 2*U*1 − *U*2 = *O* (a null vector). In this case, one of the vectors can be written as a linear function of the others. Let *V*1 = *(*1*,* 1*,* 1*), V*2 = *(*1*,* 0*,* −1*), V*3 = *(*1*,* −2*,* 1*)*. Can any one of these be written as a linear function of the others? If that were possible, then there would exist a linear function of *V*1*, V*2*, V*3 that is equal to a null vector. Let us consider the equation *a*1*V*1 + *a*2*V*2 + *a*3*V*3 = *(*0*,* 0*,* 0*)* where *a*1*, a*2*, a*3 are scalars, at least one of which is nonzero. Note that *a*1 = 0*, a*2 = 0 and *a*3 = 0 will always satisfy the above equation. Thus, our question is whether *a*1 = 0*, a*2 = 0*, a*3 = 0 is the only solution.

$$a\_1V\_1 + a\_2V\_2 + a\_3V\_3 = O \Rightarrow a\_1(1,1,1) + a\_2(1,0,-1) + a\_3(1,-2,1) = (0,0,0)$$

$$\Rightarrow a\_1 + a\_2 + a\_3 = 0 \text{ (i);} \;\ a\_1 - 2a\_3 = 0 \text{ (ii);} \;\ a\_1 - a\_2 + a\_3 = 0. \tag{iii}$$

From (*ii*), *a*<sup>1</sup> = 2*a*3. Then, from (*iii*), 3*a*<sup>3</sup> − *a*<sup>2</sup> = 0 ⇒ *a*<sup>2</sup> = 3*a*3; then from (*i*), 2*a*<sup>3</sup> + 3*a*<sup>3</sup> + *a*<sup>3</sup> = 0 or 6*a*<sup>3</sup> = 0 or *a*<sup>3</sup> = 0. Thus, *a*<sup>2</sup> = 0*, a*<sup>1</sup> = 0 and there is no nonzero *a*<sup>1</sup> or *a*<sup>2</sup> or *a*<sup>3</sup> satisfying the equation and hence *V*1*, V*2*, V*<sup>3</sup> cannot be linearly dependent; so, they are linearly independent. Hence, we have the following definition: Let *U*1*,...,Uk* be *k* vectors, each of order *n*, all being either row vectors or column vectors, so that addition and linear functions are defined. Let *a*1*,...,ak* be scalar quantities. Consider the equation

$$a\_1U\_1 + a\_2U\_2 + \cdots + a\_kU\_k = O \text{ (a null vector)}.\tag{iv}$$

If *a*<sup>1</sup> = 0*, a*<sup>2</sup> = 0*,...,ak* = 0 is the only solution to *(iv),* then *U*1*,...,Uk are linearly independent, otherwise they are linearly dependent*. If they are linearly dependent, then at least one of the vectors can be expressed as a linear function of others. The following properties can be established from the definition: Let *U*1*,...,Uk* be *n*-vectors, *k* ≤ *n*.

(1) If *U*1*,...,Uk* are mutually orthogonal, then they are linearly independent, that is, if *Ui* · *Uj* = 0 for all *i* ≠ *j,* then *U*1*,...,Uk* are linearly independent;

(2) There cannot be more than *n* mutually orthogonal *n*-vectors;

(3) There cannot be more than *n* linearly independent *n*-vectors.
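The elimination argument used above for *V*1*, V*2*, V*3 can be mechanized: Gaussian elimination on the vectors, taken as rows, leaves a null row exactly when the system is linearly dependent. A sketch, assuming real-valued vectors (function name ours):

```python
def are_independent(vectors, tol=1e-12):
    # Gaussian elimination on the vectors taken as rows; the set is
    # linearly dependent exactly when some row reduces to a null vector
    rows = [list(map(float, v)) for v in vectors]
    pivots = 0
    for col in range(len(rows[0])):
        # find a row, not yet used as a pivot row, with a nonzero entry here
        p = next((r for r in range(pivots, len(rows)) if abs(rows[r][col]) > tol), None)
        if p is None:
            continue
        rows[pivots], rows[p] = rows[p], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            f = rows[r][col] / rows[pivots][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots == len(rows)

U1, U2, U3 = (1, 1, 1), (1, -2, 1), (3, 0, 3)
assert not are_independent([U1, U2, U3])   # U3 = 2 U1 + U2
V1, V2, V3 = (1, 1, 1), (1, 0, -1), (1, -2, 1)
assert are_independent([V1, V2, V3])       # only the trivial solution exists
```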

*Rank of a Matrix* The maximum number of linearly independent row vectors of an *m* × *n* matrix is called the *row rank of the matrix*; the maximum number of linearly independent column vectors is called the *column rank of the matrix*. It can be shown that the row rank of any matrix is equal to its column rank, and this common rank is called the *rank of the matrix*. If *r* is the rank of an *m* × *n* matrix, then *r* ≤ *m* and *r* ≤ *n*. If *m* ≤ *n* and the rank is *m* or if *n* ≤ *m* and the rank is *n*, then the matrix is called *a full rank matrix*. A square matrix of full rank is called *a nonsingular matrix*. When the rank of an *n* × *n* matrix is *r* < *n*, this matrix is referred to as *a singular matrix*. Singularity is defined only for square matrices. The following properties clearly hold:

(1) *A diagonal matrix with at least one zero diagonal element is singular or a diagonal matrix with all nonzero diagonal elements is nonsingular*;

(2) *A triangular matrix (upper or lower) with at least one zero diagonal element is singular or a triangular matrix with all diagonal elements nonzero is nonsingular*;

(3) *A square matrix containing at least one null row vector or at least one null column vector is singular*;

(4) *Linear independence or dependence in a collection of vectors of the same order and category (either all are row vectors or all are column vectors) is not altered by multiplying any of the vectors by a nonzero scalar*;

(5) *Linear independence or dependence in a collection of vectors of the same order and category is not altered by adding any vector of the set to any other vector in the same set*;

(6) *Linear independence or dependence in a collection of vectors of the same order and category is not altered by adding a linear combination of vectors from the same set to any other vector in the same set*;

(7) *If a collection of vectors of the same order and category is a linearly dependent system, then at least one of the vectors can be made null by the operations of scalar multiplication and addition*.
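The rank can be computed by the same elimination used to test linear independence: it is the number of nonzero rows that remain. The sketch below (function name ours) also illustrates that the row rank equals the column rank:

```python
def rank(M, tol=1e-12):
    # number of linearly independent rows, found by Gaussian elimination
    rows = [list(map(float, r)) for r in M]
    rk = 0
    for col in range(len(rows[0])):
        p = next((r for r in range(rk, len(rows)) if abs(rows[r][col]) > tol), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

M = [[1, 1, 1], [1, -2, 1], [3, 0, 3]]     # row 3 = 2(row 1) + (row 2)
Mt = [list(c) for c in zip(*M)]
assert rank(M) == rank(Mt) == 2            # row rank = column rank; M is singular
assert rank([[1, 1, 1], [1, 0, -1], [1, -2, 1]]) == 3   # full rank: nonsingular
```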

*Note*: We have defined "vectors" as ordered sets of items, such as ordered sets of numbers. One can also give a general definition of a vector as an element of a set *S* that is closed under the operations of scalar multiplication and addition (these operations are to be defined on *S*). That is, letting *S* be a set of items, if *V*1 ∈ *S* and *V*2 ∈ *S*, then *cV*1 ∈ *S* and *V*1 + *V*2 ∈ *S* for all scalars *c* and for all *V*1 and *V*2, where the operations *cV*1 and *V*1 + *V*2 are to be properly defined. One can impose additional conditions on *S*. However, for our discussion, the notion of vectors as ordered sets of items will be sufficient.

#### **1.2. Determinants**

Determinants are defined only for square matrices. They are certain scalar functions of the elements of the square matrix under consideration. We will motivate this particular function by means of an example that will also prove useful in other areas. Consider two 2-vectors, either both row vectors or both column vectors. Let *U* = *OP* and *V* = *OQ* be the two vectors as shown in Fig. 1.2. If the vectors are separated by a nonzero angle *θ* then one can create the parallelogram *OPSQ* with these two vectors as shown in Fig. 1.2.

**Figure 1.2** Parallelogram generated from two vectors

The area of the parallelogram is twice the area of the triangle *OPQ*. If the perpendicular from *P* to *OQ* is *PR*, then the area of the triangle is $\frac{1}{2} PR \times OQ$, so that the area of the parallelogram *OPSQ* is $PR \times \|V\|$ where $PR = OP \times \sin\theta = \|U\| \sin\theta$. Therefore the area is $\|U\|\,\|V\| \sin\theta$, and this area, denoted by *ν*, is

$$\nu = \|U\| \, \|V\| \sqrt{(1 - \cos^2 \theta)}.$$

If *θ*1 is the angle *U* makes with the *x*-axis and *θ*2 the angle *V* makes with the *x*-axis, and if *U* and *V* are as depicted in Fig. 1.2, then *θ* = *θ*1 − *θ*2. It follows that

$$
\cos \theta = \cos(\theta\_1 - \theta\_2) = \cos \theta\_1 \cos \theta\_2 + \sin \theta\_1 \sin \theta\_2 = \frac{U \cdot V}{\|U\| \|V\|},
$$

as can be seen from Fig. 1.2. In this case,

$$\nu = \|U\| \, \|V\| \sqrt{1 - \left(\frac{(U \cdot V)}{\|U\| \, \|V\|}\right)^2} = \sqrt{(\|U\|)^2 \left(\|V\|\right)^2 - (U \cdot V)^2} \tag{1.2.1}$$

and

$$\nu^2 = (\|U\|)^2 (\|V\|)^2 - (U \cdot V)^2. \tag{1.2.2}$$

This can be written in a more convenient way. Letting *X* be the matrix whose rows are *U* and *V*,

$$XX'=\begin{pmatrix} U \\ V \end{pmatrix} \begin{pmatrix} U' & V' \end{pmatrix} = \begin{bmatrix} UU' & UV' \\ VU' & VV' \end{bmatrix} = \begin{bmatrix} U \cdot U & U \cdot V \\ V \cdot U & V \cdot V \end{bmatrix}.\tag{1.2.3}$$

On comparing (1.2.2) and (1.2.3), we note that (1.2.2) is available from (1.2.3) by taking a scalar function of the following type. Consider a matrix

$$C = \begin{bmatrix} a & b \\ c & d \end{bmatrix}; \text{ then (1.2.2) is available by taking } ad - bc$$

where *a, b, c, d* are scalar quantities. A scalar function of this type is the determinant of the matrix *C*.
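Numerically, for *U* = (1, 1) and *V* = (1, −1), the determinant of the matrix *XX*′ of (1.2.3) equals the *ν*² of (1.2.2); a small sketch (function names ours):

```python
def det2(M):
    # the scalar function ad - bc of a 2 x 2 matrix [[a, b], [c, d]]
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

U, V = (1, 1), (1, -1)
XXt = [[dot(U, U), dot(U, V)],
       [dot(V, U), dot(V, V)]]                    # the matrix XX' of (1.2.3)
nu_sq = dot(U, U) * dot(V, V) - dot(U, V) ** 2    # nu^2 of (1.2.2)
assert det2(XXt) == nu_sq == 4                    # parallelogram area nu = 2
```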

A general result can be deduced from the above procedure: If *U* and *V* are *n*-vectors and if *θ* is the angle between them, then

$$\cos \theta = \frac{U \cdot V}{\|U\| \|V\|}$$

that is, cos *θ* is the dot product of *U* and *V* divided by the product of their lengths; the numerator is equal to the denominator when *θ* = 2*nπ, n* = 0*,* 1*,* 2*,...* . We now provide a formal definition of the determinant of a square matrix.

**Definition 1.2.1. The Determinant of a Square Matrix** Let *A* = *(aij )* be an *n*×*n* matrix whose rows (columns) are denoted by *α*1*,...,αn*. For example, if *αi* is the *i*-th row vector, then

$$\alpha\_i = (a\_{i1} \ a\_{i2} \ \dots \ a\_{in})\ .$$

The determinant of *A* will be denoted by |*A*| or det*(A)* when *A* is real or complex and the absolute value of the determinant of *A* will be denoted by |det*(A)*| when *A* is in the complex domain. Then, |*A*| will be a function of *α*1*,...,αn*, written as

$$|A| = \det(A) = f(\alpha\_1, \dots, \alpha\_i, \dots, \alpha\_j, \dots, \alpha\_n),$$

which will be defined by the following four axioms (postulates or assumptions); this definition also holds if the elements of the matrix are in the complex domain:

$$(1)\qquad\qquad f(\alpha\_1,\ldots,c\,\alpha\_i,\ldots,\alpha\_n) = cf(\alpha\_1,\ldots,\alpha\_i,\ldots,\alpha\_n),$$

which is equivalent to saying that if any row (column) is multiplied by a scalar quantity *c* (including zero), then the whole determinant is multiplied by *c*;

$$(2)\qquad f(\alpha\_1,\ldots,\alpha\_i,\ldots,\alpha\_j+\alpha\_i,\ldots,\alpha\_n) = f(\alpha\_1,\ldots,\alpha\_i,\ldots,\alpha\_j,\ldots,\alpha\_n),$$

which is equivalent to saying that if any row (column) is added to any other row (column), then the value of the determinant remains the same;

$$(3)\qquad f(\alpha\_1,\ldots,\gamma\_i+\delta\_i,\ldots,\alpha\_n) = f(\alpha\_1,\ldots,\gamma\_i,\ldots,\alpha\_n) + f(\alpha\_1,\ldots,\delta\_i,\ldots,\alpha\_n),$$

which is equivalent to saying that if any row (column), say the *i*-th row (column) is written as a sum of two vectors, *αi* = *γi* + *δi* then the determinant becomes the sum of two determinants such that *γi* appears at the position of *αi* in the first one and *δi* appears at the position of *αi* in the second one;

$$(4)\qquad f(e\_1,\ldots,e\_n) = 1$$

where *e*1*,...,en* are the basic unit vectors as previously defined; this axiom states that the determinant of an identity matrix is 1.

Let us consider some corollaries resulting from Axioms (1) to (4). On combining Axioms (1) and (2), we have that the value of a determinant remains unchanged if a linear function of any number of rows (columns) is added to any other row (column). As well, the following results are direct consequences of the axioms.

(i): *The determinant of a diagonal matrix is the product of the diagonal elements* [which can be established by repeated applications of Axiom (1)];

(ii): *If any diagonal element in a diagonal matrix is zero, then the determinant is zero, and thereby the corresponding matrix is singular; if none of the diagonal elements of a diagonal matrix is equal to zero, then the matrix is nonsingular*.

(iii): *If any row (column) of a matrix is null, then the determinant is zero or the matrix is singular* [Axiom (1)];

(iv): *If any row (column) is a linear function of other rows (columns), then the determinant is zero* [By Axioms (1) and (2), we can reduce that row (column) to a null vector]. Thus, the determinant of a singular matrix is zero or if the row (column) vectors form a linearly dependent system, then the determinant is zero.

By using Axioms (1) and (2), we can reduce a triangular matrix to a diagonal form when evaluating its determinant. For this purpose we shall use the following standard notation: "*c (i)* + *(j )* ⇒" means "*c* times the *i*-th row is added to the *j* -th row, which results in the following:". Let us consider a simple example: evaluating the determinant of the triangular matrix

$$T = \begin{bmatrix} 2 & 1 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & -4 \end{bmatrix}.$$

It is an upper triangular matrix. We take out −4 from the third row by using Axiom (1). Then,

$$|T| = -4\begin{vmatrix} 2 & 1 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & 1 \end{vmatrix}.$$

Now, add *(*−4*)* times the third row to the second row and *(*−5*)* times the third row to the first row. This in symbols is "−4*(*3*)* + *(*2*),* −5*(*3*)* + *(*1*)* ⇒". The net result is that the elements 5 and 4 in the last column are eliminated without affecting the other elements, so that

$$|T| = -4\begin{vmatrix} 2 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{vmatrix}.$$

Now take out 3 from the second row and then use the second row to eliminate 1 in the first row. After taking out 3 from the second row, the operation is "−1*(*2*)* + *(*1*)* ⇒". The result is the following:

$$|T| = (-4)(3)\begin{vmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} .$$

Now, take out 2 from the first row, then by Axiom (4) the determinant of the resulting identity matrix is 1, and hence |*T* | is nothing but the product of the diagonal elements. Thus, we have the following result:

(v): *The determinant of a triangular matrix (upper or lower) is the product of its diagonal elements; accordingly, if any diagonal element in a triangular matrix is zero, then the determinant is zero and the matrix is singular*. For a triangular matrix to be nonsingular, all its diagonal elements must be nonzero.

The following result follows directly from Axioms (1) and (2). The proof is given in symbols.

(vi): *If any two rows (columns) are interchanged (this means one transposition), then the resulting determinant is multiplied by* −1 *or every transposition brings in* −1 *outside that determinant as a multiple*. If an odd number of transpositions is done, then the whole determinant is multiplied by −1; for an even number of transpositions, the multiplicative factor is +1, that is, the determinant is unchanged. An outline of the proof follows:

$$\begin{split} |A| &= f(\alpha\_1, \dots, \alpha\_i, \dots, \alpha\_j, \dots, \alpha\_n) \\ &= f(\alpha\_1, \dots, \alpha\_i, \dots, \alpha\_i + \alpha\_j, \dots, \alpha\_n) \quad \text{[Axiom (2)]} \\ &= -f(\alpha\_1, \dots, \alpha\_i, \dots, -\alpha\_i - \alpha\_j, \dots, \alpha\_n) \quad \text{[Axiom (1)]} \\ &= -f(\alpha\_1, \dots, -\alpha\_j, \dots, -\alpha\_i - \alpha\_j, \dots, \alpha\_n) \quad \text{[Axiom (2)]} \\ &= f(\alpha\_1, \dots, \alpha\_j, \dots, -\alpha\_i - \alpha\_j, \dots, \alpha\_n) \quad \text{[Axiom (1)]} \\ &= f(\alpha\_1, \dots, \alpha\_j, \dots, -\alpha\_i, \dots, \alpha\_n) \quad \text{[Axiom (2)]} \\ &= -f(\alpha\_1, \dots, \alpha\_j, \dots, \alpha\_i, \dots, \alpha\_n) \quad \text{[Axiom (1)]} \end{split}$$

Now, note that the *i*-th and *j* -th rows (columns) are interchanged and the result is that the determinant is multiplied by −1.

With the above six basic properties, we are in a position to evaluate most of the determinants.

**Example 1.2.1.** Evaluate the determinant of the matrix

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 5 & 0 & 0 \\ 2 & -1 & 1 & 0 \\ 3 & 0 & 1 & 4 \end{bmatrix}.$$

**Solution 1.2.1.** Since this is a triangular matrix, its determinant will be the product of its diagonal elements. Proceeding step by step, take out 2 from the first row by using Axiom (1). Then −1*(*1*)* + *(*2*),* −2*(*1*)* + *(*3*),* −3*(*1*)* + *(*4*)* ⇒. The result of these operations is the following:

$$|A| = 2 \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{vmatrix} .$$

Now, take out 5 from the second row so that 1*(*2*)* + *(*3*)* ⇒, the result being the following:

$$|A| = (2)(5)\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{vmatrix}$$

The diagonal element in the third row is 1 and there is nothing to be taken out. Now −1*(*3*)* + *(*4*)* ⇒ and then, after having taken out 4 from the fourth row, the result is

$$|A| = (2)(5)(1)(4)\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix}.$$

Now, by Axiom (4) the determinant of the remaining identity matrix is 1. Therefore, the final solution is |*A*| = *(*2*)(*5*)(*1*)(*4*)* = 40.
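The steps above mirror what Gaussian elimination does in general: adding a multiple of one row to another leaves |*A*| unchanged, and the determinant of the resulting triangular form is the product of its diagonal elements. A numerical sketch (function name ours) reproducing |*A*| = 40:

```python
def det(M, tol=1e-12):
    # determinant by Gaussian elimination: adding a multiple of one row to
    # another leaves |A| unchanged [Axioms (1) and (2)], a row interchange
    # multiplies |A| by -1 [property (vi)], and the determinant of the
    # resulting triangular form is the product of its diagonal elements
    A = [list(map(float, row)) for row in M]
    n = len(A)
    sign = 1.0
    for k in range(n):
        p = next((r for r in range(k, n) if abs(A[r][k]) > tol), None)
        if p is None:
            return 0.0              # a null column: the matrix is singular
        if p != k:
            A[k], A[p] = A[p], A[k]
            sign = -sign            # one transposition brings in a factor -1
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    for k in range(n):
        sign *= A[k][k]
    return sign

A = [[2, 0, 0, 0], [1, 5, 0, 0], [2, -1, 1, 0], [3, 0, 1, 4]]
assert abs(det(A) - 40.0) < 1e-9    # (2)(5)(1)(4), as obtained above
```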

**Example 1.2.2.** Evaluate the determinant of the following matrix:

$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 2 & 1 & -1 & 0 \\ 5 & 2 & 1 & 3 \end{bmatrix}.$$

**Solution 1.2.2.** Since the first row, first column element is the convenient number 1, we start operating with the first row. Otherwise, we would bring a convenient number to the *(*1*,* 1*)*-th position by interchanges of rows and columns (with each interchange, the determinant is to be multiplied by *(*−1*)*). Our aim will be to reduce the matrix to a triangular form so that the determinant is the product of the diagonal elements. By using the first row, let us wipe out the elements in the first column. The operations are −2*(*1*)* + *(*3*),* −5*(*1*)* + *(*4*)* ⇒. Then

$$|A| = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 2 & 1 & -1 & 0 \\ 5 & 2 & 1 & 3 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & -3 & -9 & -2 \\ 0 & -8 & -19 & -2 \end{vmatrix}.$$

Now, by using the second row, we want to wipe out the elements below the diagonal in the second column. But the pivot element is now 3, not 1. The element in the third row can nevertheless be wiped out by simply adding 1*(*2*)* + *(*3*)* ⇒. This brings the following:

$$|A| = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & 0 & -7 & -1 \\ 0 & -8 & -19 & -2 \end{vmatrix}.$$

If we were to take out 3 from the second row, fractions would be introduced. We will avoid fractions by multiplying the second row by 8 and the fourth row by 3. In order to preserve the value of the determinant, we keep the factor $\frac{1}{(8)(3)}$ outside. Then, we add the second row to the fourth row, that is, *(*2*)* + *(*4*)* ⇒. The result of these operations is the following:

$$|A| = \frac{1}{(8)(3)} \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -7 & -1 \\ 0 & -24 & -57 & -6 \end{vmatrix} = \frac{1}{(8)(3)} \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -7 & -1 \\ 0 & 0 & -41 & 2 \end{vmatrix}.$$

Now, multiply the third row by 41 and the fourth row by 7, keeping the compensating factor $\frac{1}{(7)(41)}$ outside as well, and then add −1*(*3*)* + *(*4*)* ⇒. The result is the following:

$$|A| = \frac{1}{(8)(3)(7)(41)} \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -287 & -41 \\ 0 & 0 & 0 & 55 \end{vmatrix}.$$

Now, take the product of the diagonal elements. Then

$$|A| = \frac{(1)(24)(-287)(55)}{(8)(3)(7)(41)} = -55.$$

Observe that we did not have to rewrite the full 4 × 4 determinant at each step. After wiping out the first column elements, we could have expressed the determinant as follows, because only the elements from the second row and second column onward would then have mattered. That is,

$$|A| = (1)\begin{vmatrix} 3 & 2 & 1 \\ 0 & -7 & -1 \\ -8 & -19 & -2 \end{vmatrix}.$$

Similarly, after wiping out the second column elements, we could have written the resulting determinant as

$$|A| = \frac{(1)(24)}{(8)(3)} \begin{vmatrix} -7 & -1 \\ -41 & 2 \end{vmatrix},$$

and so on.
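As an independent check of |*A*| = −55, the determinant can also be expanded recursively along the first row (the general rule derived later in this section); a sketch with our own function name:

```python
def det(M):
    # recursive expansion of |M| along its first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[1, 2, 4, 1], [0, 3, 2, 1], [2, 1, -1, 0], [5, 2, 1, 3]]
assert det(A) == -55
# the reduced determinants quoted above agree with |A|:
assert det([[3, 2, 1], [0, -7, -1], [-8, -19, -2]]) == -55
assert (1 * 24) * det([[-7, -1], [-41, 2]]) == (8 * 3) * (-55)
```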

**Example 1.2.3.** Evaluate the determinant of a 2 × 2 general matrix.

**Solution 1.2.3.** A general 2 × 2 determinant can be opened up by using Axiom (3), that is,

$$\begin{aligned} |A| &= \begin{vmatrix} a\_{11} & a\_{12} \\ a\_{21} & a\_{22} \end{vmatrix} = \begin{vmatrix} a\_{11} & 0 \\ a\_{21} & a\_{22} \end{vmatrix} + \begin{vmatrix} 0 & a\_{12} \\ a\_{21} & a\_{22} \end{vmatrix} & \text{[Axiom (3)]}\\ &= a\_{11} \begin{vmatrix} 1 & 0 \\ a\_{21} & a\_{22} \end{vmatrix} + a\_{12} \begin{vmatrix} 0 & 1 \\ a\_{21} & a\_{22} \end{vmatrix} & \text{[Axiom (1)]}. \end{aligned}$$

If any of *a*<sup>11</sup> or *a*<sup>12</sup> is zero, then the corresponding determinant is zero. In the second determinant on the right, interchange the second and first columns, which will bring a minus sign outside the determinant. That is,

$$|A| = a\_{11} \left| \begin{matrix} 1 & 0 \\ a\_{21} & a\_{22} \end{matrix} \right| - a\_{12} \left| \begin{matrix} 1 & 0 \\ a\_{22} & a\_{21} \end{matrix} \right| = a\_{11}a\_{22} - a\_{12}a\_{21}.$$

The last step is done by using the property that the determinant of a triangular matrix is the product of the diagonal elements. We can also evaluate the determinant by using a number of different procedures. Taking out *a*11 if *a*11 ≠ 0,

$$|A| = a\_{11} \left| \begin{matrix} 1 & \frac{a\_{12}}{a\_{11}} \\ a\_{21} & a\_{22} \end{matrix} \right|.$$

Now, perform the operation −*a*21*(*1*)* + *(*2*)* or −*a*<sup>21</sup> times the first row is added to the second row. Then,

$$|A| = a\_{11} \begin{vmatrix} 1 & \frac{a\_{12}}{a\_{11}} \\ 0 & a\_{22} - \frac{a\_{12}a\_{21}}{a\_{11}} \end{vmatrix}.$$

Now, expanding by using a property of triangular matrices, we have

$$|A| = a\_{11}(1) \left[ a\_{22} - \frac{a\_{12}a\_{21}}{a\_{11}} \right] = a\_{11}a\_{22} - a\_{12}a\_{21}.\tag{1.2.4}$$

Consider a general 3 × 3 determinant evaluated by using Axiom (3) first.

$$\begin{aligned} |A| &= \begin{vmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{vmatrix} \\ &= a\_{11} \begin{vmatrix} 1 & 0 & 0 \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{vmatrix} + a\_{12} \begin{vmatrix} 0 & 1 & 0 \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{vmatrix} + a\_{13} \begin{vmatrix} 0 & 0 & 1 \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{vmatrix} \\ &= a\_{11} \begin{vmatrix} 1 & 0 & 0 \\ 0 & a\_{22} & a\_{23} \\ 0 & a\_{32} & a\_{33} \end{vmatrix} + a\_{12} \begin{vmatrix} 0 & 1 & 0 \\ a\_{21} & 0 & a\_{23} \\ a\_{31} & 0 & a\_{33} \end{vmatrix} + a\_{13} \begin{vmatrix} 0 & 0 & 1 \\ a\_{21} & a\_{22} & 0 \\ a\_{31} & a\_{32} & 0 \end{vmatrix}. \end{aligned}$$

The first step consists in opening up the first row by making use of Axiom (3). Then, eliminate the elements in rows 2 and 3 within the column headed by 1. The next step is to bring the columns whose first element is 1 to the first column position by transpositions. The first matrix on the right-hand side is already in this format. One transposition is needed in the second matrix and two are required in the third matrix. After completing the transpositions, the next step consists in opening up each determinant along their second row and observing that the resulting matrices are lower triangular or can be made so after transposing their last two columns. The final result is then obtained. The last two steps are executed below:

$$|A| = a\_{11} \begin{vmatrix} 1 & 0 & 0 \\ 0 & a\_{22} & a\_{23} \\ 0 & a\_{32} & a\_{33} \end{vmatrix} - a\_{12} \begin{vmatrix} 1 & 0 & 0 \\ 0 & a\_{21} & a\_{23} \\ 0 & a\_{31} & a\_{33} \end{vmatrix} + a\_{13} \begin{vmatrix} 1 & 0 & 0 \\ 0 & a\_{21} & a\_{22} \\ 0 & a\_{31} & a\_{32} \end{vmatrix}$$

$$= a\_{11} [a\_{22}a\_{33} - a\_{23}a\_{32}] - a\_{12}[a\_{21}a\_{33} - a\_{23}a\_{31}] + a\_{13}[a\_{21}a\_{32} - a\_{22}a\_{31}]$$

$$= a\_{11}a\_{22}a\_{33} + a\_{12}a\_{23}a\_{31} + a\_{13}a\_{21}a\_{32} - a\_{11}a\_{23}a\_{32} - a\_{12}a\_{21}a\_{33} - a\_{13}a\_{22}a\_{31} \tag{1.2.5}$$
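Expansion (1.2.5) and its grouping into 2 × 2 minors (the bracketed form in the preceding step) can be checked numerically; a short sketch with our own function names:

```python
def det2(M):
    # ad - bc for a 2 x 2 matrix [[a, b], [c, d]]
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    # the six-term expansion (1.2.5)
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = M
    return (a11*a22*a33 + a12*a23*a31 + a13*a21*a32
            - a11*a23*a32 - a12*a21*a33 - a13*a22*a31)

def det3_minors(M):
    # the same value via the bracketed 2 x 2 minors of the first row
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = M
    return (a11 * det2([[a22, a23], [a32, a33]])
            - a12 * det2([[a21, a23], [a31, a33]])
            + a13 * det2([[a21, a22], [a31, a32]]))

T = [[2, 1, 5], [0, 3, 4], [0, 0, -4]]
assert det3(T) == det3_minors(T) == 2 * 3 * (-4)   # triangular: diagonal product
B = [[1, 1, 1], [1, 0, -1], [1, -2, 1]]
assert det3(B) == det3_minors(B) == -6   # nonzero: V1, V2, V3 of Sect. 1.1 are independent
```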

A few observations are in order. Once 1 is brought to the first row first column position in every matrix and the remaining elements in this first column are eliminated, one can delete the first row and first column and take the determinant of the remaining submatrix because only those elements will enter into the remaining operations involving opening up the second and successive rows by making use of Axiom (3). Hence, we could have written

$$|A| = a\_{11} \begin{vmatrix} a\_{22} & a\_{23} \\ a\_{32} & a\_{33} \end{vmatrix} - a\_{12} \begin{vmatrix} a\_{21} & a\_{23} \\ a\_{31} & a\_{33} \end{vmatrix} + a\_{13} \begin{vmatrix} a\_{21} & a\_{22} \\ a\_{31} & a\_{32} \end{vmatrix}.$$

This step is also called *the cofactor expansion of the determinant*. For a general matrix $A = (a\_{ij})$, *the cofactor of the element $a\_{ij}$ is equal to $(-1)^{i+j}M\_{ij}$, where $M\_{ij}$ is the minor of $a\_{ij}$. This minor is obtained by deleting the $i$-th row and $j$-th column and then taking the determinant of the remaining elements.* The second item to be noted from (1.2.5) is that, in the final expression for $|A|$, each term contains one and only one element from each row and each column of $A$. Some terms carry plus signs and others minus signs. For each term, write the first subscripts in the natural order $1, 2, 3$ in the $3 \times 3$ case and, in the general $n \times n$ case, in the natural order $1, 2, \ldots, n$. Now, examine the second subscripts. Let $\rho$ be the number of transpositions needed to bring the second subscripts into the natural order $1, 2, \ldots, n$. Then, that term is multiplied by $(-1)^{\rho}$, so that an even number of transpositions produces a plus sign and an odd number of transpositions brings a minus sign; equivalently, if $\rho$ is even the coefficient is $+1$ and if $\rho$ is odd the coefficient is $-1$. This also enables us to open up a general determinant, which will be considered after pointing out one more property for the $3 \times 3$ case. The final representation in the $3 \times 3$ case in (1.2.5) can also be obtained by the following mechanical procedure. Write all the elements of the matrix $A$ in the natural order. Then, augment this arrangement with the first two columns. This yields the following format:

$$\begin{array}{cccccccc} a\_{11} & a\_{12} & a\_{13} & a\_{11} & a\_{12} \\ a\_{21} & a\_{22} & a\_{23} & a\_{21} & a\_{22} \\ a\_{31} & a\_{32} & a\_{33} & a\_{31} & a\_{32} \\ \end{array}$$

Now take the products of the elements along the diagonals going from the top left to the bottom right. These are the terms with the plus sign. Take the products of the elements along the second diagonals, that is, the diagonals going from the bottom left to the top right. These are the terms with the minus sign. As a result, $|A|$ is as follows:

$$\begin{aligned} |A| &= [a\_{11}a\_{22}a\_{33} + a\_{12}a\_{23}a\_{31} + a\_{13}a\_{21}a\_{32}] \\ &- [a\_{13}a\_{22}a\_{31} + a\_{11}a\_{23}a\_{32} + a\_{12}a\_{21}a\_{33}].\end{aligned}$$
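This diagonal rule is easy to check numerically. Below is a minimal sketch in Python; the 3 × 3 entries are hypothetical:

```python
# Hypothetical 3x3 matrix used to illustrate the diagonal rule
a = [[2, 1, 3],
     [0, 4, 1],
     [5, 2, 6]]

# products along the diagonals running top left to bottom right (plus sign)
plus = a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0] + a[0][2]*a[1][0]*a[2][1]
# products along the diagonals running bottom left to top right (minus sign)
minus = a[0][2]*a[1][1]*a[2][0] + a[0][0]*a[1][2]*a[2][1] + a[0][1]*a[1][0]*a[2][2]

det_sarrus = plus - minus
print(det_sarrus)  # -11
```

The same value, −11, is obtained by expanding this matrix as in (1.2.5).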

This mechanical procedure applies only in the 3 × 3 case. The general expansion is the following:

$$|A| = \begin{vmatrix} a\_{11} & a\_{12} & \dots & a\_{1n} \\ a\_{21} & a\_{22} & \dots & a\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a\_{n1} & a\_{n2} & \dots & a\_{nn} \end{vmatrix} = \sum\_{i\_1} \cdots \sum\_{i\_n} (-1)^{\rho(i\_1, \dots, i\_n)} a\_{1i\_1} a\_{2i\_2} \cdots a\_{n i\_n} \tag{1.2.6}$$

where *ρ(i*1*,...,in)* is the number of transpositions needed to bring the second subscripts *i*1*,...,in* into the natural order 1*,* 2*,...,n*.
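Expansion (1.2.6) can be coded directly by summing over all permutations of the second subscripts; the parity of the number of transpositions $\rho$ equals the parity of the number of inversions of the permutation. A sketch with hypothetical entries:

```python
from itertools import permutations

def det_by_expansion(a):
    """Determinant via the signed sum over all column-subscript permutations (1.2.6)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        # parity of the number of transpositions rho equals the parity of
        # the number of inversions of the permutation
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = (-1) ** inversions
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

print(det_by_expansion([[2, 1, 3], [0, 4, 1], [5, 2, 6]]))  # -11
```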

The cofactor expansion of a general determinant is obtained as follows. Suppose that we open up an $n \times n$ determinant $|A|$ along the $i$-th row using Axiom (3). Then, after taking out $a\_{i1}, a\_{i2}, \ldots, a\_{in}$, we obtain $n$ determinants where, in the first determinant, 1 occupies the $(i, 1)$-th position, in the second one, 1 is at the $(i, 2)$-th position, and so on, so that, in the $j$-th determinant, 1 occupies the $(i, j)$-th position. Using that $i$-th row, we can now eliminate all the other elements in the column containing this 1. We then bring this 1 into the first row, first column position by transpositions in each determinant. The number of transpositions needed to bring this 1 from the $j$-th position in the $i$-th row to the first position in that row is $j - 1$. Then, to bring that 1 to the first row, first column position, another $i - 1$ transpositions are required, so that the total number of transpositions needed is $(i - 1) + (j - 1) = i + j - 2$. Hence, the multiplicative factor is $(-1)^{i+j-2} = (-1)^{i+j}$, and the expansion is as follows:

$$\begin{split} |A| &= (-1)^{i+1} a\_{i1} M\_{i1} + (-1)^{i+2} a\_{i2} M\_{i2} + \cdots + (-1)^{i+n} a\_{in} M\_{in} \\ &= a\_{i1} C\_{i1} + a\_{i2} C\_{i2} + \cdots + a\_{in} C\_{in} \end{split} \tag{1.2.7}$$

where $C\_{ij} = (-1)^{i+j}M\_{ij}$; $C\_{ij}$ is *the cofactor of $a\_{ij}$* and $M\_{ij}$ is *the minor of $a\_{ij}$*, the minor being obtained by taking the determinant of the remaining elements after deleting the $i$-th row and $j$-th column of $A$. Moreover, if we expand along a certain row (column) using the cofactors of some other row (column), the result is zero. That is,

$$0 = a\_{i1}C\_{j1} + a\_{i2}C\_{j2} + \cdots + a\_{in}C\_{jn}, \text{ for all } i \neq j. \tag{1.2.8}$$
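Both (1.2.7) and (1.2.8) can be checked numerically. The sketch below (hypothetical entries) expands along each row with its own cofactors and then pairs one row with the cofactors of another:

```python
def det(a):
    """Recursive cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([r[:j] + r[j+1:] for r in a[1:]]) for j in range(len(a)))

def cofactor(a, i, j):
    """C_ij = (-1)^(i+j) M_ij, M_ij being the minor from deleting row i, column j."""
    sub = [row[:j] + row[j+1:] for k, row in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(sub)

a = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]  # hypothetical entries

# expanding along any row with its own cofactors reproduces |A| (1.2.7) ...
expansions = [sum(a[i][j] * cofactor(a, i, j) for j in range(3)) for i in range(3)]
print(expansions)  # [-11, -11, -11]

# ... while pairing a row with the cofactors of another row gives 0 (1.2.8)
print(sum(a[0][j] * cofactor(a, 1, j) for j in range(3)))  # 0
```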

*Inverse of a Matrix* Regular inverses exist only for square matrices that are nonsingular. The standard notation for the regular inverse of a matrix $A$ is $A^{-1}$. It is defined by $AA^{-1} = I\_n$ and $A^{-1}A = I\_n$. The following properties can be deduced from the definition. First, we note that $AA^{-1} = A^0 = I = A^{-1}A$. When $A$ and $B$ are $n \times n$ nonsingular matrices, $(AB)^{-1} = B^{-1}A^{-1}$, which can be established by pre- or post-multiplying the right-hand side by $AB$. Accordingly, with $A^m = A \times A \times \cdots \times A$, we have $A^{-m} = A^{-1} \times \cdots \times A^{-1} = (A^m)^{-1}$, $m = 1, 2, \ldots$, and when $A\_1, \ldots, A\_k$ are $n \times n$ nonsingular matrices, $(A\_1 A\_2 \cdots A\_k)^{-1} = A\_k^{-1} A\_{k-1}^{-1} \cdots A\_2^{-1} A\_1^{-1}$. We can also obtain a formula for the inverse of a nonsingular matrix $A$ in terms of cofactors. Assume that $A^{-1}$ exists and let $\text{Cof}(A) = (C\_{ij})$ be the matrix of cofactors of $A$, that is, if $A = (a\_{ij})$ and $C\_{ij}$ is the cofactor of $a\_{ij}$, then $\text{Cof}(A) = (C\_{ij})$. It follows from (1.2.7) and (1.2.8) that

$$A^{-1} = \frac{1}{|A|} (\text{Cof}(A))' = \frac{1}{|A|} \begin{bmatrix} C\_{11} & \dots & C\_{1n} \\ \vdots & \ddots & \vdots \\ C\_{n1} & \dots & C\_{nn} \end{bmatrix}',\tag{1.2.9}$$

that is, the transpose of the cofactor matrix divided by the determinant of $A$. What about $A^{\frac{1}{2}}$? For a scalar quantity $a$, we have the definition that if $b$ exists such that $b \times b = a$, then $b$ is a square root of $a$. Consider the following 2 × 2 matrices:

$$\begin{aligned} I\_2 &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, B\_1 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \ B\_2 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \\\\ B\_3 &= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \ B\_4 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \ I\_2^2 = I\_2, \ B\_j^2 = I\_2, \ j = 1, 2, 3, 4. \end{aligned}$$

Thus, if we use the definition $B^2 = A$ and claim that $B$ is a square root of $A$, there are several candidates for $B$; this means that, in general, the square root of a matrix cannot be uniquely determined. However, if we restrict ourselves to the class of positive definite matrices, then a square root can be uniquely defined. The definiteness of matrices will be considered later.
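The non-uniqueness is easy to confirm: $I\_2$ and each of the matrices $B\_1, \ldots, B\_4$ above square to $I\_2$. A minimal check in Python:

```python
def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I2 = [[1, 0], [0, 1]]
Bs = [[[-1, 0], [0, 1]],
      [[1, 0], [0, -1]],
      [[-1, 0], [0, -1]],
      [[0, 1], [1, 0]]]

# I2 and every B_j square to I2, so a square root of I2 is not unique
all_square_to_I2 = all(matmul2(B, B) == I2 for B in [I2] + Bs)
print(all_square_to_I2)  # True
```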

#### **1.2.1. Inverses by row operations or elementary operations**

Basic elementary matrices are of two types. Let us call them the *E*-type and the *F*type. An elementary matrix of the *E*-type is obtained by taking an identity matrix and multiplying any row (column) by a nonzero scalar. For example,

$$I\_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \ E\_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \ E\_2 = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \ E\_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$$

where *E*1*, E*2*, E*<sup>3</sup> are elementary matrices of the *E*-type obtained from the identity matrix *I*3. If we pre-multiply an arbitrary matrix *A* with an elementary matrix of the *E*-type, then the same effect will be observed on the rows of the arbitrary matrix *A*. For example, consider a 3 × 3 matrix *A* = *(aij )*. Then, for example,

$$E\_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix} = \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ -2a\_{21} & -2a\_{22} & -2a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix}.$$

Thus, the same effect applies to the rows, that is, the second row is multiplied by *(*−2*)*. Observe that *E*-type elementary matrices are always nonsingular and so, their regular inverses exist. For instance,

$$E\_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E\_1 E\_1^{-1} = I\_3 = E\_1^{-1} E\_1; \ E\_2^{-1} = \begin{bmatrix} \frac{1}{5} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E\_2 E\_2^{-1} = I\_3.$$
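These effects can be confirmed numerically. In the sketch below (hypothetical entries for $A$), pre-multiplying by $E\_1$ scales the second row by −2, and $E\_1$ times its inverse returns $I\_3$:

```python
def matmul(x, y):
    """Matrix product of conformable lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
E1 = [[1, 0, 0], [0, -2, 0], [0, 0, 1]]
E1_inv = [[1, 0, 0], [0, -0.5, 0], [0, 0, 1]]
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical entries

print(matmul(E1, A)[1])           # [-8, -10, -12]: second row scaled by -2
print(matmul(E1, E1_inv) == I3)   # True
```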

Observe that post-multiplication of an arbitrary matrix by an *E*-type elementary matrix will have the same effect on the columns of the arbitrary matrix. For example, *AE*<sup>1</sup> will have the same effect on the columns of *A*, that is, the second column of *A* is multiplied by −2; *AE*<sup>2</sup> will result in the first column of *A* being multiplied by 5, and so on. The *F*-type elementary matrix is created by adding any particular row of an identity matrix to another one of its rows. For example, consider a 3 × 3 identity matrix *I*<sup>3</sup> and let

$$F\_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ F\_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},\ F\_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix},$$

where *F*<sup>1</sup> is obtained by adding the first row to the second row of *I*3; *F*<sup>2</sup> is obtained by adding the first row to the third row of *I*3; and *F*<sup>3</sup> is obtained by adding the second row of *I*<sup>3</sup> to the third row. As well, *F*-type elementary matrices are nonsingular, and for instance,

$$F\_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \ F\_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}, \ F\_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$

where $F\_1 F\_1^{-1} = I\_3$, $F\_2^{-1} F\_2 = I\_3$ and $F\_3^{-1} F\_3 = I\_3$. If we pre-multiply an arbitrary matrix $A$ by an $F$-type elementary matrix, then the same effect will be observed on the rows of $A$. For example,

$$F\_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix} = \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} + a\_{11} & a\_{22} + a\_{12} & a\_{23} + a\_{13} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix}.$$

Thus, the same effect applies to the rows, namely, the first row is added to the second row of $A$ (as $F\_1$ was obtained by adding the first row of $I\_3$ to its second row). The reader may verify that $F\_2 A$ has the effect of the first row being added to the third row, and $F\_3 A$, that of the second row being added to the third row. By combining $E$- and $F$-type elementary matrices, we end up with a $G$-type matrix wherein a multiple of any particular row of an identity matrix is added to another one of its rows. For example, letting

$$G\_1 = \begin{bmatrix} 1 & 0 & 0 \\ \mathbf{5} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \text{ and } G\_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix},$$

it is seen that $G\_1$ is obtained by adding 5 times the first row to the second row in $I\_3$, and $G\_2$ is obtained by adding −2 times the first row to the third row in $I\_3$. Pre-multiplication of an arbitrary matrix $A$ by $G\_1$, that is, $G\_1 A$, will have the effect that 5 times the first row of $A$ is added to its second row. Similarly, $G\_2$ will have the effect that −2 times the first row of $A$ is added to its third row. Being products of $E$- and $F$-type elementary matrices, $G$-type matrices are also nonsingular. We also have the result that if $A, B, C$ are $n \times n$ matrices and $B = C$, then $AB = AC$; conversely, when $A$ is nonsingular, $AB = AC$ implies $B = C$. In general, if $A\_1, \ldots, A\_k$ are $n \times n$ nonsingular matrices, we have

$$\begin{aligned} B = C &\Rightarrow A\_k A\_{k-1} \dotsm A\_2 A\_1 B = A\_k A\_{k-1} \dotsm A\_2 A\_1 C, \\ \text{where, for instance, } A\_1 A\_2 B &= A\_1 (A\_2 B) = (A\_1 A\_2) B = (A\_1 A\_2) C = A\_1 (A\_2 C). \end{aligned}$$

We will evaluate the inverse of a nonsingular square matrix by making use of elementary matrices. The procedure will also verify whether a regular inverse exists for a given matrix. If a regular inverse for a square matrix $A$ exists, then $AA^{-1} = I$. We can pre- or post-multiply $A$ by elementary matrices. For example,

$$AA^{-1} = I \Rightarrow E\_k F\_r \cdots E\_1 F\_1 A A^{-1} = E\_k F\_r \cdots E\_1 F\_1 I$$

$$\Rightarrow (E\_k \cdots F\_1 A) A^{-1} = (E\_k \cdots F\_1).$$

Thus, if the operations $E\_k \cdots F\_1$ on $A$ reduce $A$ to an identity matrix, then $A^{-1} = E\_k \cdots F\_1$. If an inconsistency occurs during the process, we can conclude that there is no inverse for $A$. Hence, our aim in performing the elementary operations on the left of $A$ is to reduce it to an identity matrix, in which case the product of the elementary matrices on the right-hand side of the last equation will produce the inverse of $A$.

**Example 1.2.4.** Evaluate $A^{-1}$ if it exists, where

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 0 \\ 2 & 1 & 1 & 2 \\ 1 & 1 & -1 & 1 \end{bmatrix}.$$

**Solution 1.2.4.** If $A^{-1}$ exists, then $AA^{-1} = I$, which means that

$$
\begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 0 \\ 2 & 1 & 1 & 2 \\ 1 & 1 & -1 & 1 \end{bmatrix} A^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
$$

This is our starting equation. Only the configuration of the elements matters: the matrix notation and the symbol $A^{-1}$ can be disregarded, so we consider only the configuration of the numbers of the matrix $A$ on the left and the numbers in the identity matrix on the right. We then pre-multiply both sides by elementary matrices only. In the first set of steps, our aim consists in reducing every element in the first column of $A$ to zero, except the first one, by only using the first row. For each elementary operation on $A$, the same elementary operation is performed on the identity matrix as well. Next, utilizing the second row of the resulting $A$, we reduce all the elements in its second column to zeros except the second one, and we continue in this manner until all the elements in the last column except the last one are reduced to zeros by making use of the last row, thus reducing $A$ to an identity matrix, provided of course that $A$ is nonsingular. In our example, the elements in the first column can be annihilated by applying the following operations. We employ the standard notation $a(i) + (j)$, meaning that $a$ times the $i$-th row is added to the $j$-th row. Consider $(1) + (2)$; $-2(1) + (3)$; $-1(1) + (4)$, that is, the first row is added to the second row, then $-2$ times the first row is added to the third row, and then $-1$ times the first row is added to the fourth row, the net result being

$$
\begin{array}{ccccccccc}
1 & 1 & 1 & 1 & & 1 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 & \Leftrightarrow & 1 & 1 & 0 & 0 \\
0 & -1 & -1 & 0 & & -2 & 0 & 1 & 0 \\
0 & 0 & -2 & 0 & & -1 & 0 & 0 & 1
\end{array}
$$
Now, start with the second row of the resulting $A$ and the resulting identity matrix and eliminate all the other elements in the second column of the resulting $A$. This can be achieved by performing the operations $(2) + (3)$; $-1(2) + (1)$ ⇒

$$
\begin{array}{ccccccccc}
1 & 0 & -1 & 0 & & 0 & -1 & 0 & 0 \\
0 & 1 & 2 & 1 & \Leftrightarrow & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & & -1 & 1 & 1 & 0 \\
0 & 0 & -2 & 0 & & -1 & 0 & 0 & 1
\end{array}
$$
Now, start with the third row and eliminate all other elements in the third column. Writing the row used in the operations (the third one in this case) within the first set of parentheses for each operation, this is achieved by $2(3) + (4)$; $-2(3) + (2)$; $(3) + (1)$ ⇒

$$
\begin{array}{ccccccccc}
1 & 0 & 0 & 1 & & -1 & 0 & 1 & 0 \\
0 & 1 & 0 & -1 & \Leftrightarrow & 3 & -1 & -2 & 0 \\
0 & 0 & 1 & 1 & & -1 & 1 & 1 & 0 \\
0 & 0 & 0 & 2 & & -3 & 2 & 2 & 1
\end{array}
$$
Divide the fourth row by 2 and then perform the following operations: $\frac{1}{2}(4)$; $-1(4) + (3)$; $(4) + (2)$; $-1(4) + (1)$ ⇒

$$
\begin{array}{ccccccccc}
1 & 0 & 0 & 0 & & \frac{1}{2} & -1 & 0 & -\frac{1}{2} \\
0 & 1 & 0 & 0 & \Leftrightarrow & \frac{3}{2} & 0 & -1 & \frac{1}{2} \\
0 & 0 & 1 & 0 & & \frac{1}{2} & 0 & 0 & -\frac{1}{2} \\
0 & 0 & 0 & 1 & & -\frac{3}{2} & 1 & 1 & \frac{1}{2}
\end{array}
$$
Thus,

$$A^{-1} = \begin{bmatrix} \frac{1}{2} & -1 & 0 & -\frac{1}{2} \\ \frac{3}{2} & 0 & -1 & \frac{1}{2} \\ \frac{1}{2} & 0 & 0 & -\frac{1}{2} \\ -\frac{3}{2} & 1 & 1 & \frac{1}{2} \end{bmatrix}.$$

This result should be verified to ensure that it is free of computational errors. Since

$$AA^{-1} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 0 \\ 2 & 1 & 1 & 2 \\ 1 & 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & -1 & 0 & -\frac{1}{2} \\ \frac{3}{2} & 0 & -1 & \frac{1}{2} \\ \frac{1}{2} & 0 & 0 & -\frac{1}{2} \\ -\frac{3}{2} & 1 & 1 & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

the result is indeed correct.
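The same inverse can be recovered from the cofactor formula (1.2.9); a sketch in Python using exact rational arithmetic:

```python
from fractions import Fraction

def det(a):
    """Recursive cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([r[:j] + r[j+1:] for r in a[1:]]) for j in range(len(a)))

A = [[1, 1, 1, 1], [-1, 0, 1, 0], [2, 1, 1, 2], [1, 1, -1, 1]]
n, d = 4, det(A)  # d = |A| = 2

minor = lambda i, j: det([r[:j] + r[j+1:] for k, r in enumerate(A) if k != i])
# A^{-1} = (Cof(A))'/|A|: entry (i, j) of the inverse is C_ji / |A|
A_inv = [[Fraction((-1) ** (i + j) * minor(j, i), d) for j in range(n)]
         for i in range(n)]

print([str(x) for x in A_inv[0]])  # ['1/2', '-1', '0', '-1/2']
```

The first row agrees with the first row of $A^{-1}$ obtained by row operations above, and likewise for the remaining rows.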

**Example 1.2.5.** Evaluate $A^{-1}$ if it exists, where

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & 2 \end{bmatrix}.$$

**Solution 1.2.5.** If $A^{-1}$ exists, then $AA^{-1} = I\_3$. Write

$$
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & 2 \end{bmatrix} A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

Starting with the first row, eliminate all other elements in the first column with the following operations: −1*(*1*)* + *(*2*)*; −2*(*1*)* + *(*3*)* ⇒

$$
\begin{array}{ccccc}
1 & 1 & 1 & & 1 & 0 & 0 \\
0 & -2 & 0 & \Leftrightarrow & -1 & 1 & 0 \\
0 & -2 & 0 & & -2 & 0 & 1
\end{array}
$$

The second and third rows on the left side being identical, the left-hand side matrix is singular, which means that *A* is singular. Thus, the inverse of *A* does not exist in this case.
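The elimination procedure used in Examples 1.2.4 and 1.2.5 can be sketched as a small routine; the row-interchange step below is an extra safeguard and is not needed in those two examples:

```python
from fractions import Fraction

def inverse_by_row_operations(a):
    """Reduce [A | I] to [I | A^{-1}] by elementary row operations,
    as in Examples 1.2.4 and 1.2.5; returns None when A is singular."""
    n = len(a)
    # augmented configuration [A | I], exact arithmetic
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # locate a row at or below the diagonal with a nonzero pivot
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None                      # no pivot available: A is singular
        m[col], m[pivot] = m[pivot], m[col]  # row interchange if needed
        p = m[col][col]
        m[col] = [x / p for x in m[col]]     # E-type operation: scale pivot row
        for r in range(n):
            if r != col and m[r][col] != 0:  # G-type operation: clear the column
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

# the matrix of Example 1.2.4 is invertible ...
A = [[1, 1, 1, 1], [-1, 0, 1, 0], [2, 1, 1, 2], [1, 1, -1, 1]]
print(inverse_by_row_operations(A)[0])  # first row: 1/2, -1, 0, -1/2
# ... while the matrix of Example 1.2.5 is singular
print(inverse_by_row_operations([[1, 1, 1], [1, -1, 1], [2, 0, 2]]))  # None
```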

#### **1.3. Determinants of Partitioned Matrices**

Consider a matrix *A* written in the following format:

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \text{ where } A\_{11}, A\_{12}, A\_{21}, A\_{22} \text{ are submatrices.} $$

For example,

$$A = \begin{bmatrix} a\_{11} & a\_{12} & a\_{13} \\ a\_{21} & a\_{22} & a\_{23} \\ a\_{31} & a\_{32} & a\_{33} \end{bmatrix} = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}, \ A\_{11} = [a\_{11}], \ A\_{12} = [a\_{12} \ a\_{13}], \ A\_{21} = \begin{bmatrix} a\_{21} \\ a\_{31} \end{bmatrix}$$

and $A\_{22} = \begin{bmatrix} a\_{22} & a\_{23} \\ a\_{32} & a\_{33} \end{bmatrix}$. The above is a 2 × 2 partitioning, that is, a partitioning into two sub-matrices by two sub-matrices. Such a 2 × 2 partitioning is not unique. We may also consider

$$A\_{11} = \begin{bmatrix} a\_{11} & a\_{12} \\ a\_{21} & a\_{22} \end{bmatrix}, \ A\_{22} = [a\_{33}], \ A\_{12} = \begin{bmatrix} a\_{13} \\ a\_{23} \end{bmatrix}, \ A\_{21} = [a\_{31} \ a\_{32}],$$

which is another 2×2 partitioning of *A*. We can also have a 1×2 or 2×1 partitioning into sub-matrices. We may observe one interesting property. Consider a block diagonal matrix. Let

$$A = \begin{bmatrix} A\_{11} & O \\ O & A\_{22} \end{bmatrix} \Rightarrow \ |A| = \begin{vmatrix} A\_{11} & O \\ O & A\_{22} \end{vmatrix},$$

where *A*<sup>11</sup> is *r* × *r*, *A*<sup>22</sup> is *s* × *s*, *r* + *s* = *n* and *O* indicates a null matrix. Observe that when we evaluate the determinant, all the operations on the first *r* rows will produce the determinant of *A*<sup>11</sup> as a coefficient, without affecting *A*22, leaving an *r* × *r* identity matrix in the place of *A*11. Similarly, all the operations on the last *s* rows will produce the determinant of *A*<sup>22</sup> as a coefficient, leaving an *s* × *s* identity matrix in place of *A*22. In other words, for a diagonal block matrix whose diagonal blocks are *A*<sup>11</sup> and *A*22,

$$|A| = |A\_{11}| \times |A\_{22}|.\tag{1.3.1}$$

Given a triangular block matrix, be it lower or upper triangular, then its determinant is also the product of the determinants of the diagonal blocks. For example, consider

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ O & A\_{22} \end{bmatrix} \Rightarrow \ |A| = \begin{vmatrix} A\_{11} & A\_{12} \\ O & A\_{22} \end{vmatrix}.$$

By using *A*22, we can eliminate *A*<sup>12</sup> without affecting *A*<sup>11</sup> and hence, we can reduce the matrix of the determinant to a diagonal block form without affecting the value of the determinant. Accordingly, the determinant of an upper or lower triangular block matrix whose diagonal blocks are *A*<sup>11</sup> and *A*22, is

$$|A| = |A\_{11}| \, |A\_{22}|. \tag{1.3.2}$$
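Property (1.3.2) can be checked numerically; the sketch below builds a block upper-triangular 4 × 4 matrix from hypothetical 2 × 2 blocks:

```python
def det(a):
    """Recursive cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([r[:j] + r[j+1:] for r in a[1:]]) for j in range(len(a)))

# hypothetical 2x2 blocks; A is block upper-triangular
A11 = [[2, 1], [1, 3]]   # |A11| = 5
A22 = [[4, 2], [1, 1]]   # |A22| = 2
A12 = [[7, 8], [9, 6]]   # arbitrary off-diagonal block
A = [A11[0] + A12[0],
     A11[1] + A12[1],
     [0, 0] + A22[0],
     [0, 0] + A22[1]]

print(det(A), det(A11) * det(A22))  # 10 10
```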

Partitioning is done to accommodate further operations such as matrix multiplication. Let *A* and *B* be two matrices whose product *AB* is defined. Suppose that we consider a 2 × 2 partitioning of *A* and *B* into sub-matrices; if the multiplication is performed treating the sub-matrices as if they were scalar quantities, the following format is obtained:

$$\begin{aligned} AB &= \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix} \\ &= \begin{bmatrix} A\_{11}B\_{11} + A\_{12}B\_{21} & A\_{11}B\_{12} + A\_{12}B\_{22} \\ A\_{21}B\_{11} + A\_{22}B\_{21} & A\_{21}B\_{12} + A\_{22}B\_{22} \end{bmatrix} .\end{aligned}$$

If all the products of sub-matrices on the right-hand side are defined, then we say that *A and B are conformably partitioned for the product AB*. Let $A$ be an $n \times n$ matrix whose determinant is defined. Let us consider the 2 × 2 partitioning

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \text{ where } A\_{11} \text{ is } r \times r, \text{ } A\_{22} \text{ is } s \times s, \text{ } r+s=n.$$

Then, $A\_{12}$ is $r \times s$ and $A\_{21}$ is $s \times r$. In this case, the first row block is $[A\_{11} \ A\_{12}]$ and the second row block is $[A\_{21} \ A\_{22}]$. When evaluating a determinant, we can add linear functions of rows to any other row, or linear functions of blocks of rows to other blocks of rows, without affecting the value of the determinant. What sort of linear function of the first row block could be added to the second row block so that a null matrix $O$ appears in the position of $A\_{21}$? It is $-A\_{21}A\_{11}^{-1}$ times the first row block. Then, we have

$$|A| = \begin{vmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{vmatrix} = \begin{vmatrix} A\_{11} & A\_{12} \\ O & A\_{22} - A\_{21}A\_{11}^{-1}A\_{12} \end{vmatrix}.$$

This is a triangular block matrix and hence its determinant is the product of the determinants of the diagonal blocks. That is,

$$|A| = |A\_{11}| \, |A\_{22} - A\_{21} A\_{11}^{-1} A\_{12}|, \ |A\_{11}| \neq 0.$$

From symmetry, it follows that

$$|A| = |A\_{22}| \, |A\_{11} - A\_{12} A\_{22}^{-1} A\_{21}|, \, |A\_{22}| \neq 0. \tag{1.3.3}$$
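The factorization $|A| = |A\_{11}| \, |A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|$ can be verified with a small example. In the sketch below (hypothetical entries), $A$ is partitioned with $r = 1$, so that $A\_{11}$ is the scalar $a\_{11}$:

```python
from fractions import Fraction

def det(a):
    """Recursive cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([r[:j] + r[j+1:] for r in a[1:]]) for j in range(len(a)))

# hypothetical 3x3 matrix partitioned with r = 1, s = 2
A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
a11 = Fraction(2)        # the 1x1 block A11
A12 = [1, 0]             # 1x2 block
A21 = [1, 0]             # 2x1 block, stored flat
A22 = [[3, 1], [1, 4]]

# Schur complement A22 - A21 A11^{-1} A12 (here A11^{-1} is just 1/a11)
S = [[A22[i][j] - A21[i] * A12[j] / a11 for j in range(2)] for i in range(2)]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]

print(det(A), a11 * det_S)  # 18 18
```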

Let us now examine the inverses of partitioned matrices. Let $A$ and $A^{-1}$ be conformably partitioned for the product $AA^{-1}$. Consider a 2 × 2 partitioning of both $A$ and $A^{-1}$. Let

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \text{ and } A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix}$$

where $A\_{11}$ and $A^{11}$ are $r \times r$, and $A\_{22}$ and $A^{22}$ are $s \times s$, with $r + s = n$, $A$ being $n \times n$ and nonsingular. Then $AA^{-1} = I$ gives the following:

$$
\begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix} = \begin{bmatrix} I\_r & O \\ O & I\_s \end{bmatrix}.
$$

That is,

$$A\_{11}A^{11} + A\_{12}A^{21} = I\_r \tag{i}$$

$$A\_{11}A^{12} + A\_{12}A^{22} = O\tag{ii}$$

$$A\_{21}A^{11} + A\_{22}A^{21} = O\tag{iii}$$

$$A\_{21}A^{12} + A\_{22}A^{22} = I\_s. \tag{iv}$$

From *(ii)*, $A^{12} = -A\_{11}^{-1} A\_{12} A^{22}$. Substituting in *(iv)*,

$$A\_{21}[-A\_{11}^{-1}A\_{12}A^{22}] + A\_{22}A^{22} = I\_s \Rightarrow [A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}]A^{22} = I\_s.$$

That is,

$$A^{22} = (A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})^{-1}, |A\_{11}| \neq 0,\tag{1.3.4}$$

and, from symmetry, it follows that

$$A^{11} = (A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})^{-1}, |A\_{22}| \neq 0 \tag{1.3.5}$$

$$A\_{11} = (A^{11} - A^{12}(A^{22})^{-1}A^{21})^{-1}, |A^{22}| \neq 0 \tag{1.3.6}$$

$$A\_{22} = (A^{22} - A^{21}(A^{11})^{-1}A^{12})^{-1}, |A^{11}| \neq 0. \tag{1.3.7}$$

The rectangular components $A\_{12}, A\_{21}, A^{12}, A^{21}$ can also be evaluated in terms of the sub-matrices by making use of Eqs. *(i)*–*(iv)*.
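Formula (1.3.4) can be verified numerically. In the sketch below (hypothetical entries, $r = 2$, $s = 1$), the bottom-right entry of $A^{-1}$, computed from the cofactor formula (1.2.9), matches the reciprocal of the scalar Schur complement:

```python
from fractions import Fraction

def det(a):
    """Recursive cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([r[:j] + r[j+1:] for r in a[1:]]) for j in range(len(a)))

# hypothetical symmetric 3x3 matrix partitioned with r = 2, s = 1
A = [[3, 1, 1],
     [1, 2, 0],
     [1, 0, 2]]
A11 = [[3, 1], [1, 2]]
A12 = [[1], [0]]
A21 = [[1, 0]]
a22 = 2

# A11^{-1} for the 2x2 block, exact arithmetic
d = Fraction(det(A11))
A11_inv = [[A11[1][1] / d, -A11[0][1] / d],
           [-A11[1][0] / d, A11[0][0] / d]]

# scalar Schur complement A22 - A21 A11^{-1} A12
schur = a22 - sum(A21[0][i] * A11_inv[i][j] * A12[j][0]
                  for i in range(2) for j in range(2))

# bottom-right entry of A^{-1} from the cofactor formula (1.2.9): C_33 / |A|
bottom_right = Fraction(det(A11), det(A))

print(1 / schur, bottom_right)  # 5/8 5/8
```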

#### **1.4. Eigenvalues and Eigenvectors**

Let $A$ be an $n \times n$ matrix, $X$ an $n \times 1$ vector, and $\lambda$ a scalar quantity. Consider the equation

$$AX = \lambda X \Rightarrow (A - \lambda I)X = O.$$

Observe that $X = O$ is always a solution. If this equation has a non-null vector $X$ as a solution, then the determinant of the coefficient matrix must be zero because this matrix must be singular. If the matrix $(A - \lambda I)$ were nonsingular, its inverse $(A - \lambda I)^{-1}$ would exist and then, on pre-multiplying $(A - \lambda I)X = O$ by $(A - \lambda I)^{-1}$, we would have $X = O$, which is inadmissible since $X \neq O$. That is,

$$|A - \lambda I| = 0, \text{ } \lambda \text{ being a scalar quantity.} \tag{1.4.1}$$

Since the matrix *A* is *n* × *n*, equation (1.4.1) has *n* roots, which will be denoted by *λ*1*,...,λn*. Then

$$|A - \lambda I| = (\lambda\_1 - \lambda)(\lambda\_2 - \lambda)\cdots(\lambda\_n - \lambda), \ A X\_j = \lambda\_j X\_j.$$

Then, $\lambda\_1, \ldots, \lambda\_n$ are called *the eigenvalues of* $A$, and $X\_j \neq O$ is *an eigenvector corresponding to the eigenvalue* $\lambda\_j$.

**Example 1.4.1.** Compute the eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$.

**Solution 1.4.1.** Consider the equation

$$|A - \lambda I| = 0 \Rightarrow \left| \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0 \Rightarrow$$

$$\begin{vmatrix} 1 - \lambda & 1 \\ 1 & 2 - \lambda \end{vmatrix} = 0 \Rightarrow (1 - \lambda)(2 - \lambda) - 1 = 0 \Rightarrow \lambda^2 - 3\lambda + 1 = 0 \Rightarrow$$

$$\lambda = \frac{3 \pm \sqrt{(9 - 4)}}{2} \Rightarrow \lambda\_1 = \frac{3}{2} + \frac{\sqrt{5}}{2}, \ \lambda\_2 = \frac{3}{2} - \frac{\sqrt{5}}{2}.$$

An eigenvector $X\_1$ corresponding to $\lambda\_1 = \frac{3}{2} + \frac{\sqrt{5}}{2}$ is given by $AX\_1 = \lambda\_1 X\_1$ or $(A - \lambda\_1 I)X\_1 = O$. That is,

$$
\begin{bmatrix}
1 - \left(\frac{3}{2} + \frac{\sqrt{5}}{2}\right) & 1 \\
1 & 2 - \left(\frac{3}{2} + \frac{\sqrt{5}}{2}\right)
\end{bmatrix}
\begin{bmatrix}
x\_1 \\ x\_2
\end{bmatrix} = \begin{bmatrix}
0 \\ 0
\end{bmatrix} \Rightarrow
$$

$$
\left(-\frac{1}{2} - \frac{\sqrt{5}}{2}\right)x\_1 + x\_2 = 0,
\tag{i}
$$

$$x\_1 + \left(\frac{1}{2} - \frac{\sqrt{5}}{2}\right)x\_2 = 0.\tag{ii}$$

Since $A - \lambda\_1 I$ is singular, both *(i)* and *(ii)* must yield the same solution. Letting $x\_2 = 1$ in *(ii)*, $x\_1 = -\frac{1}{2} + \frac{\sqrt{5}}{2}$. Thus, one solution $X\_1$ is

$$X\_1 = \begin{bmatrix} -\frac{1}{2} + \frac{\sqrt{5}}{2} \\ 1 \end{bmatrix}.$$

Any nonzero constant multiple of *X*<sup>1</sup> is also a solution to *(A* − *λ*1*I )X*<sup>1</sup> = *O*. An eigenvector *X*<sup>2</sup> corresponding to the eigenvalue *λ*<sup>2</sup> is given by *(A* − *λ*2*I )X*<sup>2</sup> = *O*. That is,

$$\left(-\frac{1}{2} + \frac{\sqrt{5}}{2}\right)x\_1 + x\_2 = 0,\tag{iii}$$

$$x\_1 + \left(\frac{1}{2} + \frac{\sqrt{5}}{2}\right)x\_2 = 0.\tag{iv}$$

Hence, one solution is

$$X\_2 = \left[\begin{matrix} -\frac{1}{2} - \frac{\sqrt{5}}{2} \\ 1 \end{matrix} \right].$$

Any nonzero constant multiple of *X*<sup>2</sup> is also an eigenvector corresponding to *λ*2.
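The computations of Example 1.4.1 can be verified in floating point; the sketch below checks that both roots satisfy the characteristic polynomial and that $AX\_1 = \lambda\_1 X\_1$:

```python
import math

# A = [[1, 1], [1, 2]] as in Example 1.4.1
l1 = 1.5 + math.sqrt(5) / 2
l2 = 1.5 - math.sqrt(5) / 2

# both roots satisfy the characteristic equation lambda^2 - 3 lambda + 1 = 0
check1 = abs(l1 ** 2 - 3 * l1 + 1)
check2 = abs(l2 ** 2 - 3 * l2 + 1)

# eigenvector X1 = (-1/2 + sqrt(5)/2, 1)': verify A X1 = lambda_1 X1
x1 = [-0.5 + math.sqrt(5) / 2, 1.0]
Ax1 = [x1[0] + x1[1], x1[0] + 2 * x1[1]]
residual = max(abs(Ax1[i] - l1 * x1[i]) for i in range(2))

print(check1 < 1e-12, check2 < 1e-12, residual < 1e-12)  # True True True
```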

Even if all the elements of a matrix $A$ are real, its eigenvalues can be real, positive, negative, zero, irrational or complex. Complex and irrational roots appear in pairs: if $a + ib$, $i = \sqrt{-1}$, $a, b$ real, is an eigenvalue, then $a - ib$ is also an eigenvalue of the same matrix. The following properties can be deduced from the definition:

(1): *The eigenvalues of a diagonal matrix are its diagonal elements*;

(2): *The eigenvalues of a triangular (lower or upper) matrix are its diagonal elements*;

(3): *If any eigenvalue is zero, then the matrix is singular and its determinant is zero*;

(4): *If $\lambda$ is an eigenvalue of $A$ and if $A$ is nonsingular, then $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$*;

(5): *If $\lambda$ is an eigenvalue of $A$, then $\lambda^k$ is an eigenvalue of $A^k$, $k = 1, 2, \ldots$, the associated eigenvector being the same*;

(7): *The eigenvalues of an identity matrix are unities; however, the converse need not be true*;

(8): *The eigenvalues of a scalar matrix with diagonal elements a, a,... , a are a repeated n times when the order of A is n*; *however, the converse need not be true*;

(9): *The eigenvalues of an orthonormal matrix* ($AA' = I$, $A'A = I$) *are* $\pm 1$; *however, the converse need not be true*;

(10): *The eigenvalues of an idempotent matrix*, *A* = *A*<sup>2</sup>, *are ones and zeros; however, the converse need not be true. The only nonsingular idempotent matrix is the identity matrix*;

(11): *All the eigenvalues of a nilpotent matrix of order r*, *that is*, *A* ≠ *O*, ..., *A*<sup>*r*−1</sup> ≠ *O*, *A*<sup>*r*</sup> = *O*, *are null*;

(12): *For an n* × *n matrix A, both A and A*′ *have the same eigenvalues*;

(13): *The eigenvalues of a symmetric matrix are real*;

(14): *The eigenvalues of a skew symmetric matrix can only be zeros and purely imaginary numbers*;

(15): *The determinant of A is the product of its eigenvalues*: |*A*| = *λ*<sub>1</sub>*λ*<sub>2</sub> ··· *λ*<sub>*n*</sub>;

(16): *The trace of a square matrix is equal to the sum of its eigenvalues*;

(17): *If A* = *A*′ *(symmetric), then the eigenvectors corresponding to distinct eigenvalues are orthogonal*;

(18): *If A* = *A*′ *and A is n* × *n*, *then there exists a full set of n eigenvectors which are linearly independent, even if some eigenvalues are repeated.*

Result (16) requires some explanation. We have already derived the following two results:

$$|A| = \sum\_{i\_1} \cdots \sum\_{i\_n} (-1)^{\rho} a\_{1i\_1} a\_{2i\_2} \cdots a\_{n i\_n} \tag{v}$$

and

$$|A - \lambda I| = (\lambda\_1 - \lambda)(\lambda\_2 - \lambda) \cdots (\lambda\_n - \lambda). \tag{vi}$$

Equation (*vi*) yields a polynomial of degree *n* in *λ*, where *λ* is a variable. When *λ* = 0, we have |*A*| = *λ*<sub>1</sub>*λ*<sub>2</sub> ··· *λ*<sub>*n*</sub>, the product of the eigenvalues. The term containing (−1)<sup>*n*</sup>*λ*<sup>*n*</sup>, when writing |*A* − *λI*| in the format of equation (*v*), can only come from the term (*a*<sub>11</sub> − *λ*)(*a*<sub>22</sub> − *λ*)···(*a*<sub>*nn*</sub> − *λ*) (refer to the explicit form for the 3 × 3 case discussed earlier). Two factors containing *λ* will be missing in the next term with the highest power of *λ*. Hence, (−1)<sup>*n*</sup>*λ*<sup>*n*</sup> and (−1)<sup>*n*−1</sup>*λ*<sup>*n*−1</sup> can only come from the term (*a*<sub>11</sub> − *λ*)···(*a*<sub>*nn*</sub> − *λ*), as can be seen from the expansion in the 3 × 3 case discussed in detail earlier. From (*v*), the coefficient of (−1)<sup>*n*−1</sup>*λ*<sup>*n*−1</sup> is *a*<sub>11</sub> + *a*<sub>22</sub> + ··· + *a*<sub>*nn*</sub> = tr(*A*) and, from (*vi*), the coefficient of (−1)<sup>*n*−1</sup>*λ*<sup>*n*−1</sup> is *λ*<sub>1</sub> + ··· + *λ*<sub>*n*</sub>. Hence tr(*A*) = *λ*<sub>1</sub> + ··· + *λ*<sub>*n*</sub> = sum of the eigenvalues of *A*. This does not mean that *λ*<sub>1</sub> = *a*<sub>11</sub>, *λ*<sub>2</sub> = *a*<sub>22</sub>, ..., *λ*<sub>*n*</sub> = *a*<sub>*nn*</sub>, only that the sums are equal.
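As a numerical illustration of properties (15) and (16), the following Python sketch (our own addition, not part of the text) computes the eigenvalues of a 2 × 2 symmetric matrix from its characteristic polynomial; the matrix [[0, 1], [1, 1]] is chosen to be consistent with the eigenvector equations (iii) and (iv) above.

```python
import math

# Eigenvalues of the 2x2 symmetric matrix A = [[0, 1], [1, 1]]
# from the characteristic polynomial lam^2 - tr(A) lam + |A| = 0.
a11, a12, a21, a22 = 0.0, 1.0, 1.0, 1.0
trace = a11 + a22                  # tr(A) = 1
det = a11 * a22 - a12 * a21        # |A| = -1

disc = math.sqrt(trace ** 2 - 4 * det)
lam1 = (trace + disc) / 2          # (1 + sqrt(5))/2
lam2 = (trace - disc) / 2          # (1 - sqrt(5))/2

# Property (16): tr(A) equals the sum of the eigenvalues.
assert abs((lam1 + lam2) - trace) < 1e-12
# Property (15): |A| equals the product of the eigenvalues.
assert abs(lam1 * lam2 - det) < 1e-12
```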

*Matrices in the Complex Domain* When the elements of *A* = (*a*<sub>*ij*</sub>) can also be complex quantities, a typical element of *A* will be of the form *a* + *ib*, *i* = √(−1), with *a, b* real. The complex conjugate of *A* will be denoted by *Ā* and the conjugate transpose by *A*<sup>∗</sup>. Then, for example,

$$A = \begin{bmatrix} 1+i & 2i & 3-i \\ 4i & 5 & 1+i \\ 2-i & i & 3+i \end{bmatrix} \Rightarrow \bar{A} = \begin{bmatrix} 1-i & -2i & 3+i \\ -4i & 5 & 1-i \\ 2+i & -i & 3-i \end{bmatrix}, \\ A^\* = \begin{bmatrix} 1-i & -4i & 2+i \\ -2i & 5 & -i \\ 3+i & 1-i & 3-i \end{bmatrix}.$$

Thus, we can also write *A*<sup>∗</sup> = (*Ā*)′, which equals the conjugate of *A*′. When a matrix *A* is in the complex domain, we may write it as *A* = *A*<sub>1</sub> + *iA*<sub>2</sub> where *A*<sub>1</sub> and *A*<sub>2</sub> are real matrices. Then *Ā* = *A*<sub>1</sub> − *iA*<sub>2</sub> and *A*<sup>∗</sup> = *A*′<sub>1</sub> − *iA*′<sub>2</sub>. In the above example,

$$\begin{aligned} A &= \begin{bmatrix} 1 & 0 & 3 \\ 0 & 5 & 1 \\ 2 & 0 & 3 \end{bmatrix} + i \begin{bmatrix} 1 & 2 & -1 \\ 4 & 0 & 1 \\ -1 & 1 & 1 \end{bmatrix} \Rightarrow \\\ A\_1 &= \begin{bmatrix} 1 & 0 & 3 \\ 0 & 5 & 1 \\ 2 & 0 & 3 \end{bmatrix}, \ A\_2 = \begin{bmatrix} 1 & 2 & -1 \\ 4 & 0 & 1 \\ -1 & 1 & 1 \end{bmatrix}. \end{aligned}$$

*A Hermitian Matrix If A* = *A*<sup>∗</sup>, *then A is called a Hermitian matrix*. In the representation *A* = *A*<sub>1</sub> + *iA*<sub>2</sub>, if *A* = *A*<sup>∗</sup>, then *A*<sub>1</sub> = *A*′<sub>1</sub>, that is, *A*<sub>1</sub> is real symmetric, and *A*<sub>2</sub> = −*A*′<sub>2</sub>, that is, *A*<sub>2</sub> is real skew symmetric. Note that when *X* is an *n* × 1 vector, *X*<sup>∗</sup>*X* is real. Let

$$\begin{aligned} X &= \begin{bmatrix} 2 \\ 3+i \\ 2i \end{bmatrix} \Rightarrow \bar{X} = \begin{bmatrix} 2 \\ 3-i \\ -2i \end{bmatrix}, \; X^\* = [2\ 3 - i \ -2i], \\\ X^\*X &= [2\ 3 - i \ -2i] \begin{bmatrix} 2 \\ 3+i \\ 2i \end{bmatrix} = (2^2 + 0^2) + (3^2 + 1^2) + (0^2 + (-2)^2) = 18. \end{aligned}$$
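The claim that *X*<sup>∗</sup>*X* is real can be checked numerically; the following Python sketch (our own illustration) reproduces the computation above.

```python
# X*X for the complex vector of the example: the result is real.
X = [2 + 0j, 3 + 1j, 0 + 2j]
xstar_x = sum(z.conjugate() * z for z in X)  # sum of |x_j|^2 terms

assert xstar_x == 18 + 0j      # matches the value obtained above
assert xstar_x.imag == 0.0     # X*X is real
```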

Consider the eigenvalues of a Hermitian matrix *A*, which are the solutions of |*A* − *λI*| = 0. As in the real case, *λ* could a priori be real, positive, negative, zero, irrational or complex. Then, for *X* ≠ *O*,

$$AX = \lambda X \Rightarrow \tag{i}$$

$$X^\*A^\* = \bar{\lambda}X^\* \tag{ii}$$

by taking the conjugate transpose. Since *λ* is a scalar, its conjugate transpose is *λ̄*. Premultiply *(i)* by *X*<sup>∗</sup> and post-multiply *(ii)* by *X*. Then for *X* ≠ *O*, we have

$$X^\*AX = \lambda X^\*X\tag{iii}$$

$$X^\*A^\*X = \bar{\lambda}X^\*X. \tag{iv}$$

When *A* is Hermitian, *A* = *A*<sup>∗</sup>, and so the left-hand sides of *(iii)* and *(iv)* are the same. On subtracting (*iv*) from (*iii*), we have 0 = (*λ* − *λ̄*)*X*<sup>∗</sup>*X* where *X*<sup>∗</sup>*X* is real and positive, and hence *λ* − *λ̄* = 0, which means that the imaginary part is zero, that is, *λ* is real. If *A* is skew Hermitian, then we end up with *λ* + *λ̄* = 0 ⇒ *λ* is zero or purely imaginary. The above procedure also holds for matrices in the real domain. Thus, in addition to properties (13) and (14), we have the following properties:

(19) *The eigenvalues of a Hermitian matrix are real; however, the converse need not be true*;

(20) *The eigenvalues of a skew Hermitian matrix are zero or purely imaginary; however, the converse need not be true*.
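Property (19) can be illustrated numerically. The following Python sketch (our own example; the entries are arbitrary choices) computes the eigenvalues of a 2 × 2 Hermitian matrix [[*a*, *b*], [*b̄*, *c*]] with *a*, *c* real from its characteristic polynomial *λ*<sup>2</sup> − (*a* + *c*)*λ* + (*ac* − |*b*|<sup>2</sup>) = 0, whose discriminant (*a* − *c*)<sup>2</sup> + 4|*b*|<sup>2</sup> is nonnegative, so that both roots are real.

```python
import cmath

# 2x2 Hermitian matrix [[a, b], [conj(b), c]] with a, c real.
a, c = 2.0, 5.0
b = 1 - 3j                      # arbitrary complex off-diagonal entry

# Roots of lam^2 - (a + c) lam + (a*c - |b|^2) = 0.
disc = cmath.sqrt((a + c) ** 2 - 4 * (a * c - abs(b) ** 2))
lam1 = ((a + c) + disc) / 2
lam2 = ((a + c) - disc) / 2

# Property (19): both eigenvalues are real.
assert abs(lam1.imag) < 1e-12 and abs(lam2.imag) < 1e-12
assert abs((lam1 + lam2).real - (a + c)) < 1e-12   # trace check
```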

#### **1.5. Definiteness of Matrices, Quadratic and Hermitian Forms**

Let *X* be an *n* × 1 vector of real scalar variables *x*<sub>1</sub>, ..., *x*<sub>*n*</sub> so that *X*′ = (*x*<sub>1</sub>, ..., *x*<sub>*n*</sub>). Let *A* = (*a*<sub>*ij*</sub>) be a real *n* × *n* matrix. Then, all the terms of the *quadratic form* *u* = *X*′*AX* are of degree 2. When *A* is the matrix of a quadratic form, one can always replace it by an equivalent symmetric matrix. Hence, without any loss of generality, we may assume that *A* = *A*′ (symmetric) whenever *A* appears in a quadratic form *u* = *X*′*AX*.

*Definiteness of a quadratic form and definiteness of a matrix* are only defined for *A* = *A*′ (symmetric) in the real domain and for *A* = *A*<sup>∗</sup> (Hermitian) in the complex domain. Hence, the basic starting condition is that either *A* = *A*′ or *A* = *A*<sup>∗</sup>. If for *all* non-null *X*, that is, *X* ≠ *O*, *X*′*AX* > 0, *A* = *A*′, *then A is said to be a positive definite matrix and X*′*AX is called a positive definite quadratic form*. If for *all* non-null *X*, *X*<sup>∗</sup>*AX* > 0, *A* = *A*<sup>∗</sup>, *then A is referred to as a Hermitian positive definite matrix and the corresponding Hermitian form X*<sup>∗</sup>*AX*, *as Hermitian positive definite*. Similarly, if for all non-null *X*, *X*′*AX* ≥ 0 or *X*<sup>∗</sup>*AX* ≥ 0, then *A* is positive semi-definite or Hermitian positive semi-definite. If for all non-null *X*, *X*′*AX* < 0 or *X*<sup>∗</sup>*AX* < 0, then *A* is negative definite, and if *X*′*AX* ≤ 0 or *X*<sup>∗</sup>*AX* ≤ 0, then *A* is negative semi-definite. The standard notations are as follows:

> *A* > *O* (*A* and *X*′*AX* > 0 are *real positive definite*; *O* is a capital o and not zero)
> *A* ≥ *O* (*A* and *X*′*AX* ≥ 0 are *positive semi-definite*)
> *A* < *O* (*A* and *X*′*AX* < 0 are *negative definite*)
> *A* ≤ *O* (*A* and *X*′*AX* ≤ 0 are *negative semi-definite*).

All other matrices, which do not belong to any of those four categories, are called *indefinite matrices*. That is, if for example *A* is such that *X*′*AX* > 0 for some *X* and *X*′*AX* < 0 for some other *X*, then *A* is an indefinite matrix. The corresponding Hermitian cases are:

> *A* > *O*, *X*<sup>∗</sup>*AX* > 0 (Hermitian positive definite)
> *A* ≥ *O*, *X*<sup>∗</sup>*AX* ≥ 0 (Hermitian positive semi-definite)
> *A* < *O*, *X*<sup>∗</sup>*AX* < 0 (Hermitian negative definite)
> *A* ≤ *O*, *X*<sup>∗</sup>*AX* ≤ 0 (Hermitian negative semi-definite). (1.5.1)

In all other cases, the matrix *A* and the Hermitian form *X*<sup>∗</sup>*AX* are indefinite. Certain conditions for the definiteness of *A* = *A*′ or *A* = *A*<sup>∗</sup> are the following:

(1) *All the eigenvalues of A are positive* ⇔ *A* > *O*; *all eigenvalues are greater than or equal to zero* ⇔ *A* ≥ *O*; *all eigenvalues are negative* ⇔ *A* < *O*; *all eigenvalues are* ≤ 0 ⇔ *A* ≤ *O*; *all other matrices A* = *A*′ *or A* = *A*<sup>∗</sup> *for which some eigenvalues are positive and some others are negative are indefinite.*

(2) *If A* = *A*′ *or A* = *A*<sup>∗</sup> *and all the leading minors of A are positive (leading minors are determinants of the leading sub-matrices, the r-th leading sub-matrix being obtained by deleting all rows and columns from the (r* + 1*)-th onward), then A* > *O*; *if the leading minors are* ≥ 0, *then A* ≥ *O*; *if all the odd-order leading minors are negative and all the even-order leading minors are positive, then A* < *O*; *if the odd-order leading minors are* ≤ 0 *and the even-order leading minors are* ≥ 0, *then A* ≤ *O*; *all other matrices are indefinite*. If *A* ≠ *A*′ and *A* ≠ *A*<sup>∗</sup>, then no definiteness can be defined in terms of the eigenvalues or leading minors. Let

$$A = \begin{bmatrix} 2 & 0\\ 0 & 5 \end{bmatrix}.$$

Note that *A* is real symmetric as well as Hermitian. Since *X*′*AX* = 2*x*<sub>1</sub><sup>2</sup> + 5*x*<sub>2</sub><sup>2</sup> > 0 for all real *x*<sub>1</sub> and *x*<sub>2</sub>, as long as *x*<sub>1</sub> and *x*<sub>2</sub> are not both equal to zero, *A* > *O* (positive definite). Moreover, *X*<sup>∗</sup>*AX* = 2|*x*<sub>1</sub>|<sup>2</sup> + 5|*x*<sub>2</sub>|<sup>2</sup> = 2(*x*<sub>11</sub><sup>2</sup> + *x*<sub>12</sub><sup>2</sup>) + 5(*x*<sub>21</sub><sup>2</sup> + *x*<sub>22</sub><sup>2</sup>) > 0 for all *x*<sub>11</sub>, *x*<sub>12</sub>, *x*<sub>21</sub>, *x*<sub>22</sub>, as long as they are not all simultaneously equal to zero, where *x*<sub>1</sub> = *x*<sub>11</sub> + *ix*<sub>12</sub>, *x*<sub>2</sub> = *x*<sub>21</sub> + *ix*<sub>22</sub> with *x*<sub>11</sub>, *x*<sub>12</sub>, *x*<sub>21</sub>, *x*<sub>22</sub> real and *i* = √(−1). Consider

$$B = \begin{bmatrix} -1 & 0\\ 0 & -4 \end{bmatrix}, \ C = \begin{bmatrix} 2 & 0\\ 0 & -6 \end{bmatrix}.$$

Then, *B<O* and *C* is indefinite. Consider the following symmetric matrices:

$$A\_1 = \begin{bmatrix} 5 & 2 \\ 2 & 4 \end{bmatrix}, \ A\_2 = \begin{bmatrix} 2 & 2 \\ 2 & 1 \end{bmatrix}, \ A\_3 = \begin{bmatrix} -2 & 1 \\ 1 & -5 \end{bmatrix}.$$

The leading minors of *A*<sub>1</sub> are 5 > 0 and det(*A*<sub>1</sub>) = (5)(4) − (2)(2) = 16 > 0. The leading minors of *A*<sub>2</sub> are 2 > 0 and det(*A*<sub>2</sub>) = (2)(1) − (2)(2) = −2 < 0. The leading minors of *A*<sub>3</sub> are −2 < 0 and det(*A*<sub>3</sub>) = (−2)(−5) − (1)(1) = 9 > 0. Hence *A*<sub>1</sub> > *O*, *A*<sub>3</sub> < *O* and *A*<sub>2</sub> is indefinite. The following results will be useful when reducing a quadratic form or Hermitian form to its canonical form.
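The leading-minor computations for *A*<sub>1</sub>, *A*<sub>2</sub> and *A*<sub>3</sub> can be confirmed with a short Python sketch (our own addition, not part of the text):

```python
# Leading minors of a 2x2 matrix [[p, q], [r, s]]: p and the determinant.
def leading_minors(m):
    (p, q), (r, s) = m
    return [p, p * s - q * r]

A1 = [[5, 2], [2, 4]]
A2 = [[2, 2], [2, 1]]
A3 = [[-2, 1], [1, -5]]

assert leading_minors(A1) == [5, 16]   # both positive:      A1 > O
assert leading_minors(A2) == [2, -2]   # signs mixed:        A2 indefinite
assert leading_minors(A3) == [-2, 9]   # (-, +) alternation: A3 < O
```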

(3) *For every real A* = *A*′ *(symmetric), there exists an orthonormal matrix Q*, *QQ*′ = *I*, *Q*′*Q* = *I*, *such that Q*′*AQ* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub>) *where λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub> *are the eigenvalues of the n* × *n matrix A and* diag(...) *denotes a diagonal matrix*. In this case, a real quadratic form reduces to the following linear combination:

$$X'AX = Y'Q'AQY = Y'\text{diag}(\lambda\_1, \dots, \lambda\_n) Y = \lambda\_1 y\_1^2 + \dots + \lambda\_n y\_n^2, \ Y = Q'X. \tag{1.5.2}$$

(4): *For every Hermitian matrix A* = *A*∗, *there exists a unitary matrix U*, *U*∗*U* = *I,UU*<sup>∗</sup> = *I such that*

$$X^\*AX = Y^\* \text{diag}(\lambda\_1, \dots, \lambda\_n)Y = \lambda\_1|y\_1|^2 + \dots + \lambda\_n|y\_n|^2, \ Y = U^\*X. \tag{1.5.3}$$

When *A* > *O* (real positive definite or Hermitian positive definite), all the *λ*<sub>*j*</sub>'s are real and positive, so that *X*′*AX* and *X*<sup>∗</sup>*AX* are strictly positive.

Let *A* and *B* be *n* × *n* matrices. If *AB* = *BA*, in which case we say that *A* and *B* commute, then both *A* and *B* can be simultaneously reduced to their canonical forms (diagonal forms with the diagonal elements being the eigenvalues) with the same orthonormal or unitary matrix *P*: *PP*′ = *I*, *P*′*P* = *I* if *P* is real, and *PP*<sup>∗</sup> = *I*, *P*<sup>∗</sup>*P* = *I* if *P* is complex. In the real case, *P*′*AP* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub>) and *P*′*BP* = diag(*μ*<sub>1</sub>, ..., *μ*<sub>*n*</sub>) where *λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub> are the eigenvalues of *A* and *μ*<sub>1</sub>, ..., *μ*<sub>*n*</sub> are the eigenvalues of *B*. In the complex case, *P*<sup>∗</sup>*AP* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub>) and *P*<sup>∗</sup>*BP* = diag(*μ*<sub>1</sub>, ..., *μ*<sub>*n*</sub>). Observe that the eigenvalues of Hermitian matrices are real.

#### **1.5.1. Singular value decomposition**

For an *n* × *n* symmetric matrix *A* = *A*′, we have stated that there exists an *n* × *n* orthonormal matrix *P*, *PP*′ = *I*<sub>*n*</sub>, *P*′*P* = *I*<sub>*n*</sub>, such that *P*′*AP* = *D* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub>), where *λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub> are the eigenvalues of *A*. If a square matrix *A* is not symmetric, there exists a nonsingular matrix *Q* such that *Q*<sup>−1</sup>*AQ* = *D* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*n*</sub>) when the rank of *A* is *n*. If the rank is less than *n*, we may still be able to obtain a nonsingular *Q* such that the above representation holds; however, this is not always possible. If *A* is a *p* × *q* rectangular matrix with *p* ≠ *q*, or if *p* = *q* and *A* ≠ *A*′, then we can find two orthonormal matrices *U* and *V*, *UU*′ = *I*<sub>*p*</sub>, *U*′*U* = *I*<sub>*p*</sub>, *VV*′ = *I*<sub>*q*</sub>, *V*′*V* = *I*<sub>*q*</sub>, and a diagonal matrix *Λ* = diag(*λ*<sub>1</sub>, ..., *λ*<sub>*k*</sub>), *k* being the rank of *A*, such that the following representation holds:

$$A = U \begin{bmatrix} \Lambda & O \\ O & O \end{bmatrix} V' = U\_{(1)} \Lambda V\_{(1)}' \tag{i}$$

where *U*<sub>(1)</sub> = [*U*<sub>1</sub>, ..., *U*<sub>*k*</sub>], *V*<sub>(1)</sub> = [*V*<sub>1</sub>, ..., *V*<sub>*k*</sub>], *U*<sub>*j*</sub> being the normalized eigenvector of *AA*′ corresponding to the eigenvalue *λ*<sub>*j*</sub><sup>2</sup> and *V*<sub>*j*</sub>, the normalized eigenvector of *A*′*A* corresponding to the eigenvalue *λ*<sub>*j*</sub><sup>2</sup>. The representation given in *(i)* is known as the *singular value decomposition of A* and *λ*<sub>1</sub> > 0, ..., *λ*<sub>*k*</sub> > 0 are called the *singular values of A*. Then, we have

$$AA'=U\begin{bmatrix} \Lambda^2 & O \\ O & O \end{bmatrix} U'=U\_{(1)}\Lambda^2 U'\_{(1)},\ A'A = V\begin{bmatrix} \Lambda^2 & O \\ O & O \end{bmatrix} V' = V\_{(1)}\Lambda^2 V'\_{(1)}.\tag{ii}$$

Thus, the procedure is the following: If *p* ≤ *q*, compute the nonzero eigenvalues of *AA*′; otherwise, compute the nonzero eigenvalues of *A*′*A*. Denote them by *λ*<sub>1</sub><sup>2</sup>, ..., *λ*<sub>*k*</sub><sup>2</sup> where *k* is the rank of *A*. Construct the normalized eigenvectors *U*<sub>1</sub>, ..., *U*<sub>*k*</sub> of *AA*′, which gives *U*<sub>(1)</sub> = [*U*<sub>1</sub>, ..., *U*<sub>*k*</sub>]. Then, by using the same eigenvalues *λ*<sub>*j*</sub><sup>2</sup>, *j* = 1, ..., *k*, determine the normalized eigenvectors *V*<sub>1</sub>, ..., *V*<sub>*k*</sub> of *A*′*A*, and let *V*<sub>(1)</sub> = [*V*<sub>1</sub>, ..., *V*<sub>*k*</sub>]. Let us verify the above statements with the help of an example. Let

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$

Then,

$$AA'=\begin{bmatrix} 3 & 0\\ 0 & 2 \end{bmatrix},\ A'A=\begin{bmatrix} 2 & 0 & 1\\ 0 & 2 & -1\\ 1 & -1 & 1 \end{bmatrix}.$$

The eigenvalues of *AA*′ are *λ*<sub>1</sub><sup>2</sup> = 3 and *λ*<sub>2</sub><sup>2</sup> = 2. The corresponding normalized eigenvectors of *AA*′ are *U*<sub>1</sub> and *U*<sub>2</sub>, where

$$U\_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ U\_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \text{ so that } U\_{(1)} = [U\_1, U\_2] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Now, by using *λ*<sub>1</sub><sup>2</sup> = 3 and *λ*<sub>2</sub><sup>2</sup> = 2, compute the normalized eigenvectors of *A*′*A*. They are:

$$V\_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \ V\_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \text{ so that } \ V'\_{(1)} = \begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix}.$$

Then *Λ* = diag(√3, √2). Also,

$$U\_{(1)} \Lambda V\_{(1)}' = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 0 \end{bmatrix} = A.$$

This establishes the result. Observe that

$$\begin{aligned} AA' &= [U\_{(1)} \Lambda V\_{(1)}'] [U\_{(1)} \Lambda V\_{(1)}']' = U\_{(1)} \Lambda^2 U\_{(1)}' \\ A'A &= [U\_{(1)} \Lambda V\_{(1)}']' [U\_{(1)} \Lambda V\_{(1)}'] = V\_{(1)} \Lambda^2 V\_{(1)}' \\ \Lambda^2 &= \mathrm{diag}(\lambda\_1^2, \lambda\_2^2). \end{aligned}$$
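The singular value decomposition of the example can also be verified numerically; the following Python sketch (our own addition) multiplies out *U*<sub>(1)</sub>*ΛV*′<sub>(1)</sub> and recovers *A*:

```python
import math

def matmul(P, Q):
    # Plain-Python matrix product.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

U1 = [[1, 0], [0, 1]]
Lam = [[math.sqrt(3), 0], [0, math.sqrt(2)]]
V1t = [[1 / math.sqrt(3), -1 / math.sqrt(3), 1 / math.sqrt(3)],
       [1 / math.sqrt(2), 1 / math.sqrt(2), 0]]

A = matmul(matmul(U1, Lam), V1t)
expected = [[1, -1, 1], [1, 1, 0]]
assert all(abs(A[i][j] - expected[i][j]) < 1e-12
           for i in range(2) for j in range(3))
```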

#### **1.6. Wedge Product of Differentials and Jacobians**

If *y* = *f(x)* is an explicit function of *x*, where *x* and *y* are real scalar variables, then we refer to *x* as the independent variable and to *y* as the dependent variable. In the present context, "independent" means that values of *x* are preassigned and the corresponding values of *y* are evaluated from the formula *y* = *f(x)*. The standard notations for a small increment in *x* and the corresponding increment in *y* are *Δx* and *Δy*, respectively. By convention, *Δx* > 0, whereas *Δy* can be positive, negative or zero depending upon the function *f*. If *Δx* goes to zero, then *Δy* goes to zero as well. However, if *Δx* goes to zero in the presence of the ratio *Δy/Δx*, then we have a different situation. Consider the identity

$$
\Delta y \equiv \left(\frac{\Delta y}{\Delta x}\right) \Delta x \implies \mathrm{d}y = A \,\mathrm{d}x, \ A = \frac{\mathrm{d}y}{\mathrm{d}x}.\tag{1.6.1}
$$

This identity can always be written owing to our convention *Δx* > 0. Consider *Δx* → 0. If *Δy/Δx* attains a limit as *Δx* → 0, let us denote it by *A* = lim<sub>*Δx*→0</sub> *Δy/Δx*; then the value of *Δx* at that stage is the differential of *x*, namely d*x*, the corresponding *Δy* is d*y*, and *A* is the ratio of the differentials: *A* = d*y*/d*x*. If *x*<sub>1</sub>, ..., *x*<sub>*k*</sub> are independent variables and if *y* = *f(x*<sub>1</sub>, ..., *x*<sub>*k*</sub>*)*, then by convention *Δx*<sub>1</sub> > 0, ..., *Δx*<sub>*k*</sub> > 0. Thus, in light of (1.6.1), we have

$$\mathrm{d}y = \frac{\partial f}{\partial x\_1} \mathrm{d}x\_1 + \dots + \frac{\partial f}{\partial x\_k} \mathrm{d}x\_k \tag{1.6.2}$$

where *∂f/∂x*<sub>*j*</sub> is the partial derivative of *f* with respect to *x*<sub>*j*</sub>, that is, the derivative of *f* with respect to *x*<sub>*j*</sub> keeping all other variables fixed.

*Wedge Product of Differentials* Let d*x* and d*y* be differentials of the real scalar variables *x* and *y*. Then the wedge product or skew symmetric product of d*x* and d*y* is denoted by d*x* ∧ d*y* and is defined as

$$\mathrm{d}x \wedge \mathrm{d}y = -\mathrm{d}y \wedge \mathrm{d}x \implies \mathrm{d}x \wedge \mathrm{d}x = 0 \text{ and } \mathrm{d}y \wedge \mathrm{d}y = 0. \tag{1.6.3}$$

This definition indicates that higher order wedge products involving the same differential are equal to zero. Letting

$$\mathbf{y}\_1 = f\_1(\mathbf{x}\_1, \mathbf{x}\_2) \text{ and } \mathbf{y}\_2 = f\_2(\mathbf{x}\_1, \mathbf{x}\_2),$$

it follows from the basic definitions that

$$\mathrm{d}y\_1 = \frac{\partial f\_1}{\partial x\_1} \mathrm{d}x\_1 + \frac{\partial f\_1}{\partial x\_2} \mathrm{d}x\_2 \text{ and } \mathrm{d}y\_2 = \frac{\partial f\_2}{\partial x\_1} \mathrm{d}x\_1 + \frac{\partial f\_2}{\partial x\_2} \mathrm{d}x\_2.$$

By taking the wedge product and using the properties specified in (1.6.3), that is, d*x*<sub>1</sub> ∧ d*x*<sub>1</sub> = 0, d*x*<sub>2</sub> ∧ d*x*<sub>2</sub> = 0, d*x*<sub>2</sub> ∧ d*x*<sub>1</sub> = −d*x*<sub>1</sub> ∧ d*x*<sub>2</sub>, we have

$$\begin{split} \mathrm{d}y\_1 \wedge \mathrm{d}y\_2 &= \left[ \frac{\partial f\_1}{\partial x\_1} \frac{\partial f\_2}{\partial x\_2} - \frac{\partial f\_1}{\partial x\_2} \frac{\partial f\_2}{\partial x\_1} \right] \mathrm{d}x\_1 \wedge \mathrm{d}x\_2 \\ &= \begin{vmatrix} \frac{\partial f\_1}{\partial x\_1} & \frac{\partial f\_1}{\partial x\_2} \\ \frac{\partial f\_2}{\partial x\_1} & \frac{\partial f\_2}{\partial x\_2} \end{vmatrix} \mathrm{d}x\_1 \wedge \mathrm{d}x\_2 \,. \end{split}$$

In the general case we have the following corresponding result:

$$\mathbf{d}\mathbf{y}\_1 \wedge \dots \wedge \mathbf{d}\mathbf{y}\_k = \begin{vmatrix} \frac{\partial f\_1}{\partial x\_1} & \cdots & \frac{\partial f\_1}{\partial x\_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial f\_k}{\partial x\_1} & \cdots & \frac{\partial f\_k}{\partial x\_k} \end{vmatrix} \mathbf{d}\mathbf{x}\_1 \wedge \dots \wedge \mathbf{d}\mathbf{x}\_k \tag{1.6.4}$$
 
$$\mathbf{d}Y = J \, \mathbf{d}X \Rightarrow \mathbf{d}X = \frac{1}{J} \, \mathbf{d}Y \text{ if } J \neq \mathbf{0}$$

where d*X* = d*x*<sub>1</sub> ∧ ... ∧ d*x*<sub>*k*</sub>, d*Y* = d*y*<sub>1</sub> ∧ ... ∧ d*y*<sub>*k*</sub>, and *J* = det(*∂f*<sub>*i*</sub>/*∂x*<sub>*j*</sub>) is the determinant of the matrix of partial derivatives whose (*i, j*)-th element is the partial derivative of *f*<sub>*i*</sub> with respect to *x*<sub>*j*</sub>. In d*X* and d*Y*, the individual real scalar variables can be taken in any order to start with; however, for each interchange of variables, the result is to be multiplied by −1.

**Example 1.6.1.** Consider the transformation *x*<sub>1</sub> = *r* cos<sup>2</sup> *θ*, *x*<sub>2</sub> = *r* sin<sup>2</sup> *θ*, 0 ≤ *r* < ∞, 0 ≤ *θ* ≤ *π*/2, *x*<sub>1</sub> ≥ 0, *x*<sub>2</sub> ≥ 0. Determine the relationship between d*x*<sub>1</sub> ∧ d*x*<sub>2</sub> and d*r* ∧ d*θ*.

**Solution 1.6.1.** Taking partial derivatives, we have

$$\begin{aligned} \frac{\partial \mathbf{x}\_1}{\partial r} &= \cos^2 \theta, & \frac{\partial \mathbf{x}\_1}{\partial \theta} &= -2r \cos \theta \sin \theta, \\\frac{\partial \mathbf{x}\_2}{\partial r} &= \sin^2 \theta, & \frac{\partial \mathbf{x}\_2}{\partial \theta} &= 2r \cos \theta \sin \theta. \end{aligned}$$

Then, the determinant of the matrix of partial derivatives is given by

$$
\begin{vmatrix}
\frac{\partial x\_1}{\partial r} & \frac{\partial x\_1}{\partial \theta} \\
\frac{\partial x\_2}{\partial r} & \frac{\partial x\_2}{\partial \theta}
\end{vmatrix} = \begin{vmatrix}
\cos^2 \theta & -2r \cos \theta \sin \theta \\
\sin^2 \theta & 2r \cos \theta \sin \theta
\end{vmatrix} = 2r \cos \theta \sin \theta
$$

since cos<sup>2</sup> *θ* + sin<sup>2</sup> *θ* = 1. Hence,

$$\mathrm{d}x\_1 \wedge \mathrm{d}x\_2 = 2r \cos \theta \sin \theta \, \mathrm{d}r \wedge \mathrm{d}\theta, \quad J = 2r \cos \theta \sin \theta.$$

We may also establish this result by direct evaluation:

$$\begin{aligned} \mathbf{dx}\_1 &= \frac{\partial \mathbf{x}\_1}{\partial r} \mathbf{d}r + \frac{\partial \mathbf{x}\_1}{\partial \theta} \mathbf{d}\theta = \cos^2 \theta \,\,\mathrm{d}r - 2r \cos \theta \,\,\sin \theta \,\,\mathrm{d}\theta, \\\ \mathbf{dx}\_2 &= \frac{\partial \mathbf{x}\_2}{\partial r} \mathbf{d}r + \frac{\partial \mathbf{x}\_2}{\partial \theta} \mathbf{d}\theta = \sin^2 \theta \,\,\mathrm{d}r + 2r \cos \theta \,\,\sin \theta \,\,\mathrm{d}\theta, \end{aligned}$$

$$\begin{split} \mathrm{d}x\_1 \wedge \mathrm{d}x\_2 &= \cos^2 \theta \sin^2 \theta \,\mathrm{d}r \wedge \mathrm{d}r + \cos^2 \theta (2r \cos \theta \sin \theta) \,\mathrm{d}r \wedge \mathrm{d}\theta \\ &\quad - \sin^2 \theta (2r \cos \theta \sin \theta) \,\mathrm{d}\theta \wedge \mathrm{d}r - (2r \cos \theta \sin \theta)^2 \,\mathrm{d}\theta \wedge \mathrm{d}\theta \\ &= 2r \cos \theta \sin \theta \,[\cos^2 \theta \,\mathrm{d}r \wedge \mathrm{d}\theta - \sin^2 \theta \,\mathrm{d}\theta \wedge \mathrm{d}r], \ [\mathrm{d}r \wedge \mathrm{d}r = 0, \ \mathrm{d}\theta \wedge \mathrm{d}\theta = 0] \\ &= 2r \cos \theta \sin \theta \,[\cos^2 \theta + \sin^2 \theta] \,\mathrm{d}r \wedge \mathrm{d}\theta, \ [\mathrm{d}\theta \wedge \mathrm{d}r = -\mathrm{d}r \wedge \mathrm{d}\theta] \\ &= 2r \cos \theta \sin \theta \,\mathrm{d}r \wedge \mathrm{d}\theta. \end{split}$$
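The Jacobian obtained in Example 1.6.1 can be checked against finite differences; in the Python sketch below (our own addition), the evaluation point (*r*, *θ*) = (1.5, 0.7) and the step *h* are arbitrary choices.

```python
import math

def x(r, th):
    # The transformation of Example 1.6.1.
    return (r * math.cos(th) ** 2, r * math.sin(th) ** 2)

r, th, h = 1.5, 0.7, 1e-6

# Finite-difference columns of the matrix of partial derivatives.
d_r = [(a - b) / h for a, b in zip(x(r + h, th), x(r, th))]
d_th = [(a - b) / h for a, b in zip(x(r, th + h), x(r, th))]
J_fd = d_r[0] * d_th[1] - d_r[1] * d_th[0]

J_exact = 2 * r * math.cos(th) * math.sin(th)
assert abs(J_fd - J_exact) < 1e-4
```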

*Linear Transformation* Consider the linear transformation *Y* = *AX* where

$$Y = \begin{bmatrix} \mathbf{y}\_1 \\ \vdots \\ \mathbf{y}\_p \end{bmatrix}, \ X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_p \end{bmatrix}, \ A = \begin{bmatrix} a\_{11} & \dots & a\_{1p} \\ \vdots & \ddots & \vdots \\ a\_{p1} & \dots & a\_{pp} \end{bmatrix}.$$

Then, *∂y*<sub>*i*</sub>/*∂x*<sub>*j*</sub> = *a*<sub>*ij*</sub> ⇒ (*∂y*<sub>*i*</sub>/*∂x*<sub>*j*</sub>) = (*a*<sub>*ij*</sub>) = *A*, so that d*Y* = |*A*| d*X* or *J* = |*A*|. Hence, the following result:

**Theorem 1.6.1.** *Let X and Y be p* × 1 *vectors of distinct real variables and let A* = (*a*<sub>*ij*</sub>) *be a constant nonsingular matrix. Then, the transformation Y* = *AX is one-to-one and*

$$Y = AX, \ |A| \neq 0 \implies \mathrm{d}Y = |A| \,\mathrm{d}X. \tag{1.6.5}$$

Let us now consider the complex case. Let *X̃* = *X*<sub>1</sub> + *iX*<sub>2</sub>, where a tilde indicates that the matrix is in the complex domain, *X*<sub>1</sub> and *X*<sub>2</sub> are real *p* × 1 vectors when *X̃* is *p* × 1, and *i* = √(−1). Then, the wedge product d*X̃* is defined as d*X̃* = d*X*<sub>1</sub> ∧ d*X*<sub>2</sub>. This is the general definition in the complex case, whatever be the order of the matrix. If *Z̃* is *m* × *n* and *Z̃* = *Z*<sub>1</sub> + *iZ*<sub>2</sub> where *Z*<sub>1</sub> and *Z*<sub>2</sub> are *m* × *n* and real, then d*Z̃* = d*Z*<sub>1</sub> ∧ d*Z*<sub>2</sub>. Letting the constant *p* × *p* matrix *A* = *A*<sub>1</sub> + *iA*<sub>2</sub> where *A*<sub>1</sub> and *A*<sub>2</sub> are real and *p* × *p*, and letting *Ỹ* = *Y*<sub>1</sub> + *iY*<sub>2</sub> be *p* × 1 where *Y*<sub>1</sub> and *Y*<sub>2</sub> are real and *p* × 1, we have

$$\begin{aligned} \tilde{Y} &= A\tilde{X} \Rightarrow Y\_1 + iY\_2 = [A\_1 + iA\_2][X\_1 + iX\_2] \\ &= [A\_1X\_1 - A\_2X\_2] + i[A\_1X\_2 + A\_2X\_1] \Rightarrow \\ Y\_1 &= A\_1X\_1 - A\_2X\_2, \quad Y\_2 = A\_1X\_2 + A\_2X\_1 \Rightarrow \\ \begin{bmatrix} Y\_1 \\ Y\_2 \end{bmatrix} &= \begin{bmatrix} A\_1 & -A\_2 \\ A\_2 & A\_1 \end{bmatrix} \begin{bmatrix} X\_1 \\ X\_2 \end{bmatrix}. \end{aligned} \tag{i}$$

Now, applying Theorem 1.6.1 to *(i)*, it follows that

$$\mathrm{d}Y\_1 \wedge \mathrm{d}Y\_2 = \det\begin{bmatrix} A\_1 & -A\_2 \\ A\_2 & A\_1 \end{bmatrix} \mathrm{d}X\_1 \wedge \mathrm{d}X\_2. \tag{ii}$$


That is,

$$\mathrm{d}\tilde{Y} = \det\begin{bmatrix} A\_1 & -A\_2 \\ A\_2 & A\_1 \end{bmatrix} \mathrm{d}\tilde{X} \implies \mathrm{d}\tilde{Y} = J \,\mathrm{d}\tilde{X} \tag{iii}$$

where the Jacobian can be shown to be the absolute value of the determinant of *A*. If the determinant of *A* is denoted by det(*A*) and its absolute value by |det(*A*)|, and if det(*A*) = *a* + *ib* with *a, b* real and *i* = √(−1), then the absolute value of the determinant is +√((*a* + *ib*)(*a* − *ib*)) = +√(*a*<sup>2</sup> + *b*<sup>2</sup>) = +√([det(*A*)][det(*A*<sup>∗</sup>)]) = +√(det(*AA*<sup>∗</sup>)). It can easily be seen that the above Jacobian is given by

$$\begin{aligned} J = \det\begin{bmatrix} A\_1 & -A\_2 \\ A\_2 & A\_1 \end{bmatrix} &= \det\begin{bmatrix} A\_1 & -iA\_2 \\ -iA\_2 & A\_1 \end{bmatrix} \\ &\quad \text{(multiplying the second row block by } -i \text{ and the second column block by } i) \\ &= \det\begin{bmatrix} A\_1 - iA\_2 & A\_1 - iA\_2 \\ -iA\_2 & A\_1 \end{bmatrix} \text{ (adding the second row block to the first row block)} \\ &= \det(A\_1 - iA\_2)\det\begin{bmatrix} I & I \\ -iA\_2 & A\_1 \end{bmatrix} = \det(A\_1 - iA\_2)\det(A\_1 + iA\_2) \\ &\quad \text{(adding } (-1) \text{ times the first } p \text{ columns to the last } p \text{ columns)} \\ &= [\det(A)][\det(A^\*)] = \det(AA^\*) = |\det(A)|^2. \end{aligned}$$

Then, we have the following companion result to Theorem 1.6.1.

**Theorem 1.6a.1.** *Let X*˜ *and Y*˜ *be p* × 1 *vectors in the complex domain, and let A be a p* × *p nonsingular constant matrix that may or may not be in the complex domain. If C is a constant p* × 1 *vector, then*

$$
\tilde{Y} = A\tilde{X} + C,\ \det(A) \neq 0 \Rightarrow \mathrm{d}\tilde{Y} = |\det(A)|^2 \mathrm{d}\tilde{X} = |\det(AA^\*)|\,\mathrm{d}\tilde{X}.\tag{1.6a.1}
$$
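Theorem 1.6a.1 can be illustrated numerically for *p* = 2: the determinant of the real representation [[*A*<sub>1</sub>, −*A*<sub>2</sub>], [*A*<sub>2</sub>, *A*<sub>1</sub>]] equals |det(*A*)|<sup>2</sup>. In the Python sketch below (our own addition), the matrix *A* is an arbitrary choice.

```python
# Cofactor-expansion determinant; works for real and complex entries.
def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[1 + 2j, 1j], [1 - 1j, 3 + 0j]]
A1 = [[z.real for z in row] for row in A]   # A = A1 + i A2
A2 = [[z.imag for z in row] for row in A]

# Real 2p x 2p representation [[A1, -A2], [A2, A1]] of X~ -> A X~.
R = [A1[0] + [-v for v in A2[0]],
     A1[1] + [-v for v in A2[1]],
     A2[0] + A1[0],
     A2[1] + A1[1]]

assert abs(det(R) - abs(det(A)) ** 2) < 1e-9   # here det(A) = 2 + 5i, so 29
```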


For the results that follow, the complex case can be handled in a similar way and hence, only the final results will be stated. For details, the reader may refer to Mathai (1997). A more general result is the following:

**Theorem 1.6.2.** *Let X and Y be real m* × *n matrices with distinct real variables as elements. Let A be an m* × *m nonsingular constant matrix and C be an m* × *n constant matrix. Then*

$$Y = AX + C, \ \det(A) \neq 0 \implies \mathrm{d}Y = |A|^n \,\mathrm{d}X. \tag{1.6.6}$$

The companion result is stated in the next theorem.

**Theorem 1.6a.2.** *Let X*˜ *and Y*˜ *be m* × *n matrices in the complex domain. Let A be a constant m* × *m nonsingular matrix that may or may not be in the complex domain, and C be an m* × *n constant matrix. Then*

$$
\tilde{Y} = A\tilde{X} + C,\ \det(A) \neq 0 \\
\Rightarrow \mathrm{d}\tilde{Y} = |\det(AA^\*)|^n \mathrm{d}\tilde{X}.\tag{1.6a.2}
$$

To prove Theorems 1.6.2 and 1.6a.2, consider the columns of *Y* and *X*, and then apply Theorems 1.6.1 and 1.6a.1 to establish the results. If *X*, *X̃*, *Y*, *Ỹ* are as defined in Theorems 1.6.2 and 1.6a.2 and if *B* is an *n* × *n* nonsingular constant matrix, then we have the following results:

**Theorems 1.6.3 and 1.6a.3.** *Let X*, *X̃*, *Y*, *Ỹ* *be m* × *n matrices with distinct elements as previously defined, C be a constant m* × *n matrix, and B be an n* × *n nonsingular constant matrix. Then*

$$Y = XB + C, \text{ det}(B) \neq 0 \Rightarrow \text{d}Y = |B|^m \text{d}X \tag{1.6.7}$$

*and*

$$
\tilde{Y} = \tilde{X}B + C,\ \det(B) \neq 0 \\
\Rightarrow \mathrm{d}\tilde{Y} = |\det(BB^\*)|^m \mathrm{d}\tilde{X}.\tag{1.6a.3}
$$

To prove these results, consider the rows of *Y*, *Ỹ*, *X*, *X̃* and then apply Theorems 1.6.1 and 1.6a.1. Combining Theorems 1.6.2 and 1.6.3, as well as Theorems 1.6a.2 and 1.6a.3, we have the following results:

**Theorems 1.6.4 and 1.6a.4.** *Let X*, *X̃*, *Y*, *Ỹ* *be m* × *n matrices as previously defined, and let A be an m* × *m and B an n* × *n nonsingular constant matrix. Then*

$$Y = AXB, \text{ det}(A) \neq 0, \text{ det}(B) \neq 0 \Rightarrow \text{d}Y = |A|^n |B|^m \text{d}X \tag{1.6.8}$$

*and*

$$\tilde{Y} = A\tilde{X}B,\ \det(A) \neq 0,\ \det(B) \neq 0 \Rightarrow \mathrm{d}\tilde{Y} = |\det(AA^\*)|^n |\det(BB^\*)|^m \mathrm{d}\tilde{X}.\tag{1.6a.4}$$
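Theorem 1.6.4 admits the same kind of numerical check: on vec($X$) with columns stacked, $X \mapsto AXB$ is the linear map $B' \otimes A$, whose determinant has absolute value $|A|^n |B|^m$. The dimensions and random matrices below are arbitrary illustrative choices, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))

# On vec(X) (columns stacked), X -> AXB is the linear map B' (x) A.
J = np.kron(B.T, A)

lhs = abs(np.linalg.det(J))
rhs = abs(np.linalg.det(A)) ** n * abs(np.linalg.det(B)) ** m   # |A|^n |B|^m
assert np.isclose(lhs, rhs)
```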

We now consider the case of linear transformations involving symmetric and Hermitian matrices.

**Theorems 1.6.5 and 1.6a.5.** *Let $X = X'$, $Y = Y'$ be real symmetric $p \times p$ matrices and let $\tilde{X} = \tilde{X}^\*$, $\tilde{Y} = \tilde{Y}^\*$ be $p \times p$ Hermitian matrices. If $A$ is a $p \times p$ nonsingular constant matrix, then*

$$Y = AXA', \ Y = Y', \ X = X', \ \det(A) \neq 0 \Rightarrow \mathrm{d}Y = |A|^{p+1} \mathrm{d}X \tag{1.6.9}$$

*and*

$$\tilde{Y} = A\tilde{X}A^\*, \,\det(A) \neq 0 \Rightarrow \mathrm{d}\tilde{Y} = |\det(AA^\*)|^p \mathrm{d}\tilde{X} \tag{1.6a.5}$$

*for $\tilde{X} = \tilde{X}^\*$ or $\tilde{X} = -\tilde{X}^\*$.*

The proof involves some properties of elementary matrices and elementary transformations. Elementary matrices were introduced in Sect. 1.2.1. There are two types of basic elementary matrices: the $E$ type, obtained by multiplying any row (column) of an identity matrix by a nonzero scalar, and the $F$ type, obtained by adding any row of an identity matrix to any other row. A combination of $E$ and $F$ type matrices results in a $G$ type matrix, in which a constant multiple of one row of an identity matrix is added to another row; the $G$ type is not a basic elementary matrix. Pre-multiplying by suitable $E$, $F$ and $G$ type matrices reduces a nonsingular matrix to the identity; since $E$ and $F$ type elementary matrices are nonsingular, any nonsingular matrix can thus be expressed as a product of basic elementary matrices of the $E$ and $F$ types. This result is needed to establish Theorems 1.6.5 and 1.6a.5. Let $A = E\_1E\_2F\_1 \cdots E\_rF\_s$ for some $E\_1, \ldots, E\_r$ and $F\_1, \ldots, F\_s$. Then

$$AXA' = E\_1E\_2F\_1 \cdots E\_rF\_sXF'\_sE'\_r \cdots E'\_2E'\_1.$$

Let $Y\_1 = F\_sXF\_s'$, in which case the connection between $\mathrm{d}X$ and $\mathrm{d}Y\_1$ can be determined from $F\_s$. Now, letting $Y\_2 = E\_rY\_1E\_r'$, the connection between $\mathrm{d}Y\_2$ and $\mathrm{d}Y\_1$ can similarly be determined from $E\_r$. Continuing in this manner, we finally obtain the connection between $\mathrm{d}Y$ and $\mathrm{d}X$, which gives the Jacobian as $|A|^{p+1}$ in the real case. The procedure is parallel in the complex case.
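The exponent $p + 1$ in (1.6.9) can be verified numerically by parametrizing a symmetric matrix by its $p(p+1)/2$ distinct elements and differentiating $Y = AXA'$ coordinate by coordinate. The sketch below (my own illustrative setup, assuming NumPy) uses central differences, which are exact here since the map is linear in the free coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
A = rng.standard_normal((p, p))
idx = [(i, j) for i in range(p) for j in range(i + 1)]  # p(p+1)/2 free positions

def pack(S):      # distinct elements of a symmetric matrix
    return np.array([S[i, j] for i, j in idx])

def unpack(v):    # rebuild the symmetric matrix from its distinct elements
    S = np.zeros((p, p))
    for k, (i, j) in enumerate(idx):
        S[i, j] = S[j, i] = v[k]
    return S

x0 = rng.standard_normal(len(idx))
h = 1e-6
J = np.empty((len(idx), len(idx)))
for k in range(len(idx)):
    e = np.zeros(len(idx)); e[k] = h
    J[:, k] = (pack(A @ unpack(x0 + e) @ A.T)
               - pack(A @ unpack(x0 - e) @ A.T)) / (2 * h)

# Jacobian determinant over the distinct elements is |A|^{p+1}, as in (1.6.9)
assert np.isclose(abs(np.linalg.det(J)),
                  abs(np.linalg.det(A)) ** (p + 1), rtol=1e-5)
```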

We now consider two basic nonlinear transformations. In the first case, a $p \times p$ nonsingular matrix $X$ is mapped to its inverse, that is, $Y = X^{-1}$.

**Theorems 1.6.6 and 1.6a.6.** *Let $X$ and $\tilde{X}$ be $p \times p$ real and complex nonsingular matrices, respectively, and let $Y = X^{-1}$ and $\tilde{Y} = \tilde{X}^{-1}$ denote their regular inverses. Then, ignoring the sign,*

$$Y = X^{-1} \Rightarrow \mathbf{d}Y = \begin{cases} |X|^{-2p} \mathbf{d}X \text{ for a general } X\\ |X|^{-(p+1)} \mathbf{d}X \text{ for } X = X'\\ |X|^{-(p-1)} \mathbf{d}X \text{ for } X = -X' \end{cases} \tag{1.6.10}$$

*and*

$$\tilde{Y} = \tilde{X}^{-1} \Rightarrow \mathrm{d}\tilde{Y} = \begin{cases} |\det(\tilde{X}\tilde{X}^\*)|^{-2p}\, \mathrm{d}\tilde{X} \text{ for a general } \tilde{X} \\ |\det(\tilde{X}\tilde{X}^\*)|^{-p}\, \mathrm{d}\tilde{X} \text{ for } \tilde{X} = \tilde{X}^\* \text{ or } \tilde{X} = -\tilde{X}^\*. \end{cases} \tag{1.6a.6}$$

The proof is based on the following observations. In the real case, $XX^{-1} = I\_p \Rightarrow (\mathrm{d}X)X^{-1} + X(\mathrm{d}X^{-1}) = O$, where $(\mathrm{d}X)$ represents the matrix of differentials in $X$. This means that

$$(\mathbf{d}X^{-1}) = -X^{-1}(\mathbf{d}X)X^{-1}.$$

The differentials appear only in the matrices of differentials. Hence this situation is equivalent to the general linear transformations considered in Theorems 1.6.4 and 1.6.5, where $X^{-1}$ acts as a constant matrix. The result is obtained upon taking the wedge product of the differentials. The complex case is parallel.
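The relation $\mathrm{d}(X^{-1}) = -X^{-1}(\mathrm{d}X)X^{-1}$ means that, on vec(), the differential of the inverse map is $-(X^{-1})' \otimes X^{-1}$, whose determinant has absolute value $|X|^{-2p}$, the general case of (1.6.10). A brief numerical illustration, assuming NumPy and using a random $X$ of my choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
X = rng.standard_normal((p, p))
Xi = np.linalg.inv(X)

# d(X^{-1}) = -X^{-1}(dX)X^{-1}  =>  on vec(), Jacobian is -(X^{-1})' (x) X^{-1}
J = -np.kron(Xi.T, Xi)

lhs = abs(np.linalg.det(J))
rhs = abs(np.linalg.det(X)) ** (-2 * p)   # |X|^{-2p}, general case of (1.6.10)
assert np.isclose(lhs, rhs)
```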

The next results involve real positive definite or Hermitian positive definite matrices expressed in terms of triangular matrices, and the corresponding connection between the wedge products of differentials. Let $X$ and $\tilde{X}$ be $p \times p$ real positive definite and Hermitian positive definite matrices, respectively. Let $T = (t\_{ij})$ be a real lower triangular matrix with $t\_{ij} = 0$ for $i < j$, $t\_{jj} > 0$ for $j = 1, \ldots, p$, and the $t\_{ij}$'s, $i \ge j$, distinct real variables. Let $\tilde{T} = (\tilde{t}\_{ij})$ be a lower triangular matrix with $\tilde{t}\_{ij} = 0$ for $i < j$, the $\tilde{t}\_{ij}$'s, $i > j$, distinct complex variables, and $\tilde{t}\_{jj}$, $j = 1, \ldots, p$, positive real variables. Then the transformations $X = TT'$ in the real case and $\tilde{X} = \tilde{T}\tilde{T}^\*$ in the complex case can be shown to be one-to-one, which enables us to write $\mathrm{d}X$ uniquely in terms of $\mathrm{d}T$ and vice versa, and $\mathrm{d}\tilde{X}$ uniquely in terms of $\mathrm{d}\tilde{T}$. We first consider the real case. When $p = 2$,

$$X = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{12} & x\_{22} \end{bmatrix}, \ x\_{11} > 0, \ x\_{22} > 0, \ x\_{21} = x\_{12}, \ x\_{11}x\_{22} - x\_{12}^2 > 0$$

due to positive definiteness of *X*, and

$$\begin{split} X &= TT' = \begin{bmatrix} t\_{11} & 0 \\ t\_{21} & t\_{22} \end{bmatrix} \begin{bmatrix} t\_{11} & t\_{21} \\ 0 & t\_{22} \end{bmatrix} = \begin{bmatrix} t\_{11}^2 & t\_{21}t\_{11} \\ t\_{21}t\_{11} & t\_{21}^2 + t\_{22}^2 \end{bmatrix} \Rightarrow \\ \frac{\partial x\_{11}}{\partial t\_{11}} &= 2t\_{11}, \ \frac{\partial x\_{11}}{\partial t\_{21}} = 0, \ \frac{\partial x\_{11}}{\partial t\_{22}} = 0 \\ \frac{\partial x\_{22}}{\partial t\_{11}} &= 0, \ \frac{\partial x\_{22}}{\partial t\_{21}} = 2t\_{21}, \ \frac{\partial x\_{22}}{\partial t\_{22}} = 2t\_{22} \\ \frac{\partial x\_{12}}{\partial t\_{11}} &= t\_{21}, \ \frac{\partial x\_{12}}{\partial t\_{21}} = t\_{11}, \ \frac{\partial x\_{12}}{\partial t\_{22}} = 0. \end{split}$$

Taking the $x\_{ij}$'s in the order $x\_{11}, x\_{12}, x\_{22}$ and the $t\_{ij}$'s in the order $t\_{11}, t\_{21}, t\_{22}$, we form the following matrix of partial derivatives:

$$\begin{array}{cccc} & t\_{11} & t\_{21} & t\_{22} \\ x\_{11} & 2t\_{11} & 0 & 0 \\ x\_{12} & \* & t\_{11} & 0 \\ x\_{22} & \* & \* & 2t\_{22} \end{array}$$

where an asterisk indicates that an element may be present in that position; its value is irrelevant since the matrix is triangular, so that its determinant is simply the product of its diagonal elements. It can be observed from this pattern that, for a general $p$, a diagonal element is multiplied by 2 whenever $x\_{jj}$ is differentiated with respect to $t\_{jj}$, $j = 1, \ldots, p$. Then $t\_{11}$ appears $p$ times, $t\_{22}$ appears $p - 1$ times, and so on, and $t\_{pp}$ appears once along the diagonal. Hence the product of the diagonal elements is
$$2^p\, t\_{11}^{p}\, t\_{22}^{p-1} \cdots t\_{pp} = 2^p \Big\{ \prod\_{j=1}^{p} t\_{jj}^{p+1-j} \Big\}.$$
A parallel procedure yields the Jacobian in the complex case. Hence, the following results:

**Theorems 1.6.7 and 1.6a.7.** *Let $X$, $\tilde{X}$, $T$ and $\tilde{T}$ be $p \times p$ matrices where $X$ is real positive definite, $\tilde{X}$ is Hermitian positive definite, and $T$ and $\tilde{T}$ are lower triangular matrices whose diagonal elements are real and positive, as described above. Then the transformations $X = TT'$ and $\tilde{X} = \tilde{T}\tilde{T}^\*$ are one-to-one, and*

$$\mathrm{d}X = 2^p \{ \prod\_{j=1}^p t\_{jj}^{p+1-j} \} \mathrm{d}T \tag{1.6.11}$$

*and*

$$\mathrm{d}\tilde{X} = 2^p \{ \prod\_{j=1}^p t\_{jj}^{2(p-j)+1} \} \,\mathrm{d}\tilde{T}.\tag{1.6a.7}$$
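The real-case Jacobian (1.6.11) can be verified numerically by differentiating the distinct elements of $X = TT'$ with respect to the free elements of $T$; since $X$ is quadratic in the $t\_{ij}$'s, central differences are exact. The setup below (random positive entries, $p = 3$) is an illustration of mine, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
idx = [(i, j) for i in range(p) for j in range(i + 1)]  # lower-triangular positions
t0 = rng.uniform(0.5, 2.0, len(idx))                    # keeps t_jj > 0

def T_of(v):
    T = np.zeros((p, p))
    for k, (i, j) in enumerate(idx):
        T[i, j] = v[k]
    return T

def pack(S):      # distinct elements of the symmetric matrix X = TT'
    return np.array([S[i, j] for i, j in idx])

h = 1e-6
J = np.empty((len(idx), len(idx)))
for k in range(len(idx)):
    e = np.zeros(len(idx)); e[k] = h
    Tp, Tm = T_of(t0 + e), T_of(t0 - e)
    J[:, k] = (pack(Tp @ Tp.T) - pack(Tm @ Tm.T)) / (2 * h)

T = T_of(t0)
# 2^p prod_j t_jj^{p+1-j}; with 0-based j the exponent is p - j
pred = 2**p * np.prod([T[j, j] ** (p - j) for j in range(p)])
assert np.isclose(abs(np.linalg.det(J)), pred, rtol=1e-5)
```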

Given these introductory materials, we will explore multivariate statistical analysis from the perspective of Special Functions. As far as possible, the material in this chapter is self-contained. A few more Jacobians will be required when tackling transformations involving rectangular matrices or eigenvalue problems. These will be discussed in the respective chapters later on.

**Example 1.6.2.** Evaluate the following integrals: (1): $\int\_X \mathrm{e}^{-X'AX}\mathrm{d}X$ where $A > O$ (real positive definite) is 3 × 3 and $X$ is a 3 × 1 vector of distinct real scalar variables; (2): $\int\_X \mathrm{e}^{-\mathrm{tr}(AXBX')}\mathrm{d}X$ where $X$ is a 2 × 3 matrix of distinct real scalar variables, $A > O$ (real positive definite) is 2 × 2 and $B > O$ (real positive definite) is 3 × 3, $A$ and $B$ being constant matrices; (3): $\int\_{X>O} \mathrm{e}^{-\mathrm{tr}(X)}\mathrm{d}X$ where $X = X' > O$ is a 2 × 2 real positive definite matrix of distinct real scalar variables.

#### **Solution 1.6.2.**

(1) Let $X' = (x\_1, x\_2, x\_3)$, $A > O$. Since $A > O$, we can uniquely define $A^{\frac{1}{2}} = (A^{\frac{1}{2}})'$. Then, write $X'AX = X'A^{\frac{1}{2}}A^{\frac{1}{2}}X = Y'Y$, $Y = A^{\frac{1}{2}}X$. It follows from Theorem 1.6.1 that $\mathrm{d}X = |A|^{-\frac{1}{2}}\mathrm{d}Y$, and letting $Y' = (y\_1, y\_2, y\_3)$, we have

$$\begin{aligned} \int\_X \mathbf{e}^{-X' \mathbf{A} X} \mathbf{d}X &= |A|^{-\frac{1}{2}} \int\_Y \mathbf{e}^{-(Y' \mathbf{Y})} \mathbf{d}Y \\ &= |A|^{-\frac{1}{2}} \int\_{-\infty}^\infty \int\_{-\infty}^\infty \int\_{-\infty}^\infty \mathbf{e}^{-(y\_1^2 + y\_2^2 + y\_3^2)} \mathbf{d}y\_1 \wedge \mathbf{dy}\_2 \wedge \mathbf{dy}\_3. \end{aligned}$$

Since

$$\int\_{-\infty}^{\infty} \mathrm{e}^{-y\_j^2} \mathrm{d}y\_j = \sqrt{\pi}, \ j = 1, 2, 3,$$

$$\int\_X \mathbf{e}^{-X'AX} \mathbf{d}X = |A|^{-\frac{1}{2}} (\sqrt{\pi})^3.$$

(2) Since $A$ is a 2 × 2 positive definite matrix, there exists a 2 × 2 matrix $A^{\frac{1}{2}}$ that is symmetric and positive definite. Similarly, there exists a 3 × 3 symmetric positive definite matrix $B^{\frac{1}{2}}$. Let $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}} \Rightarrow \mathrm{d}Y = |A|^{\frac{3}{2}}|B|^{\frac{2}{2}}\,\mathrm{d}X$, that is, $\mathrm{d}X = |A|^{-\frac{3}{2}}|B|^{-1}\mathrm{d}Y$, by Theorem 1.6.4. Moreover, given two matrices $A\_1$ and $A\_2$, $\mathrm{tr}(A\_1A\_2) = \mathrm{tr}(A\_2A\_1)$ even if $A\_1A\_2 \neq A\_2A\_1$, as long as the products are defined. By making use of this property, we may write

$$\begin{aligned} \text{tr}(AXBX') &= \text{tr}(A^{\frac{1}{2}}A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X') = \text{tr}(A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X'A^{\frac{1}{2}})\\ &= \text{tr}[(A^{\frac{1}{2}}XB^{\frac{1}{2}})(A^{\frac{1}{2}}XB^{\frac{1}{2}})'] = \text{tr}(YY') \end{aligned}$$

where $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$ and $\mathrm{d}Y$ is given above. However, for any real matrix $Y$, whether square or rectangular, $\mathrm{tr}(YY') = \mathrm{tr}(Y'Y)$ is the sum of the squares of all the elements of $Y$. Thus, we have

$$\int\_X \mathbf{e}^{-\text{tr}(AXBX')} \mathbf{d}X = |A|^{-\frac{3}{2}} |B|^{-1} \int\_Y \mathbf{e}^{-\text{tr}(YY')} \mathbf{d}Y.$$

Observe that since tr*(Y Y* - *)* is the sum of squares of 6 real scalar variables, the integral over *Y* reduces to a multiple integral involving six integrals where each variable is over the entire real line. Hence,

$$\int\_{Y} \mathbf{e}^{-\text{tr}(YY')} \mathbf{d}Y = \prod\_{j=1}^{6} \int\_{-\infty}^{\infty} \mathbf{e}^{-\mathbf{y}\_{j}^{2}} \mathbf{d}\mathbf{y}\_{j} = \prod\_{j=1}^{6} (\sqrt{\pi}) = (\sqrt{\pi})^{6}.$$

Note that, for convenience, we have denoted the sum of the six $y\_{ij}^2$ as $y\_1^2 + \cdots + y\_6^2$. Thus,

$$\int\_X \mathbf{e}^{-\text{tr}(AXBX')} \text{d}X = |A|^{-\frac{3}{2}} |B|^{-1} (\sqrt{\pi})^6.$$

(3) In this case, $X$ is a 2 × 2 real positive definite matrix. Let $X = TT'$ where $T$ is lower triangular with positive diagonal elements. Then,

$$T = \begin{bmatrix} t\_{11} & 0 \\ t\_{21} & t\_{22} \end{bmatrix}, \ t\_{11} > 0, \ t\_{22} > 0, \ T T' = \begin{bmatrix} t\_{11} & 0 \\ t\_{21} & t\_{22} \end{bmatrix} \begin{bmatrix} t\_{11} & t\_{21} \\ 0 & t\_{22} \end{bmatrix} = \begin{bmatrix} t\_{11}^2 & t\_{11}t\_{21} \\ t\_{11}t\_{21} & t\_{21}^2 + t\_{22}^2 \end{bmatrix},$$

and $\mathrm{tr}(TT') = t\_{11}^2 + (t\_{21}^2 + t\_{22}^2)$, $t\_{11} > 0$, $t\_{22} > 0$, $-\infty < t\_{21} < \infty$. From Theorem 1.6.7, the Jacobian is

$$\mathrm{d}X = 2^p \{ \prod\_{j=1}^p t\_{jj}^{p+1-j} \} \mathrm{d}T = 2^2 (t\_{11}^2 t\_{22}) \, \mathrm{d}t\_{11} \wedge \mathrm{d}t\_{21} \wedge \mathrm{d}t\_{22}.$$

Therefore

$$\begin{split} \int\_{X>O} \mathbf{e}^{-\text{tr}(X)} \mathbf{d}X &= \int\_{T} \mathbf{e}^{-\text{tr}(TT')} [2^2 (t\_{11}^2 t\_{22})] \mathbf{d}T \\ &= \left( \int\_{-\infty}^{\infty} \mathbf{e}^{-t\_{21}^2} \mathbf{d}t\_{21} \right) \left( \int\_{0}^{\infty} 2t\_{11}^2 \mathbf{e}^{-t\_{11}^2} \mathbf{d}t\_{11} \right) \left( \int\_{0}^{\infty} 2t\_{22} \mathbf{e}^{-t\_{22}^2} \mathbf{d}t\_{22} \right) \\ &= [\sqrt{\pi}] \left[ \Gamma(\frac{3}{2}) \right] \left[ \Gamma(1) \right] = \frac{\pi}{2} . \end{split}$$
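The value $\pi/2$ can be confirmed by evaluating the three one-dimensional factors numerically with a simple midpoint rule. The grid parameters below are my own choices, and NumPy is assumed.

```python
import numpy as np

# Midpoint-rule grid; the integrands decay like e^{-t^2}, so [0, 12] suffices.
h = 1e-4
t = np.arange(0, 12, h) + h / 2

I11 = np.sum(2 * t**2 * np.exp(-t**2)) * h   # = Gamma(3/2) = sqrt(pi)/2
I22 = np.sum(2 * t * np.exp(-t**2)) * h      # = Gamma(1) = 1
I21 = 2 * np.sum(np.exp(-t**2)) * h          # = sqrt(pi); full real line by symmetry

result = I11 * I22 * I21
assert np.isclose(result, np.pi / 2, rtol=1e-5)
```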

**Example 1.6.3.** Let $A = A^\* > O$ be a constant 2 × 2 Hermitian positive definite matrix. Let $\tilde{X}$ be a 2 × 1 vector in the complex domain and $\tilde{X}\_2 > O$ be a 2 × 2 Hermitian positive definite matrix. Then, evaluate the following integrals: (1): $\int\_{\tilde{X}} \mathrm{e}^{-(\tilde{X}^\*A\tilde{X})}\mathrm{d}\tilde{X}$; (2): $\int\_{\tilde{X}\_2>O} \mathrm{e}^{-\mathrm{tr}(\tilde{X}\_2)}\mathrm{d}\tilde{X}\_2$.

#### **Solution 1.6.3.**

(1): Since $A = A^\* > O$, there exists a unique Hermitian positive definite square root $A^{\frac{1}{2}}$. Then,

$$\begin{aligned} \tilde{X}^\* A \tilde{X} &= \tilde{X}^\* A^{\frac{1}{2}} A^{\frac{1}{2}} \tilde{X} = \tilde{Y}^\* \tilde{Y}, \\ \tilde{Y} &= A^{\frac{1}{2}} \tilde{X} \Rightarrow \mathbf{d} \tilde{X} = |\det(A)|^{-1} \mathbf{d} \tilde{Y} \end{aligned}$$

by Theorem 1.6a.1. But $\tilde{Y}^\*\tilde{Y} = |\tilde{y}\_1|^2 + |\tilde{y}\_2|^2$ since $\tilde{Y}^\* = (\tilde{y}\_1^\*, \tilde{y}\_2^\*)$. Since the $\tilde{y}\_j$'s are scalars in this case, an asterisk denotes only the complex conjugate, the transpose being the quantity itself. However,

$$\int\_{\tilde{y}\_j} \mathrm{e}^{-|\tilde{y}\_j|^2} \mathrm{d}\tilde{y}\_j = \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathrm{e}^{-(y\_{j1}^2 + y\_{j2}^2)} \mathrm{d}y\_{j1} \wedge \mathrm{d}y\_{j2} = (\sqrt{\pi})^2 = \pi$$

where $\tilde{y}\_j = y\_{j1} + iy\_{j2}$, $i = \sqrt{-1}$, $y\_{j1}$ and $y\_{j2}$ being real. Hence

$$\begin{split} \int\_{\tilde{X}} \mathbf{e}^{-(\tilde{X}^{\ast} A \tilde{X})} \mathrm{d}\tilde{X} &= |\det(A)|^{-1} \int\_{\tilde{Y}} \mathbf{e}^{-(|\tilde{y}\_{1}|^{2} + |\tilde{y}\_{2}|^{2})} \mathrm{d}\tilde{y}\_{1} \wedge \mathbf{d}\tilde{y}\_{2} \\ &= |\det(A)|^{-1} \left( \prod\_{j=1}^{2} \int\_{\tilde{Y}\_{j}} \mathbf{e}^{-|\tilde{y}\_{j}|^{2}} \mathrm{d}\tilde{y}\_{j} \right) = |\det(A)|^{-1} \prod\_{j=1}^{2} \pi \\ &= |\det(A)|^{-1} \pi^{2} .\end{split}$$

(2): Make the transformation $\tilde{X}\_2 = \tilde{T}\tilde{T}^\*$ where $\tilde{T}$ is lower triangular with real positive diagonal elements. That is,

$$
\tilde{T} = \begin{bmatrix} t\_{11} & 0 \\ \tilde{t}\_{21} & t\_{22} \end{bmatrix}, \ \tilde{T}\tilde{T}^\* = \begin{bmatrix} t\_{11}^2 & t\_{11}\tilde{t}\_{21}^\* \\ t\_{11}\tilde{t}\_{21} & |\tilde{t}\_{21}|^2 + t\_{22}^2 \end{bmatrix}
$$

and the Jacobian is $\mathrm{d}\tilde{X}\_2 = 2^p\{\prod\_{j=1}^p t\_{jj}^{2(p-j)+1}\}\mathrm{d}\tilde{T} = 2^2 t\_{11}^3 t\_{22}\,\mathrm{d}\tilde{T}$ by Theorem 1.6a.7. Hence,

$$\int\_{\tilde{X}\_2 > O} \mathbf{e}^{-\text{tr}(\tilde{X}\_2)} \mathbf{d}\tilde{X}\_2 = \int\_{\tilde{T}} \mathbf{2}^2 t\_{11}^3 t\_{22} \mathbf{e}^{-(t\_{11}^2 + t\_{22}^2 + |\tilde{t}\_{21}|^2)} \mathbf{d}t\_{11} \wedge \mathbf{d}t\_{22} \wedge \mathbf{d}\tilde{t}\_{21}.$$

But

$$\begin{split} 2\int\_{t\_{11}>0} t\_{11}^3 \mathbf{e}^{-t\_{11}^2} \mathbf{d}t\_{11} &= \int\_{u=0}^{\infty} u \, \mathbf{e}^{-u} \mathbf{d}u = 1, \\ 2\int\_{t\_{22}>0} t\_{22} \mathbf{e}^{-t\_{22}^2} \mathbf{d}t\_{22} &= \int\_{v=0}^{\infty} \mathbf{e}^{-v} \mathbf{d}v = 1, \\ \int\_{\tilde{t}\_{21}} \mathbf{e}^{-|\tilde{t}\_{21}|^2} \mathbf{d}\tilde{t}\_{21} &= \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathbf{e}^{-(t\_{211}^2 + t\_{212}^2)} \mathbf{d}t\_{211} \wedge \mathbf{d}t\_{212} \\ &= \left(\int\_{-\infty}^{\infty} \mathbf{e}^{-t\_{211}^2} \mathbf{d}t\_{211}\right) \left(\int\_{-\infty}^{\infty} \mathbf{e}^{-t\_{212}^2} \mathbf{d}t\_{212}\right) \\ &= \sqrt{\pi} \sqrt{\pi} = \pi, \end{split}$$

where $\tilde{t}\_{21} = t\_{211} + it\_{212}$, $i = \sqrt{-1}$, and $t\_{211}, t\_{212}$ are real. Thus

$$\int\_{\tilde{X}\_{2} > O} \mathbf{e}^{-\text{tr}(\tilde{X}\_{2})} \mathrm{d}\tilde{X}\_{2} = \pi.$$

#### **1.7. Differential Operators**

Let

$$X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_p \end{bmatrix}, \ \frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x\_1} \\ \vdots \\ \frac{\partial}{\partial x\_p} \end{bmatrix}, \ \frac{\partial}{\partial X'} = \begin{bmatrix} \frac{\partial}{\partial x\_1}, \dots, \frac{\partial}{\partial x\_p} \end{bmatrix}.$$


where $x\_1, \ldots, x\_p$ are distinct real scalar variables, $\frac{\partial}{\partial X}$ is the partial differential operator and $\frac{\partial}{\partial X'}$ is the transposed operator. Then $\frac{\partial}{\partial X}\frac{\partial}{\partial X'}$ is the configuration of all second-order partial differential operators given by

$$
\frac{\partial}{\partial X} \frac{\partial}{\partial X'} = \begin{bmatrix}
\frac{\partial^2}{\partial x\_1^2} & \frac{\partial^2}{\partial x\_1 \partial x\_2} & \cdots & \frac{\partial^2}{\partial x\_1 \partial x\_p} \\
\frac{\partial^2}{\partial x\_2 \partial x\_1} & \frac{\partial^2}{\partial x\_2^2} & \cdots & \frac{\partial^2}{\partial x\_2 \partial x\_p} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2}{\partial x\_p \partial x\_1} & \frac{\partial^2}{\partial x\_p \partial x\_2} & \cdots & \frac{\partial^2}{\partial x\_p^2}
\end{bmatrix}
$$


Let $f(X)$ be a real-valued scalar function of $X$. Then this operator, operating on $f$, is defined as

$$\frac{\partial}{\partial X}f = \begin{bmatrix} \frac{\partial f}{\partial x\_1} \\ \vdots \\ \frac{\partial f}{\partial x\_p} \end{bmatrix}.$$

For example, if $f = x\_1^2 + x\_1x\_2 + x\_2^3$, then $\frac{\partial f}{\partial x\_1} = 2x\_1 + x\_2$, $\frac{\partial f}{\partial x\_2} = x\_1 + 3x\_2^2$, and

$$
\frac{\partial f}{\partial X} = \begin{bmatrix} 2x\_1 + x\_2 \\ x\_1 + 3x\_2^2 \end{bmatrix}.
$$

Let $f = a\_1x\_1 + a\_2x\_2 + \cdots + a\_px\_p = A'X = X'A$, $A' = (a\_1, \ldots, a\_p)$, $X' = (x\_1, \ldots, x\_p)$, where $a\_1, \ldots, a\_p$ are real constants and $x\_1, \ldots, x\_p$ are distinct real scalar variables. Then $\frac{\partial f}{\partial x\_j} = a\_j$ and we have the following result:

**Theorem 1.7.1.** *Let $A$, $X$ and $f$ be as defined above, where $f = a\_1x\_1 + \cdots + a\_px\_p$ is a linear function of $X$. Then*

$$\frac{\partial}{\partial X}f = A.$$

Letting $f = X'X = x\_1^2 + \cdots + x\_p^2$, $\frac{\partial f}{\partial x\_j} = 2x\_j$, and we have the following result.

**Theorem 1.7.2.** *Let $X$ be a $p \times 1$ vector of real scalar variables so that $X'X = x\_1^2 + \cdots + x\_p^2$. Then*

$$\frac{\partial f}{\partial X} = 2X.$$

Now, let us consider a general quadratic form $f = X'AX$, where $X$ is a $p \times 1$ vector whose components are real scalar variables and $A$ is a constant matrix. Then $\frac{\partial f}{\partial x\_j} = (a\_{j1}x\_1 + \cdots + a\_{jp}x\_p) + (a\_{1j}x\_1 + a\_{2j}x\_2 + \cdots + a\_{pj}x\_p)$ for $j = 1, \ldots, p$. Hence we have the following result:

**Theorem 1.7.3.** *Let $f = X'AX$ be a real quadratic form where $X$ is a $p \times 1$ real vector whose components are distinct real scalar variables and $A$ is a constant matrix. Then*

$$\frac{\partial f}{\partial X} = \begin{cases} (A + A')X & \text{for a general } A\\ 2AX & \text{when } A = A' \end{cases}.$$
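Theorem 1.7.3 is easy to confirm with finite differences: the numerical gradient of $X'AX$ should match $(A + A')X$ for a general $A$, collapsing to $2AX$ when $A$ is symmetric. The random instances below are illustrative choices of mine, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4
A = rng.standard_normal((p, p))   # a general (non-symmetric) constant matrix
X = rng.standard_normal(p)

f = lambda v: v @ A @ v           # the quadratic form X'AX

# central finite differences for each component of the gradient
h = 1e-6
num = np.array([(f(X + h * np.eye(p)[j]) - f(X - h * np.eye(p)[j])) / (2 * h)
                for j in range(p)])
assert np.allclose(num, (A + A.T) @ X, atol=1e-6)

S = A + A.T                       # a symmetric choice: (S + S')X reduces to 2SX
assert np.allclose((S + S.T) @ X, 2 * S @ X)
```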

#### **1.7.1. Some basic applications of the vector differential operator**

Let $X$ be a $p \times 1$ vector with real scalar elements $x\_1, \ldots, x\_p$, and let $A = (a\_{ij}) = A'$ be a constant matrix. Consider the problem of optimizing the real quadratic form $u = X'AX$. There is no unrestricted maximum or minimum: if $A = A' > O$ (positive definite), $u$ can tend to $+\infty$, and similarly, if $A = A' < O$, $u$ can go to $-\infty$. However, if we confine ourselves to the surface of the unit hypersphere, or equivalently require that $X'X = 1$, then there is a finite maximum and a finite minimum. Let $u\_1 = X'AX - \lambda(X'X - 1)$, where $\lambda$ is an arbitrary constant or Lagrangian multiplier; since we have only added zero to $u$, $u\_1$ is the same as $u$ on the constraint set. Then differentiating $u\_1$ with respect to $x\_1, \ldots, x\_p$, equating the resulting expressions to zero and solving for critical points is equivalent to solving the single equation $\frac{\partial u\_1}{\partial X} = O$ (null). That is,

$$\frac{\partial u\_1}{\partial X} = O \Rightarrow 2AX - 2\lambda X = O \Rightarrow (A - \lambda I)X = O. \tag{i}$$

For *(i)* to have a non-null solution for $X$, the coefficient matrix $A - \lambda I$ has to be singular, that is, its determinant must be zero. Then $|A - \lambda I| = 0$ and $AX = \lambda X$, so that $\lambda$ is an eigenvalue of $A$ and $X$ is the corresponding eigenvector. But

$$AX = \lambda X \Rightarrow X^\prime AX = \lambda X^\prime X = \lambda \text{ since } X^\prime X = 1. \tag{ii}$$

Hence the maximum value of $X'AX$ corresponds to the largest eigenvalue of $A$ and the minimum value of $X'AX$ to the smallest eigenvalue of $A$. Observe that when $A = A'$ the eigenvalues are real. Hence we have the following result:

**Theorem 1.7.4.** *Let $u = X'AX$, $A = A'$, where $X$ is a $p \times 1$ vector of real scalar variables. Subject to the constraint $X'X = 1$,*

$$\begin{aligned} \max\_{X'X=1} \left[ X'AX \right] &= \lambda\_1 = \text{ the largest eigenvalue of } A\\ \min\_{X'X=1} \left[ X'AX \right] &= \lambda\_p = \text{ the smallest eigenvalue of } A. \end{aligned}$$
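A small numerical illustration of Theorem 1.7.4, assuming NumPy: no point on the unit sphere yields a value of $X'AX$ outside the interval bounded by the smallest and largest eigenvalues of a symmetric $A$ (the random matrix and sample size are my own choices).

```python
import numpy as np

rng = np.random.default_rng(6)
p = 5
M = rng.standard_normal((p, p))
A = (M + M.T) / 2                     # A = A'
lam = np.linalg.eigvalsh(A)           # real eigenvalues, ascending order

# random points on the unit sphere X'X = 1
U = rng.standard_normal((2000, p))
U /= np.linalg.norm(U, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', U, A, U)     # X'AX for each sample

# no sample beats the extreme eigenvalues
assert vals.max() <= lam[-1] + 1e-9
assert vals.min() >= lam[0] - 1e-9
```

The bounds are attained at the corresponding unit eigenvectors, which random sampling approaches but does not exceed.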

Principal Component Analysis, where it is assumed that $A > O$, relies on this result; it will be elaborated upon in later chapters. Now, we consider the optimization of $u = X'AX$, $A = A'$, subject to the condition $X'BX = 1$, $B = B'$. Take $\lambda$ as the Lagrangian multiplier and consider $u\_1 = X'AX - \lambda(X'BX - 1)$. Then

$$\frac{\partial u\_1}{\partial X} = O \Rightarrow AX = \lambda BX \Rightarrow |A - \lambda B| = 0. \tag{iii}$$

Note that $X'AX = \lambda X'BX = \lambda$ from *(iii)*. Hence, the maximum of $X'AX$ is the largest value of $\lambda$ satisfying *(iii)* and the minimum of $X'AX$ is the smallest value of $\lambda$ satisfying *(iii)*. Note that when $B$ is nonsingular, $|A - \lambda B| = 0 \Rightarrow |AB^{-1} - \lambda I| = 0$, so that $\lambda$ is an eigenvalue of $AB^{-1}$. Thus, this case can also be treated as an eigenvalue problem. Hence, the following result:

**Theorem 1.7.5.** *Let $u = X'AX$, $A = A'$, where the elements of $X$ are distinct real scalar variables, and consider the problem of optimizing $X'AX$ subject to the condition $X'BX = 1$, $B = B'$, where $A$ and $B$ are constant matrices. Then*

$$\max\_{X^\prime B X = 1} \left[ X^\prime AX \right] = \lambda\_1 = \text{ largest eigenvalue of } AB^{-1}, \ |B| \neq 0$$

$$= \text{ the largest root of } |A - \lambda B| = 0;$$

$$\min\_{X^\prime B X = 1} \left[ X^\prime AX \right] = \lambda\_p = \text{ smallest eigenvalue of } AB^{-1}, \ |B| \neq 0$$

$$= \text{ the smallest root of } |A - \lambda B| = 0.$$

Now, consider the optimization of a real quadratic form subject to a linear constraint. Let $u = X'AX$, $A = A'$, be a quadratic form where $X$ is $p \times 1$. Let $B'X = X'B = 1$ be a constraint, where $B' = (b\_1, \ldots, b\_p)$, $X' = (x\_1, \ldots, x\_p)$, with $b\_1, \ldots, b\_p$ real constants and $x\_1, \ldots, x\_p$ distinct real scalar variables. Take $2\lambda$ as the Lagrangian multiplier and consider $u\_1 = X'AX - 2\lambda(X'B - 1)$. The critical points are available from the following equation:

$$\frac{\partial}{\partial X}u\_1 = O \Rightarrow 2AX - 2\lambda B = O \Rightarrow X = \lambda A^{-1}B, \text{ for } |A| \neq 0$$

$$\Rightarrow B^\prime X = \lambda B^\prime A^{-1}B \Rightarrow \lambda = \frac{1}{B^\prime A^{-1}B}.$$

In this problem, observe that the quadratic form is unbounded even under the restriction $B'X = 1$, and hence there is no maximum; the only critical point corresponds to a minimum. From $AX = \lambda B$, we have $X'AX = \lambda X'B = \lambda$. Hence the minimum value is $\lambda = [B'A^{-1}B]^{-1}$, where it is assumed that $A$ is nonsingular. Thus we have the following result:

**Theorem 1.7.6.** *Let $u = X'AX$, $A = A'$, $|A| \neq 0$. Let $B'X = 1$ where $B' = (b\_1, \ldots, b\_p)$ is a constant vector and $X$ is a $p \times 1$ vector of real distinct scalar variables. Then the minimum of the quadratic form $u$ under the restriction $B'X = 1$ is given by*

$$\min\_{B'X=1} \left[ X'AX \right] = \frac{1}{B'A^{-1}B}.$$
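Theorem 1.7.6 can be sanity-checked numerically: the critical point $X = \lambda A^{-1}B$ is feasible, attains the value $[B'A^{-1}B]^{-1}$, and no randomly sampled feasible point does better when $A > O$. The positive definite $A$ and random $B$ below are illustrative choices of mine, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)            # A = A' > O, so a minimum exists
B = rng.standard_normal(p)

lam = 1.0 / (B @ np.linalg.solve(A, B))    # predicted minimum [B'A^{-1}B]^{-1}
Xstar = lam * np.linalg.solve(A, B)        # critical point X = lambda A^{-1} B
assert np.isclose(B @ Xstar, 1.0)          # feasible: B'X = 1
assert np.isclose(Xstar @ A @ Xstar, lam)  # attains the predicted value

# random feasible points (rescaled so that B'X = 1) never fall below lam
Z = rng.standard_normal((2000, p))
Z = Z / (Z @ B)[:, None]
assert np.all(np.einsum('ij,jk,ik->i', Z, A, Z) >= lam - 1e-9)
```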

Such problems arise for instance in regression analysis and model building situations. We could have eliminated one of the variables with the linear constraint; however, the optimization would still involve all other variables, and thus not much simplification would be achieved by eliminating one variable. Hence, operating with the vector differential operator is the most convenient procedure in this case.

We will now consider the mathematical part of a general problem in prediction analysis where one set of variables is predicted from another set. This topic is related to *Canonical Correlation Analysis*; we consider the optimization part of the problem in this section. The problem consists in optimizing a bilinear form subject to quadratic constraints. Let $X$ be a $p \times 1$ vector of real scalar variables $x\_1, \ldots, x\_p$, and $Y$ be a $q \times 1$ vector of real scalar variables $y\_1, \ldots, y\_q$, where $q$ need not be equal to $p$. Consider the bilinear form $u = X'AY$ where $A$ is a $p \times q$ rectangular constant matrix. We would like to optimize this bilinear form subject to the quadratic constraints $X'BX = 1$ and $Y'CY = 1$, $B = B'$ and $C = C'$, where $B$ and $C$ are constant matrices. In Canonical Correlation Analysis, $B$ and $C$ are constant real positive definite matrices. Take $\lambda\_1$ and $\lambda\_2$ as Lagrangian multipliers and let $u\_1 = X'AY - \frac{\lambda\_1}{2}(X'BX - 1) - \frac{\lambda\_2}{2}(Y'CY - 1)$. Then

$$\frac{\partial}{\partial X}u\_1 = O \Rightarrow AY - \lambda\_1 BX = O \Rightarrow AY = \lambda\_1 BX$$

$$\Rightarrow X^\prime AY = \lambda\_1 X^\prime BX = \lambda\_1;\tag{i}$$

$$\frac{\partial}{\partial Y}u\_1 = O \Rightarrow A^\prime X - \lambda\_2 CY = O \Rightarrow A^\prime X = \lambda\_2 CY$$

$$\Rightarrow Y^\prime A^\prime X = \lambda\_2 Y^\prime CY = \lambda\_2.\tag{ii}$$

It follows from *(i)* and *(ii)* that $\lambda\_1 = \lambda\_2 = \lambda$, say. Observe that $X'AY$ is $1 \times 1$ so that $X'AY = Y'A'X$. After substituting $\lambda$ for $\lambda\_1$ and $\lambda\_2$, we can combine equations *(i)* and *(ii)* into a single matrix equation as follows:

$$
\begin{bmatrix}
-\lambda B & A \\
A' & -\lambda C
\end{bmatrix}
\begin{bmatrix}
X \\
Y
\end{bmatrix} = O \Rightarrow
\begin{vmatrix}
-\lambda B & A \\
A' & -\lambda C
\end{vmatrix} = 0.\tag{iii}
$$

Opening up the determinant by making use of a result on partitioned matrices from Sect. 1.3, we have

$$\begin{aligned} | - \lambda B | \ | - \lambda C - A' ( - \lambda B )^{-1} A | &= 0, \ | B | \neq 0 \Rightarrow \\ | A' B^{-1} A - \lambda^2 C | &= 0. \end{aligned} \tag{iv}$$

Then $\nu = \lambda^2$ is a root obtained from Eq. *(iv)*. We can also obtain a parallel result by opening up the determinant in *(iii)* as

$$| -\lambda C | \, | -\lambda B - A(-\lambda C)^{-1} A' | = 0 \Rightarrow |AC^{-1}A' - \lambda^2 B| = 0, \, |C| \neq 0. \tag{v}$$

Hence we have the following result.

**Theorem 1.7.7.** *Let $X$ and $Y$ be respectively $p \times 1$ and $q \times 1$ real vectors whose components are distinct scalar variables. Consider the bilinear form $X'AY$ and the quadratic forms $X'BX$ and $Y'CY$ where $B = B'$, $C = C'$, and $B$ and $C$ are nonsingular constant matrices. Then,*

$$\max\_{X'BX=1, Y'CY=1} [X'AY] = |\lambda\_1|$$

$$\min\_{X'BX=1, Y'CY=1} [X'AY] = |\lambda\_p|$$

*where $\lambda\_1^2$ is the largest root resulting from equation (iv) or (v) and $\lambda\_p^2$ is the smallest root resulting from equation (iv) or (v).*

Observe that if $p < q$, we may utilize equation *(v)* to solve for $\lambda^2$, and if $q < p$, equation *(iv)*; both lead to the same solution. In the above derivation, we assumed that $B$ and $C$ are nonsingular. In Canonical Correlation Analysis, both $B$ and $C$ are real positive definite matrices corresponding to the variances $X'BX$ and $Y'CY$ of the linear forms, and $X'AY$ then corresponds to the covariance between these linear forms.
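Theorem 1.7.7 can be illustrated numerically: the roots of $|A'B^{-1}A - \lambda^2 C| = 0$ are the eigenvalues of $C^{-1}A'B^{-1}A$, and no pair $(X, Y)$ satisfying $X'BX = 1$, $Y'CY = 1$ yields $|X'AY|$ above the square root of the largest such eigenvalue. The random instance below (positive definite $B$, $C$, as in Canonical Correlation Analysis) is my own illustrative choice, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(8)
p, q = 3, 2
A = rng.standard_normal((p, q))
MB = rng.standard_normal((p, p)); B = MB @ MB.T + p * np.eye(p)   # B = B' > O
MC = rng.standard_normal((q, q)); C = MC @ MC.T + q * np.eye(q)   # C = C' > O

# roots of |A'B^{-1}A - lambda^2 C| = 0 are eigenvalues of C^{-1} A' B^{-1} A
lam2 = np.linalg.eigvals(np.linalg.solve(C, A.T @ np.linalg.solve(B, A))).real
bound = np.sqrt(lam2.max())

# random X, Y rescaled onto X'BX = 1 and Y'CY = 1
X = rng.standard_normal((3000, p))
X /= np.sqrt(np.einsum('ij,jk,ik->i', X, B, X))[:, None]
Y = rng.standard_normal((3000, q))
Y /= np.sqrt(np.einsum('ij,jk,ik->i', Y, C, Y))[:, None]
vals = np.einsum('ij,jk,ik->i', X, A, Y)

assert np.all(np.abs(vals) <= bound + 1e-9)
```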

**Note 1.7.1.** We have confined ourselves to results in the real domain in this subsection since only real cases are discussed in connection with the applications that are considered in later chapters, such as Principal Component Analysis and Canonical Correlation Analysis. The corresponding complex cases do not appear to have practical applications. Accordingly, optimizations of Hermitian forms will not be discussed. However, parallel results to Theorems 1.7.1–1.7.7 could similarly be worked out in the complex domain.

#### **References**

A.M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

A.M. Mathai and Hans J. Haubold (2017a): *Linear Algebra: A Course for Physicists and Engineers*, De Gruyter, Germany.

A.M. Mathai and Hans J. Haubold (2017b): *Probability and Statistics: A Course for Physicists and Engineers*, De Gruyter, Germany.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Chapter 2 The Univariate Gaussian and Related Distributions**

#### **2.1. Introduction**

It is assumed that the reader has had adequate exposure to basic concepts in Probability, Statistics, Calculus and Linear Algebra. This chapter will serve as a review of the basic ideas about the univariate Gaussian, or normal, distribution as well as related distributions. We will begin with a discussion of the univariate Gaussian density. We will adopt the following notation: real scalar mathematical or random variables will be denoted by lower-case letters such as $x, y, z$, whereas vector or matrix-variate mathematical or random variables will be denoted by capital letters such as $X, Y, Z, \ldots$. Statisticians usually employ the double notation $X$ and $x$, where it is claimed that $x$ is a realization of $X$. Since $x$ can vary, it is a variable in the mathematical sense. Treating mathematical and random variables the same way will simplify the notation and possibly reduce the confusion. Complex variables will be written with a tilde, such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$, etc. Constant scalars and matrices will be written without a tilde unless it is necessary to stress that a constant matrix is in the complex domain, in which case a tilde will also be utilized for the constant. Constant matrices will be denoted by $A, B, C, \ldots$.

The numbering will first indicate the chapter and then the section. For example, Eq. (2.1.9) will be the ninth equation in Sect. 2.1 of this chapter. Local numbering for sub-sections will be indicated as *(i)*, *(ii)*, and so on.

Let $x_1$ be a real univariate Gaussian, or normal, random variable whose parameters are $\mu_1$ and $\sigma_1^2$; this will be written as $x_1 \sim N_1(\mu_1, \sigma_1^2)$, the associated density being given by

$$f(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\,\mathrm{e}^{-\frac{1}{2\sigma_1^2}(x_1-\mu_1)^2},\ -\infty < x_1 < \infty,\ -\infty < \mu_1 < \infty,\ \sigma_1 > 0.$$

In this instance, the subscript 1 in $N_1(\cdot)$ refers to the univariate case. Incidentally, a density is a real-valued scalar function of $x$ such that $f(x) \ge 0$ for all $x$ and $\int_x f(x)\,\mathrm{d}x = 1$. The moment generating function (mgf) of this Gaussian random variable $x_1$, with $t_1$ as its parameter, is given by the following expected value, where $E[\cdot]$ denotes the expected value of $[\cdot]$:

$$E[\mathrm{e}^{t_1 x_1}] = \int_{-\infty}^{\infty} \mathrm{e}^{t_1 x_1} f(x_1)\,\mathrm{d}x_1 = \mathrm{e}^{t_1\mu_1 + \frac{1}{2}t_1^2\sigma_1^2}.\tag{2.1.1}$$
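As a numerical sanity check (not part of the original text), the mgf in (2.1.1) can be compared with a Monte Carlo estimate of $E[\mathrm{e}^{t_1 x_1}]$; the values of $\mu_1$, $\sigma_1$ and $t_1$ below are purely illustrative:

```python
import numpy as np

# Monte Carlo check of (2.1.1): E[exp(t*x)] should match exp(t*mu + t^2*sigma^2/2).
# mu, sigma and t are illustrative choices, not values from the text.
rng = np.random.default_rng(0)
mu, sigma, t = 1.0, 2.0, 0.3
x = rng.normal(mu, sigma, size=1_000_000)
mgf_mc = np.exp(t * x).mean()                       # sample estimate of E[e^{t x}]
mgf_exact = np.exp(t * mu + 0.5 * t**2 * sigma**2)  # closed form from (2.1.1)
```

With a million draws the two quantities agree to within a fraction of a percent.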

#### **2.1a. The Complex Scalar Gaussian Variable**

Let $\tilde{x} = x_1 + ix_2$, $i = \sqrt{-1}$, where $x_1$ and $x_2$ are real scalar variables. Let $E[x_1] = \mu_1$, $E[x_2] = \mu_2$, $\mathrm{Var}(x_1) = \sigma_1^2$, $\mathrm{Var}(x_2) = \sigma_2^2$, and $\mathrm{Cov}(x_1, x_2) = \sigma_{12}$. The variance of the complex random variable $\tilde{x}$ is defined as

$$\text{Var}(\tilde{\boldsymbol{x}}) = E[\tilde{\boldsymbol{x}} - E(\tilde{\boldsymbol{x}})][\tilde{\boldsymbol{x}} - E(\tilde{\boldsymbol{x}})]^\*$$

where \* indicates a conjugate transpose in general; in this case, it simply means the conjugate since $\tilde{x}$ is a scalar. Since $\tilde{x} - E(\tilde{x}) = x_1 + ix_2 - \mu_1 - i\mu_2 = (x_1 - \mu_1) + i(x_2 - \mu_2)$ and $[\tilde{x} - E(\tilde{x})]^* = (x_1 - \mu_1) - i(x_2 - \mu_2)$,

$$\begin{split} \text{Var}(\tilde{\mathbf{x}}) &= E[\tilde{\mathbf{x}} - E(\tilde{\mathbf{x}})][\tilde{\mathbf{x}} - E(\tilde{\mathbf{x}})]^\ast \\ &= E[(\mathbf{x}\_1 - \mu\_1) + i(\mathbf{x}\_2 - \mu\_2)][(\mathbf{x}\_1 - \mu\_1) - i(\mathbf{x}\_2 - \mu\_2)] = E[(\mathbf{x}\_1 - \mu\_1)^2 + (\mathbf{x}\_2 - \mu\_2)^2] \\ &= \sigma\_1^2 + \sigma\_2^2 \equiv \sigma^2. \end{split} \tag{i}$$

Observe that $\mathrm{Cov}(x_1, x_2)$ does not appear in the scalar case. However, the covariance will be present in the vector/matrix case, as will be explained in the coming chapters. The complex Gaussian density is given by

$$f(\tilde{\boldsymbol{x}}) = \frac{1}{\pi \sigma^2} \mathbf{e}^{-\frac{1}{\sigma^2}(\tilde{\boldsymbol{x}} - \tilde{\mu})^\*(\tilde{\boldsymbol{x}} - \tilde{\mu})} \tag{ii}$$

for $\tilde{x} = x_1 + ix_2$, $\tilde{\mu} = \mu_1 + i\mu_2$, $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $\sigma^2 > 0$, $j = 1, 2$. We will write this as $\tilde{x} \sim \tilde{N}_1(\tilde{\mu}, \sigma^2)$. It can be shown that the two parameters appearing in the density in *(ii)* are the mean value of $\tilde{x}$ and the variance of $\tilde{x}$. We now establish that the density in *(ii)* is equivalent to a real bivariate Gaussian density with $\sigma_1^2 = \frac{1}{2}\sigma^2$, $\sigma_2^2 = \frac{1}{2}\sigma^2$ and zero correlation. In the real bivariate normal density, the exponent is the following, with $\Sigma$ as given below:

$$\begin{split} & -\frac{1}{2} [(\mathbf{x}\_{1} - \boldsymbol{\mu}\_{1}), (\mathbf{x}\_{2} - \boldsymbol{\mu}\_{2})] \Sigma^{-1} \begin{bmatrix} \mathbf{x}\_{1} - \boldsymbol{\mu}\_{1} \\ \mathbf{x}\_{2} - \boldsymbol{\mu}\_{2} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \frac{1}{2} \sigma^{2} & 0 \\ 0 & \frac{1}{2} \sigma^{2} \end{bmatrix} \\ & = -\frac{1}{\sigma^{2}} \{ (\mathbf{x}\_{1} - \boldsymbol{\mu}\_{1})^{2} + (\mathbf{x}\_{2} - \boldsymbol{\mu}\_{2})^{2} \} = -\frac{1}{\sigma^{2}} (\tilde{\mathbf{x}} - \tilde{\boldsymbol{\mu}})^{\*} (\tilde{\mathbf{x}} - \tilde{\boldsymbol{\mu}}). \end{split}$$

This exponent agrees with that appearing in the complex case. Now, consider the constant part in the real bivariate case:

$$(2\pi)|\Sigma|^{\frac{1}{2}} = (2\pi)\begin{vmatrix}\frac{1}{2}\sigma^2 & 0\\0 & \frac{1}{2}\sigma^2\end{vmatrix}^{\frac{1}{2}} = \pi\sigma^2,$$

which also coincides with that of the complex Gaussian. Hence, a complex scalar Gaussian is equivalent to a real bivariate Gaussian case whose parameters are as described above.

Let us consider the mgf of the complex Gaussian scalar case. Let $\tilde{t} = t_1 + it_2$, $i = \sqrt{-1}$, with $t_1$ and $t_2$ being real parameters, so that $\tilde{t}^* = \bar{\tilde{t}} = t_1 - it_2$ is the conjugate of $\tilde{t}$. Then $\tilde{t}^*\tilde{x} = t_1x_1 + t_2x_2 + i(t_1x_2 - t_2x_1)$. Note that $t_1x_1 + t_2x_2$ contains the necessary number of parameters (that is, 2) and hence, to be consistent with the definition of the mgf in a real bivariate case, the imaginary part should not be taken into account; thus, we should define the mgf as $E[\mathrm{e}^{\Re(\tilde{t}^*\tilde{x})}]$, where $\Re(\cdot)$ denotes the real part of $(\cdot)$. Accordingly, in the complex case, the mgf is obtained as follows:

$$\begin{split} M_{\tilde{x}}(\tilde{t}) &= E[\mathrm{e}^{\Re(\tilde{t}^*\tilde{x})}] \\ &= \frac{1}{\pi\sigma^2}\int_{\tilde{x}}\mathrm{e}^{\Re(\tilde{t}^*\tilde{x}) - \frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})}\,\mathrm{d}\tilde{x} \\ &= \frac{\mathrm{e}^{\Re(\tilde{t}^*\tilde{\mu})}}{\pi\sigma^2}\int_{\tilde{x}}\mathrm{e}^{\Re(\tilde{t}^*(\tilde{x}-\tilde{\mu})) - \frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})}\,\mathrm{d}\tilde{x}. \end{split}$$

Let us simplify the exponent:

$$\begin{aligned} \Re(\tilde{t}^\*(\tilde{\mathbf{x}} - \tilde{\mu})) &- \frac{1}{\sigma^2} (\tilde{\mathbf{x}} - \tilde{\mu})^\* (\tilde{\mathbf{x}} - \tilde{\mu}) \\ &= - \{ \frac{1}{\sigma^2} (\mathbf{x}\_1 - \mu\_1)^2 + \frac{1}{\sigma^2} (\mathbf{x}\_2 - \mu\_2)^2 - t\_1 (\mathbf{x}\_1 - \mu\_1) - t\_2 (\mathbf{x}\_2 - \mu\_2) \} \\ &= \frac{\sigma^2}{4} (t\_1^2 + t\_2^2) - \{ (\mathbf{y}\_1 - \frac{\sigma}{2} t\_1)^2 + (\mathbf{y}\_2 - \frac{\sigma}{2} t\_2)^2 \} \end{aligned}$$

where $y_1 = \frac{x_1 - \mu_1}{\sigma}$, $y_2 = \frac{x_2 - \mu_2}{\sigma}$, $\mathrm{d}y_j = \frac{1}{\sigma}\mathrm{d}x_j$, $j = 1, 2$. But

$$\frac{1}{\sqrt{\pi}} \int\_{-\infty}^{\infty} \mathbf{e}^{-\left(\mathbf{y}\_{j} - \frac{\sigma}{2}t\_{j}\right)^{2}} \mathbf{d} \mathbf{y}\_{j} = 1, \ \ j = 1, 2.$$

Hence,

$$M\_{\tilde{\mathbf{x}}}(\tilde{t}) = \mathbf{e}^{\Re(\tilde{t}^\*\tilde{\mu}) + \frac{\sigma^2}{4}\tilde{t}^\*\tilde{t}} = \mathbf{e}^{t\_1\mu\_1 + t\_2\mu\_2 + \frac{\sigma^2}{4}(t\_1^2 + t\_2^2)},\tag{2.1a.1}$$

which is the mgf of the equivalent real bivariate Gaussian distribution.
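The equivalence with the real bivariate Gaussian suggests a simple simulation sketch (not part of the original text): sampling the real and imaginary parts independently, each with variance $\sigma^2/2$, should reproduce $E(\tilde{x}) = \tilde{\mu}$ and $\mathrm{Var}(\tilde{x}) = \sigma^2$. The parameter values below are illustrative:

```python
import numpy as np

# Simulate the complex scalar Gaussian via its equivalent real bivariate form:
# independent real and imaginary parts, each N(mu_j, sigma^2/2), zero correlation.
# sigma2 and mu are illustrative values.
rng = np.random.default_rng(1)
sigma2 = 4.0
mu = 1.0 + 2.0j
n = 1_000_000
x = (rng.normal(mu.real, np.sqrt(sigma2 / 2), n)
     + 1j * rng.normal(mu.imag, np.sqrt(sigma2 / 2), n))
mean_mc = x.mean()                         # should be close to mu
var_mc = np.mean(np.abs(x - mu) ** 2)      # E[(x - mu)(x - mu)*] = sigma^2
```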

**Note 2.1.1.** A statistical density is invariably a real-valued scalar function of the variables involved, be they scalar, vector or matrix variables, real or complex.

#### **2.1.1. Linear functions of Gaussian variables in the real domain**

If $x_1, \ldots, x_k$ are statistically independently distributed real scalar Gaussian variables with parameters $(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, and if $a_1, \ldots, a_k$ are real scalar constants, then the mgf of a linear function $u = a_1x_1 + \cdots + a_kx_k$ is given by

$$\begin{split} M\_{\boldsymbol{u}}(t) &= E[\mathbf{e}^{t\boldsymbol{u}}] = E[\mathbf{e}^{t a\_1 \boldsymbol{x}\_1 + \cdots + t a\_k \boldsymbol{x}\_k}] = M\_{\boldsymbol{x}\_1}(a\_1 t) \cdots M\_{\boldsymbol{x}\_k}(a\_k t), \text{ as } M\_{\boldsymbol{a}\boldsymbol{x}}(t) = M\_{\boldsymbol{x}}(a t), \\ &= \mathbf{e}^{\{t a\_1 \mu\_1 + \cdots + t a\_k \mu\_k\} + \frac{1}{2} t^2 (a\_1^2 \sigma\_1^2 + \cdots + a\_k^2 \sigma\_k^2)} \\ &= \mathbf{e}^{t \left(\sum\_{j=1}^k a\_j \mu\_j\right) + \frac{1}{2} t^2 (\sum\_{j=1}^k a\_j^2 \sigma\_j^2)}, \end{split}$$

which is the mgf of a real normal random variable whose parameters are $(\sum_{j=1}^k a_j\mu_j,\ \sum_{j=1}^k a_j^2\sigma_j^2)$. Hence, the following result:

**Theorem 2.1.1.** *Let the real scalar random variable $x_j$ have a real univariate normal (Gaussian) distribution, that is, $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, and let $x_1, \ldots, x_k$ be statistically independently distributed. Then, any linear function $u = a_1x_1 + \cdots + a_kx_k$, where $a_1, \ldots, a_k$ are real constants, has a real normal distribution with mean value $\sum_{j=1}^k a_j\mu_j$ and variance $\sum_{j=1}^k a_j^2\sigma_j^2$, that is, $u \sim N_1(\sum_{j=1}^k a_j\mu_j,\ \sum_{j=1}^k a_j^2\sigma_j^2)$.*

Vector/matrix notation enables one to express this result in a more convenient form. Let

$$L = \begin{bmatrix} a\_1 \\ \vdots \\ a\_k \end{bmatrix}, \; \mu = \begin{bmatrix} \mu\_1 \\ \vdots \\ \mu\_k \end{bmatrix}, \; X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_k \end{bmatrix}, \; \Sigma = \begin{bmatrix} \sigma\_1^2 & 0 & \dots & 0 \\ 0 & \sigma\_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma\_k^2 \end{bmatrix}.$$

Then, denoting the transposes by primes, $u = L'X = X'L$, $E(u) = L'\mu = \mu'L$, and

$$\begin{aligned} \text{Var}(u) &= E[(u - E(u))(u - E(u))'] = L'E[(X - E(X))(X - E(X))']L \\ &= L'\text{Cov}(X)L = L'\Sigma L \end{aligned}$$

where, in this case, $\Sigma$ is the diagonal matrix $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2)$. If $x_1, \ldots, x_k$ is a simple random sample from $x_1$, that is, from the normal population specified by the density of $x_1$ or, equivalently, if $x_1, \ldots, x_k$ are iid (independently and identically distributed) random variables having as a common distribution that of $x_1$, then $E(u) = \mu_1 L'J = \mu_1 J'L$ and $\mathrm{Var}(u) = \sigma_1^2 L'L$, where $u$ is the linear function defined in Theorem 2.1.1 and $J' = (1, 1, \ldots, 1)$ is a vector of unities.
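The vector form $E(u) = L'\mu$, $\mathrm{Var}(u) = L'\Sigma L$ can be illustrated with a small numerical sketch (the vectors and variances below are arbitrary illustrative values, not from the text):

```python
import numpy as np

# For u = L'X with independent components, E(u) = L'mu and Var(u) = L'Sigma L,
# where Sigma = diag(sigma_1^2, ..., sigma_k^2).  All numbers are illustrative.
L = np.array([2.0, -1.0, 3.0])
mu = np.array([1.0, 0.0, -2.0])
sig2 = np.array([1.0, 4.0, 0.25])
Sigma = np.diag(sig2)
mean_u = L @ mu        # L' mu = 2*1 - 1*0 + 3*(-2) = -4
var_u = L @ Sigma @ L  # L' Sigma L = 4*1 + 1*4 + 9*0.25 = 10.25
```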

**Example 2.1.1.** Let $x_1 \sim N_1(-1, 1)$ and $x_2 \sim N_1(2, 2)$ be independently distributed real normal variables. Determine the density of the linear function $u = 5x_1 - 2x_2 + 7$.

**Solution 2.1.1.** Since $u$ is a linear function of independently distributed real scalar normal variables, it is real scalar normal whose parameters $E(u)$ and $\mathrm{Var}(u)$ are

$$\begin{aligned} E(u) &= 5E(\mathbf{x}\_1) - 2E(\mathbf{x}\_2) + 7 = 5(-1) - 2(2) + 7 = -2, \\ \text{Var}(u) &= 25\text{Var}(\mathbf{x}\_1) + 4\text{Var}(\mathbf{x}\_2) + 0 = 25(1) + 4(2) = 33, \end{aligned}$$

the covariance being zero since $x_1$ and $x_2$ are independently distributed. Thus, $u \sim N_1(-2, 33)$.
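A quick Monte Carlo check of Example 2.1.1 (added here as a sketch, not part of the original text):

```python
import numpy as np

# Example 2.1.1: u = 5*x1 - 2*x2 + 7 with x1 ~ N1(-1, 1) and x2 ~ N1(2, 2)
# independent should give E(u) = -2 and Var(u) = 33.
rng = np.random.default_rng(2)
n = 1_000_000
x1 = rng.normal(-1.0, 1.0, n)          # N1(-1, 1)
x2 = rng.normal(2.0, np.sqrt(2.0), n)  # N1(2, 2): variance 2, std sqrt(2)
u = 5 * x1 - 2 * x2 + 7
mean_mc, var_mc = u.mean(), u.var()
```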

#### **2.1a.1. Linear functions in the complex domain**

We can also look into the distribution of linear functions of independently distributed complex Gaussian variables. Let $a$ be a constant, which may be real or complex, and let $\tilde{x}$ be a complex random variable. Then, from the definition of the variance in the complex domain, one has

$$\begin{aligned} \text{Var}(a\tilde{\mathbf{x}}) &= E[(a\tilde{\mathbf{x}} - E(a\tilde{\mathbf{x}}))(a\tilde{\mathbf{x}} - E(a\tilde{\mathbf{x}}))^\ast] = aE[(\tilde{\mathbf{x}} - E(\tilde{\mathbf{x}}))(\tilde{\mathbf{x}} - E(\tilde{\mathbf{x}}))^\ast]a^\ast \\ &= a\text{Var}(\tilde{\mathbf{x}})a^\ast = aa^\ast \text{Var}(\tilde{\mathbf{x}}) = |a|^2 \text{Var}(\tilde{\mathbf{x}}) = |a|^2 \sigma^2 \end{aligned}$$

when the variance of $\tilde{x}$ is $\sigma^2$, where $|a|$ denotes the absolute value of $a$. As well, $E[a\tilde{x}] = aE[\tilde{x}] = a\tilde{\mu}$. Then, a companion to Theorem 2.1.1 is obtained.

**Theorem 2.1a.1.** *Let $\tilde{x}_1, \ldots, \tilde{x}_k$ be independently distributed scalar complex Gaussian variables, $\tilde{x}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, k$. Let $a_1, \ldots, a_k$ be real or complex constants and $\tilde{u} = a_1\tilde{x}_1 + \cdots + a_k\tilde{x}_k$ be a linear function. Then, $\tilde{u}$ has a univariate complex Gaussian distribution given by $\tilde{u} \sim \tilde{N}_1(\sum_{j=1}^k a_j\tilde{\mu}_j,\ \sum_{j=1}^k |a_j|^2\,\mathrm{Var}(\tilde{x}_j))$.*

**Example 2.1a.1.** Let $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3$ be independently distributed complex Gaussian univariate random variables with expected values $\tilde{\mu}_1 = -1 + 2i$, $\tilde{\mu}_2 = i$, $\tilde{\mu}_3 = -1 - i$, respectively. Let $\tilde{x}_j = x_{1j} + ix_{2j}$, $j = 1, 2, 3$, and let $[\mathrm{Var}(x_{1j}), \mathrm{Var}(x_{2j})] = [(1, 1), (1, 2), (2, 3)]$, respectively. Let $a_1 = 1 + i$, $a_2 = 2 - 3i$, $a_3 = 2 + i$, $a_4 = 3 + 2i$. Determine the density of the linear function $\tilde{u} = a_1\tilde{x}_1 + a_2\tilde{x}_2 + a_3\tilde{x}_3 + a_4$.

#### **Solution 2.1a.1.**

$$\begin{aligned} E(\tilde{u}) &= a\_1 E(\tilde{x}\_1) + a\_2 E(\tilde{x}\_2) + a\_3 E(\tilde{x}\_3) + a\_4 \\ &= (1+i)(-1+2i) + (2-3i)(i) + (2+i)(-1-i) + (3+2i) \\ &= (-3+i) + (3+2i) + (-1-3i) + (3+2i) = 2+2i; \\ \text{Var}(\tilde{u}) &= |a\_1|^2 \text{Var}(\tilde{x}\_1) + |a\_2|^2 \text{Var}(\tilde{x}\_2) + |a\_3|^2 \text{Var}(\tilde{x}\_3) \end{aligned}$$

and the covariances are equal to zero since the variables are independently distributed. Note that $\tilde{x}_1 = x_{11} + ix_{21}$ and hence, for example,

$$\begin{split} \text{Var}(\tilde{\mathbf{x}}\_{1}) &= E[(\tilde{\mathbf{x}}\_{1} - E(\tilde{\mathbf{x}}\_{1}))(\tilde{\mathbf{x}}\_{1} - E(\tilde{\mathbf{x}}\_{1}))^{\*}] \\ &= E\{ [(\mathbf{x}\_{11} - E(\mathbf{x}\_{11})) + i(\mathbf{x}\_{21} - E(\mathbf{x}\_{21}))][(\mathbf{x}\_{11} - E(\mathbf{x}\_{11})) - i(\mathbf{x}\_{21} - E(\mathbf{x}\_{21}))] \} \\ &= E[(\mathbf{x}\_{11} - E(\mathbf{x}\_{11}))^{2}] - (i)^{2}E[(\mathbf{x}\_{21} - E(\mathbf{x}\_{21}))^{2}] \\ &= \text{Var}(\mathbf{x}\_{11}) + \text{Var}(\mathbf{x}\_{21}) = 1 + 1 = 2. \end{split}$$

Similarly, $\mathrm{Var}(\tilde{x}_2) = 1 + 2 = 3$ and $\mathrm{Var}(\tilde{x}_3) = 2 + 3 = 5$. Moreover, $|a_1|^2 = (1)^2 + (1)^2 = 2$, $|a_2|^2 = (2)^2 + (3)^2 = 13$, $|a_3|^2 = (2)^2 + (1)^2 = 5$. Accordingly, $\mathrm{Var}(\tilde{u}) = 2(2) + 13(3) + 5(5) = 68$. Thus, $\tilde{u} \sim \tilde{N}_1(2 + 2i, 68)$. Note that the constant $a_4$ only affects the mean value. Had $a_4$ been absent from $\tilde{u}$, its mean value would have been real and equal to $-1$.
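Example 2.1a.1 can likewise be checked by simulation (a sketch added here, not part of the original text), sampling each $\tilde{x}_j$ through its independent real and imaginary parts:

```python
import numpy as np

# Example 2.1a.1: u~ = a1*x1~ + a2*x2~ + a3*x3~ + a4 should have
# E(u~) = 2 + 2i and Var(u~) = 68; covariances vanish by independence.
rng = np.random.default_rng(3)
n = 1_000_000
mus = [-1 + 2j, 1j, -1 - 1j]
var_re = [1.0, 1.0, 2.0]      # Var(x_{1j}), the real parts
var_im = [1.0, 2.0, 3.0]      # Var(x_{2j}), the imaginary parts
coeffs = [1 + 1j, 2 - 3j, 2 + 1j]
u = (3 + 2j) * np.ones(n, dtype=complex)   # the constant a4
for mu, vr, vi, aj in zip(mus, var_re, var_im, coeffs):
    xj = (rng.normal(mu.real, np.sqrt(vr), n)
          + 1j * rng.normal(mu.imag, np.sqrt(vi), n))
    u += aj * xj
mean_mc = u.mean()
var_mc = np.mean(np.abs(u - u.mean()) ** 2)
```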

#### **2.1.2. The chisquare distribution in the real domain**

Suppose that $x_1$ follows a real standard normal distribution, that is, $x_1 \sim N_1(0, 1)$, whose mean value is zero and variance, 1. What, then, is the density of $x_1^2$, the square of a real standard normal variable? Let the distribution function, or cumulative distribution function, of $x_1$ be $F_{x_1}(t) = Pr\{x_1 \le t\}$ and that of $y_1 = x_1^2$ be $F_{y_1}(t) = Pr\{y_1 \le t\}$. Note that since $y_1 > 0$, $t$ must be positive. Then,

$$F\_{\mathbb{Y}\_1}(t) = \Pr\{\mathbf{y}\_1 \le t\} = \Pr\{\mathbf{x}\_1^2 \le t\} = \Pr\{|\mathbf{x}\_1| \le \sqrt{t}\} = \Pr\{-\sqrt{t} \le \mathbf{x}\_1 \le \sqrt{t}\}$$

$$= \Pr\{\mathbf{x}\_1 \le \sqrt{t}\} - \Pr\{\mathbf{x}\_1 \le -\sqrt{t}\} = F\_{\mathbf{x}\_1}(\sqrt{t}) - F\_{\mathbf{x}\_1}(-\sqrt{t}).\tag{i}$$

Denoting the density of $y_1$ by $g(y_1)$, this density at $y_1 = t$ is available by differentiating the distribution function $F_{y_1}(t)$ with respect to $t$. As for the density of $x_1$, which is the standard normal density, it can be obtained by differentiating $F_{x_1}(\sqrt{t})$ with respect to $\sqrt{t}$. Thus, differentiating *(i)* throughout with respect to $t$, we have

$$\begin{split} g(t)\big|_{t=y_1} &= \Big[\frac{\mathrm{d}}{\mathrm{d}t}F_{x_1}(\sqrt{t}) - \frac{\mathrm{d}}{\mathrm{d}t}F_{x_1}(-\sqrt{t})\Big]\Big|_{t=y_1} \\ &= \Big[\frac{\mathrm{d}}{\mathrm{d}\sqrt{t}}F_{x_1}(\sqrt{t})\,\frac{\mathrm{d}\sqrt{t}}{\mathrm{d}t} - \frac{\mathrm{d}}{\mathrm{d}(-\sqrt{t})}F_{x_1}(-\sqrt{t})\,\frac{\mathrm{d}(-\sqrt{t})}{\mathrm{d}t}\Big]\Big|_{t=y_1} \\ &= \frac{1}{2}t^{\frac{1}{2}-1}\frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{t}{2}}\Big|_{t=y_1} + \frac{1}{2}t^{\frac{1}{2}-1}\frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-\frac{t}{2}}\Big|_{t=y_1} \\ &= \frac{1}{2^{\frac{1}{2}}\Gamma(1/2)}\,y_1^{\frac{1}{2}-1}\mathrm{e}^{-\frac{y_1}{2}},\ 0 \le y_1 < \infty,\ \text{with } \Gamma(1/2) = \sqrt{\pi}. \end{split}\tag{ii}$$

Accordingly, the density of $y_1 = x_1^2$, the square of a real standard normal variable, is a two-parameter real gamma with $\alpha = \frac{1}{2}$ and $\beta = 2$, that is, a real chisquare with one degree of freedom. A two-parameter real gamma density with the parameters $(\alpha, \beta)$ is given by

$$f_1(y_1) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,y_1^{\alpha-1}\mathrm{e}^{-\frac{y_1}{\beta}},\ 0 \le y_1 < \infty,\ \alpha > 0,\ \beta > 0,\tag{2.1.2}$$

and $f_1(y_1) = 0$ elsewhere. When $\alpha = \frac{n}{2}$ and $\beta = 2$, we have *a real chisquare density with $n$ degrees of freedom*. Hence, the following result:

**Theorem 2.1.2.** *The square of a real scalar standard normal random variable is a real chisquare variable with one degree of freedom. A real chisquare with $n$ degrees of freedom has the density given in* (2.1.2) *with $\alpha = \frac{n}{2}$ and $\beta = 2$.*

A real scalar chisquare random variable with $m$ degrees of freedom is denoted $\chi^2_m$. From (2.1.2), by computing the mgf, we can see that the mgf of a real scalar gamma random variable $y$ is $M_y(t) = (1 - \beta t)^{-\alpha}$ for $1 - \beta t > 0$. Hence, a real chisquare with $m$ degrees of freedom has the mgf $M_{\chi^2_m}(t) = (1 - 2t)^{-\frac{m}{2}}$ for $1 - 2t > 0$. The condition $1 - \beta t > 0$ is required for the convergence of the integral when evaluating the mgf of a real gamma random variable. If $y_j \sim \chi^2_{m_j}$, $j = 1, \ldots, k$, and if $y_1, \ldots, y_k$ are independently distributed, then the sum $y = y_1 + \cdots + y_k \sim \chi^2_{m_1 + \cdots + m_k}$, a real chisquare with $m_1 + \cdots + m_k$ degrees of freedom, with mgf $M_y(t) = (1 - 2t)^{-\frac{1}{2}(m_1 + \cdots + m_k)}$ for $1 - 2t > 0$.
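Both facts — that a sum of squares of standard normals is a chisquare, and that independent chisquares add their degrees of freedom — can be checked numerically (a sketch, with illustrative $m$ and $n$; recall $E(\chi^2_m) = m$ and $\mathrm{Var}(\chi^2_m) = 2m$):

```python
import numpy as np

# Sums of squares of iid N(0,1) variables are chisquares, and independent
# chisquares add their degrees of freedom.  m and n are illustrative.
rng = np.random.default_rng(4)
N, m, n = 500_000, 3, 5
y1 = (rng.standard_normal((N, m)) ** 2).sum(axis=1)  # chi^2_m
y2 = (rng.standard_normal((N, n)) ** 2).sum(axis=1)  # chi^2_n
y = y1 + y2                                          # chi^2_{m+n}
```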

**Example 2.1.2.** Let $x_1 \sim N_1(-1, 4)$ and $x_2 \sim N_1(2, 2)$ be independently distributed. Let $u = x_1^2 + 2x_2^2 + 2x_1 - 8x_2 + 5$. Compute the density of $u$.

**Solution 2.1.2.**

$$\begin{aligned} u &= x_1^2 + 2x_2^2 + 2x_1 - 8x_2 + 5 = (x_1 + 1)^2 + 2(x_2 - 2)^2 - 4 \\ &= 4\Big[\frac{(x_1 + 1)^2}{4} + \frac{(x_2 - 2)^2}{2}\Big] - 4. \end{aligned}$$

Since $x_1 \sim N_1(-1, 4)$ and $x_2 \sim N_1(2, 2)$ are independently distributed, so are $\frac{(x_1+1)^2}{4} \sim \chi^2_1$ and $\frac{(x_2-2)^2}{2} \sim \chi^2_1$, and hence their sum is a real $\chi^2_2$ random variable. Then, $u = 4y - 4$ with $y \sim \chi^2_2$. But the density of $y$, denoted by $f_y(y)$, is

$$f_y(y) = \frac{1}{2}\mathrm{e}^{-\frac{y}{2}},\ 0 \le y < \infty,$$

and $f_y(y) = 0$ elsewhere. Then, $z = 4y$ has the density

$$f_z(z) = \frac{1}{8}\mathrm{e}^{-\frac{z}{8}},\ 0 \le z < \infty,$$

and $f_z(z) = 0$ elsewhere. However, since $u = z - 4$, its density is

$$f_u(u) = \frac{1}{8}\mathrm{e}^{-\frac{u+4}{8}},\ -4 \le u < \infty,$$

and zero elsewhere.
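The resulting density implies, for instance, $E(u) = 4E(\chi^2_2) - 4 = 4$ and the tail probability $Pr\{u > c\} = \mathrm{e}^{-(c+4)/8}$, both of which can be verified by simulation (a sketch, not part of the original text):

```python
import numpy as np

# Example 2.1.2: with x1 ~ N1(-1, 4) and x2 ~ N1(2, 2) independent,
# u = x1^2 + 2*x2^2 + 2*x1 - 8*x2 + 5 = 4*chi^2_2 - 4, whose density
# f_u(u) = (1/8)exp(-(u+4)/8) gives P(u > c) = exp(-(c+4)/8).
rng = np.random.default_rng(5)
n = 1_000_000
x1 = rng.normal(-1.0, 2.0, n)          # variance 4, so std 2
x2 = rng.normal(2.0, np.sqrt(2.0), n)  # variance 2
u = x1**2 + 2 * x2**2 + 2 * x1 - 8 * x2 + 5
tail_mc = (u > 4).mean()
tail_exact = np.exp(-(4 + 4) / 8)      # = e^{-1}
```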

#### **2.1a.2. The chisquare distribution in the complex domain**

Let us consider the distribution of $\tilde{z}_1\tilde{z}_1^*$ for a scalar standard complex normal variable $\tilde{z}_1$. The density of $\tilde{z}_1$ is given by

$$f\_{\tilde{z}\_1}(\tilde{z}\_1) = \frac{1}{\pi} \mathbf{e}^{-\tilde{z}\_1^\* \tilde{z}\_1}, \ \tilde{z}\_1 = z\_{11} + i z\_{12}, \ -\infty < z\_{1j} < \infty, \ j = 1, 2.$$

Let $\tilde{u}_1 = \tilde{z}_1^*\tilde{z}_1$. Note that $\tilde{z}_1^*\tilde{z}_1$ is real and hence we may associate a real parameter $t$ with the mgf. Note also that $\tilde{z}_1\tilde{z}_1^*$ in the scalar complex case corresponds to $z^2$ in the real scalar case where $z \sim N_1(0, 1)$. Then, the mgf of $\tilde{u}_1$ is given by

$$M_{\tilde{u}_1}(t) = E[\mathrm{e}^{t\tilde{u}_1}] = \frac{1}{\pi}\int_{\tilde{z}_1}\mathrm{e}^{-(1-t)\tilde{z}_1^*\tilde{z}_1}\,\mathrm{d}\tilde{z}_1.$$

However, $\tilde{z}_1^*\tilde{z}_1 = z_{11}^2 + z_{12}^2$ as $\tilde{z}_1 = z_{11} + iz_{12}$, $i = \sqrt{-1}$, where $z_{11}$ and $z_{12}$ are real. Thus, the above integral gives $(1-t)^{-\frac{1}{2}}(1-t)^{-\frac{1}{2}} = (1-t)^{-1}$ for $1 - t > 0$, which is the mgf of a real scalar gamma variable with parameters $\alpha = 1$ and $\beta = 1$. Let $\tilde{z}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, k$, be scalar complex normal random variables that are independently distributed. Letting

$$\tilde{u} = \sum_{j=1}^{k}\Big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\Big)^*\Big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\Big) \sim \text{ real scalar gamma with parameters } \alpha = k,\ \beta = 1,$$

whose density is

$$f_{\tilde{u}}(u) = \frac{1}{\Gamma(k)}u^{k-1}\mathrm{e}^{-u},\ 0 \le u < \infty,\ k = 1, 2, \ldots,\tag{2.1a.2}$$

$\tilde{u}$ is referred to as *a scalar chisquare in the complex domain having $k$ degrees of freedom*, which is denoted $\tilde{u} \sim \tilde{\chi}^2_k$. Hence, the following result:

**Theorem 2.1a.2.** *Let $\tilde{z}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently distributed and $\tilde{u} = \sum_{j=1}^k\big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\big)^*\big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\big)$. Then $\tilde{u}$ is called a scalar chisquare having $k$ degrees of freedom in the complex domain, whose density as given in (2.1a.2) is that of a real scalar gamma random variable with parameters $\alpha = k$ and $\beta = 1$.*
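Theorem 2.1a.2 can be checked by simulation (a sketch, with an illustrative $k$): each standardized term $(\tilde{z}_j - \tilde{\mu}_j)/\sigma_j$ is a standard complex normal, whose real and imaginary parts each have variance $1/2$, and a gamma$(\alpha = k, \beta = 1)$ variable has mean $k$ and variance $k$.

```python
import numpy as np

# Theorem 2.1a.2: the sum of k squared moduli of standardized complex normals
# is a complex-domain chisquare with k degrees of freedom, i.e. a real
# gamma(alpha=k, beta=1), so E(u~) = k and Var(u~) = k.  k is illustrative.
rng = np.random.default_rng(6)
n, k = 1_000_000, 3
u = np.zeros(n)
for _ in range(k):
    # standard complex normal: real and imaginary parts each N(0, 1/2)
    z = rng.normal(0.0, np.sqrt(0.5), n) + 1j * rng.normal(0.0, np.sqrt(0.5), n)
    u += np.abs(z) ** 2
```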

**Example 2.1a.2.** Let $\tilde{x}_1 \sim \tilde{N}_1(i, 2)$ and $\tilde{x}_2 \sim \tilde{N}_1(1 - i, 1)$ be independently distributed complex Gaussian univariate random variables. Let $\tilde{u} = \tilde{x}_1^*\tilde{x}_1 + 2\tilde{x}_2^*\tilde{x}_2 - 2\tilde{x}_2^* - 2\tilde{x}_2 + i(\tilde{x}_1 + 2\tilde{x}_2^*) - i(\tilde{x}_1^* + 2\tilde{x}_2) + 5$. Evaluate the density of $\tilde{u}$.

**Solution 2.1a.2.** Let us simplify $\tilde{u}$, keeping in mind the parameters in the densities of $\tilde{x}_1$ and $\tilde{x}_2$. Since terms of the type $\tilde{x}_1^*\tilde{x}_1$ and $\tilde{x}_2^*\tilde{x}_2$ are present in $\tilde{u}$, we may simplify into factors involving $\tilde{x}_j^*$ and $\tilde{x}_j$ for $j = 1, 2$. From the density of $\tilde{x}_1$, we have

$$\frac{(\tilde{\boldsymbol{x}}\_1 - \boldsymbol{i})^\*(\tilde{\boldsymbol{x}}\_1 - \boldsymbol{i})}{2} \sim \tilde{\chi}\_1^2$$

where

$$(\tilde{\mathbf{x}}\_1 - \mathbf{i})^\*(\tilde{\mathbf{x}}\_1 - \mathbf{i}) = (\tilde{\mathbf{x}}\_1^\* + \mathbf{i})(\tilde{\mathbf{x}}\_1 - \mathbf{i}) = \tilde{\mathbf{x}}\_1^\*\tilde{\mathbf{x}}\_1 + \mathbf{i}\tilde{\mathbf{x}}\_1 - \mathbf{i}\tilde{\mathbf{x}}\_1^\* + \mathbf{1}. \tag{i}$$

After removing the elements in *(i)* from *u*˜, the remainder is

$$\begin{split} 2\tilde{x}\_2^\*\tilde{x}\_2 - 2\tilde{x}\_2^\* - 2\tilde{x}\_2 + 2i\tilde{x}\_2^\* - 2i\tilde{x}\_2 + 4 \\ = 2[(\tilde{x}\_2 - 1)^\*(\tilde{x}\_2 - 1) - i\tilde{x}\_2 + i\tilde{x}\_2^\* + 1] \\ = 2[(\tilde{x}\_2 - 1 + i)^\*(\tilde{x}\_2 - 1 + i)]. \end{split}$$

Accordingly,

$$\begin{aligned} \tilde{\mu} &= 2 \Big[ \frac{(\tilde{\chi}\_1 - i)^\*(\tilde{\chi}\_1 - i)}{2} + (\tilde{\chi}\_2 - 1 + i)^\*(\tilde{\chi}\_2 - 1 + i) \Big] \\ &= 2 \{ \tilde{\chi}\_1^2 + \tilde{\chi}\_1^2 \} = 2 \tilde{\chi}\_2^2 \end{aligned}$$

where $\tilde{\chi}^2_2$ is a scalar chisquare with two degrees of freedom in the complex domain or, equivalently, a real scalar gamma with parameters $(\alpha = 2, \beta = 1)$. Letting $y$ denote this $\tilde{\chi}^2_2$ random variable, the density of $y$, denoted by $f_y(y)$, is

$$f_y(y) = y\,\mathrm{e}^{-y},\ 0 \le y < \infty,$$

and $f_y(y) = 0$ elsewhere. Then, the density of $u = 2y$, denoted by $f_u(u)$, which is given by

$$f_u(u) = \frac{u}{4}\,\mathrm{e}^{-\frac{u}{2}},\ 0 \le u < \infty,$$

and $f_u(u) = 0$ elsewhere, is that of a real scalar gamma with the parameters $(\alpha = 2, \beta = 2)$.

#### **2.1.3. The type-2 beta and** *F* **distributions in the real domain**

What about the distribution of the ratio of two independently distributed real scalar chisquare random variables? Let $y_1 \sim \chi^2_m$ and $y_2 \sim \chi^2_n$, that is, $y_1$ and $y_2$ are real chisquare random variables with $m$ and $n$ degrees of freedom, respectively, and assume that $y_1$ and $y_2$ are independently distributed. Let us determine the density of $u = y_1/y_2$. Let $v = y_2$ and consider the transformation of $(y_1, y_2)$ onto $(u, v)$. Noting that

$$
\frac{\partial u}{\partial \mathbf{y}\_1} = \frac{1}{\mathbf{y}\_2}, \ \frac{\partial v}{\partial \mathbf{y}\_2} = 1, \ \frac{\partial v}{\partial \mathbf{y}\_1} = 0,
$$

one has

$$\begin{aligned} \mathbf{d}\mu \wedge \mathbf{d}v &= \begin{vmatrix} \frac{\partial u}{\partial \mathbf{y}\_1} & \frac{\partial u}{\partial \mathbf{y}\_2} \\ \frac{\partial v}{\partial \mathbf{y}\_1} & \frac{\partial v}{\partial \mathbf{y}\_2} \end{vmatrix} \mathbf{d}\mathbf{y}\_1 \wedge \mathbf{d}\mathbf{y}\_2 = \begin{vmatrix} \frac{1}{\mathcal{Y}\_2} & \* \\ 0 & 1 \end{vmatrix} \mathbf{d}\mathbf{y}\_1 \wedge \mathbf{d}\mathbf{y}\_2 \\ &= \frac{1}{\mathcal{Y}\_2} \mathbf{d}\mathbf{y}\_1 \wedge \mathbf{d}\mathbf{y}\_2 \Rightarrow \mathbf{d}\mathbf{y}\_1 \wedge \mathbf{d}\mathbf{y}\_2 = v \,\mathbf{d}u \wedge \mathbf{d}v \end{aligned}$$

where the asterisk indicates the presence of some element in which we are not interested, owing to the triangular pattern of the Jacobian matrix. Letting the joint density of $y_1$ and $y_2$ be denoted by $f_{12}(y_1, y_2)$, one has

$$f\_{12}(\mathbf{y}\_1, \mathbf{y}\_2) = \frac{1}{2^{\frac{m+n}{2}} \Gamma(\frac{m}{2}) \Gamma(\frac{n}{2})} \mathbf{y}\_1^{\frac{m}{2}-1} \mathbf{y}\_2^{\frac{n}{2}-1} \mathbf{e}^{-\frac{\mathbf{y}\_1+\mathbf{y}\_2}{2}}$$

for $0 \le y_1 < \infty$, $0 \le y_2 < \infty$, $m, n = 1, 2, \ldots$, and $f_{12}(y_1, y_2) = 0$ elsewhere. Let the joint density of $u$ and $v$ be denoted by $g_{12}(u, v)$ and the marginal density of $u$ be denoted by $g_1(u)$. Then,

$$g_{12}(u, v) = c\, v\,(uv)^{\frac{m}{2}-1} v^{\frac{n}{2}-1} \mathrm{e}^{-\frac{1}{2}(uv+v)}, \ \ c = \frac{1}{2^{\frac{m+n}{2}} \Gamma(\frac{m}{2}) \Gamma(\frac{n}{2})},$$

$$\begin{aligned} g_1(u) &= c\, u^{\frac{m}{2}-1} \int_{v=0}^{\infty} v^{\frac{m+n}{2}-1} \mathrm{e}^{-v\frac{(1+u)}{2}}\, \mathrm{d}v \\ &= c\, u^{\frac{m}{2}-1}\, \Gamma\Big(\frac{m+n}{2}\Big) \Big(\frac{1+u}{2}\Big)^{-\frac{m+n}{2}} \\ &= \frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{m}{2}) \Gamma(\frac{n}{2})}\, u^{\frac{m}{2}-1} (1+u)^{-\frac{m+n}{2}} \end{aligned}\tag{2.1.3}$$

for $m, n = 1, 2, \ldots$, $0 \le u < \infty$, and $g_1(u) = 0$ elsewhere. Note that $g_1(u)$ is a type-2 real scalar beta density. Hence, we have the following result:

**Theorem 2.1.3.** *Let the real scalar random variables $y_1 \sim \chi^2_m$ and $y_2 \sim \chi^2_n$ be independently distributed. Then the ratio $u = \frac{y_1}{y_2}$ is a type-2 real scalar beta random variable with the parameters $\frac{m}{2}$ and $\frac{n}{2}$, where $m, n = 1, 2, \ldots$, whose density is provided in* (2.1.3)*.*

This result also holds for general real scalar gamma random variables $x_1 > 0$ and $x_2 > 0$ with parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$, respectively, where $\beta$ is a common scale parameter and it is assumed that $x_1$ and $x_2$ are independently distributed. Then, $u = \frac{x_1}{x_2}$ is a type-2 beta with parameters $\alpha_1$ and $\alpha_2$.
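Theorem 2.1.3 lends itself to a quick Monte Carlo check. The following Python sketch (an illustration added here, not part of the original derivation; the degrees of freedom $m=4$, $n=6$ and the evaluation point $x=1$ are arbitrary choices) simulates the ratio of two independent chisquares, using the fact that a $\chi^2_m$ variable is gamma with parameters $(m/2,\, 2)$, and compares the empirical distribution function of $u = y_1/y_2$ with the value obtained by integrating the type-2 beta density (2.1.3):

```python
import math
import random

def type2_beta_pdf(u, a, b):
    # Type-2 beta density with parameters a, b, as in (2.1.3)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * u**(a - 1) * (1 + u)**(-(a + b))

def cdf_by_quadrature(x, a, b, steps=20000):
    # Midpoint rule on [0, x]
    h = x / steps
    return sum(type2_beta_pdf((i + 0.5) * h, a, b) for i in range(steps)) * h

random.seed(7)
m, n = 4, 6          # illustrative degrees of freedom
N, x = 200000, 1.0
count = 0
for _ in range(N):
    y1 = random.gammavariate(m / 2, 2.0)  # chi-square with m degrees of freedom
    y2 = random.gammavariate(n / 2, 2.0)  # chi-square with n degrees of freedom
    if y1 / y2 <= x:
        count += 1
emp = count / N
theo = cdf_by_quadrature(x, m / 2, n / 2)
print(abs(emp - theo) < 0.01)
```

With this sample size the empirical and quadrature values agree to about two decimal places.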

If $u$ as defined in Theorem 2.1.3 is replaced by $\frac{m}{n}F_{m,n}$, or equivalently if one sets $F = F_{m,n} = \frac{\chi^2_m/m}{\chi^2_n/n} = \frac{n}{m}u$, then $F$ is known as the $F$-random variable with $m$ and $n$ degrees of freedom, where the degrees of freedom indicate those of the numerator and denominator chisquare random variables, which are independently distributed. Denoting the density of $F$ by $f_F(F)$, we have the following result:

**Theorem 2.1.4.** *Letting $F = F_{m,n} = \frac{\chi^2_m/m}{\chi^2_n/n}$, where the two real scalar chisquares are independently distributed, the real scalar $F$-density is given by*

$$f\_F(F) = \frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})} \left(\frac{m}{n}\right)^{\frac{m}{2}} \frac{F^{\frac{m}{2}-1}}{(1+\frac{m}{n}F)^{\frac{m+n}{2}}} \tag{2.1.4}$$

*for $0 \le F < \infty$, $m, n = 1, 2, \ldots$, and $f_F(F) = 0$ elsewhere.*
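The density (2.1.4) can be verified to integrate to one numerically. The following sketch (illustrative; $m=5$ and $n=8$ are arbitrary choices) maps $(0,\infty)$ onto $(0,1)$ via the substitution $F = s/(1-s)$ and applies a midpoint rule:

```python
import math

def f_pdf(F, m, n):
    # Real F-density (2.1.4)
    c = math.gamma((m + n) / 2) / (math.gamma(m / 2) * math.gamma(n / 2))
    return c * (m / n)**(m / 2) * F**(m / 2 - 1) / (1 + m * F / n)**((m + n) / 2)

m, n = 5, 8
steps = 100000
total = 0.0
for i in range(steps):
    s = (i + 0.5) / steps
    F = s / (1 - s)
    # dF = ds / (1-s)^2 is the Jacobian of the substitution
    total += f_pdf(F, m, n) / (1 - s)**2 / steps
print(abs(total - 1.0) < 1e-3)
```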

**Example 2.1.3.** Let $x_1$ and $x_2$ be independently distributed real scalar gamma random variables with parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$, respectively, $\beta$ being a common parameter, whose densities are as specified in (2.1.2). Let $u_1 = \frac{x_1}{x_1+x_2}$, $u_2 = \frac{x_1}{x_2}$, $u_3 = x_1 + x_2$. Show that (1): $u_3$ has a real scalar gamma density as given in (2.1.2) with the parameters $(\alpha_1+\alpha_2, \beta)$; (2): $u_1$ and $u_3$ as well as $u_2$ and $u_3$ are independently distributed; (3): $u_2$ is a real scalar type-2 beta with parameters $(\alpha_1, \alpha_2)$ whose density is specified in (2.1.3); (4): $u_1$ has a real scalar type-1 beta density given as

$$f\_1(u\_1) = \frac{\Gamma(\alpha\_1 + \alpha\_2)}{\Gamma(\alpha\_1)\Gamma(\alpha\_2)} u\_1^{\alpha\_1 - 1} (1 - u\_1)^{\alpha\_2 - 1}, \ 0 \le u\_1 \le 1,\tag{2.1.5}$$

for $\Re(\alpha_1) > 0$, $\Re(\alpha_2) > 0$, and zero elsewhere. [In a statistical density, the parameters are usually real; however, since the integrals exist for complex parameters, the conditions are stated for complex parameters in terms of the real parts of $\alpha_1$ and $\alpha_2$, which must be positive. When the parameters are real, the conditions are simply $\alpha_1 > 0$ and $\alpha_2 > 0$.]

**Solution 2.1.3.** Since $x_1$ and $x_2$ are independently distributed, their joint density is the product of the marginal densities, which is given by

$$f_{12}(x_1, x_2) = c\, x_1^{\alpha_1-1} x_2^{\alpha_2-1} \mathrm{e}^{-\frac{1}{\beta}(x_1+x_2)}, \ 0 \le x_j < \infty, \ j = 1, 2,\tag{i}$$

for $\Re(\alpha_j) > 0$, $\Re(\beta) > 0$, $j = 1, 2$, and zero elsewhere, where

$$c = \frac{1}{\beta^{\alpha\_1 + \alpha\_2} \varGamma(\alpha\_1) \varGamma(\alpha\_2)}.$$

Since the sum $x_1 + x_2$ appears in the exponent and both $x_1$ and $x_2$ are positive, a convenient transformation is $x_1 = r\cos^2\theta$, $x_2 = r\sin^2\theta$, $0 \le r < \infty$, $0 \le \theta \le \frac{\pi}{2}$. The Jacobian is available from the detailed derivation given at the beginning of Sect. 2.1.3 or from Example 1.6.1. That is,

$$\mathrm{d}x_1 \wedge \mathrm{d}x_2 = 2r\, \sin\theta\, \cos\theta\, \mathrm{d}r \wedge \mathrm{d}\theta. \tag{ii}$$

Then, from $(i)$ and $(ii)$, the joint density of $r$ and $\theta$, denoted by $f_{r,\theta}(r, \theta)$, is the following:

$$f_{r,\theta}(r, \theta) = c\, (\cos^2\theta)^{\alpha_1-1} (\sin^2\theta)^{\alpha_2-1}\, 2\cos\theta \sin\theta \; r^{\alpha_1+\alpha_2-1} \mathrm{e}^{-\frac{r}{\beta}} \tag{iii}$$

and zero elsewhere. As $f_{r,\theta}(r, \theta)$ is a product of positive integrable functions involving solely $r$ and $\theta$, the variables $r$ and $\theta$ are independently distributed. Since $u_3 = x_1 + x_2 = r\cos^2\theta + r\sin^2\theta = r$ is solely a function of $r$, while $u_1 = \frac{x_1}{x_1+x_2} = \cos^2\theta$ and $u_2 = \frac{\cos^2\theta}{\sin^2\theta}$ are solely functions of $\theta$, it follows that $u_1$ and $u_3$, as well as $u_2$ and $u_3$, are independently distributed. From $(iii)$, upon multiplying and dividing by $\Gamma(\alpha_1+\alpha_2)$, we obtain the density of $u_3$ as

$$f_1(u_3) = \frac{1}{\beta^{\alpha_1+\alpha_2} \Gamma(\alpha_1+\alpha_2)}\, u_3^{\alpha_1+\alpha_2-1} \mathrm{e}^{-\frac{u_3}{\beta}}, \ 0 \le u_3 < \infty,\tag{iv}$$

and zero elsewhere, which is a real scalar gamma density with parameters $(\alpha_1+\alpha_2, \beta)$. From $(iii)$, the density of $\theta$, denoted by $f_2(\theta)$, is

$$f_2(\theta) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, (\cos^2\theta)^{\alpha_1-1} (\sin^2\theta)^{\alpha_2-1}\, 2\cos\theta\sin\theta, \ 0 \le \theta \le \frac{\pi}{2} \tag{v}$$

and zero elsewhere, for $\Re(\alpha_j) > 0$, $j = 1, 2$. From this result, we can obtain the density of $u_1 = \cos^2\theta$. We have $\mathrm{d}u_1 = -2\cos\theta \sin\theta\, \mathrm{d}\theta$. Moreover, when $\theta \to 0$, $u_1 \to 1$, and when $\theta \to \frac{\pi}{2}$, $u_1 \to 0$. Hence, the minus sign in the Jacobian is precisely what is needed to obtain the limits in the natural order $0 \le u_1 \le 1$. Substituting in $(v)$, the density of $u_1$, denoted by $f_3(u_1)$, is as given in (2.1.5), $u_1$ being a real scalar type-1 beta random variable with parameters $(\alpha_1, \alpha_2)$. Now, observe that

$$
u_2 = \frac{\cos^2\theta}{\sin^2\theta} = \frac{\cos^2\theta}{1-\cos^2\theta} = \frac{u_1}{1-u_1}.\tag{vi}
$$

Given the density of $u_1$ as specified in (2.1.5), we can obtain the density of $u_2$ as follows. As $u_2 = \frac{u_1}{1-u_1}$, we have $u_1 = \frac{u_2}{1+u_2} \Rightarrow \mathrm{d}u_1 = \frac{1}{(1+u_2)^2}\mathrm{d}u_2$; substituting these values in the density of $u_1$, we obtain the following density for $u_2$:

$$f_4(u_2) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, u_2^{\alpha_1-1} (1+u_2)^{-(\alpha_1+\alpha_2)}, \ 0 \le u_2 < \infty,\tag{2.1.6}$$

and zero elsewhere, for $\Re(\alpha_j) > 0$, $j = 1, 2$, which is a real scalar type-2 beta density with parameters $(\alpha_1, \alpha_2)$. The results associated with the densities (2.1.5) and (2.1.6) are now stated as a theorem.

**Theorem 2.1.5.** *Let $x_1$ and $x_2$ be independently distributed real scalar gamma random variables with the parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$, respectively, $\beta$ being a common scale parameter. [If $x_1 \sim \chi^2_m$ and $x_2 \sim \chi^2_n$, then $\alpha_1 = \frac{m}{2}$, $\alpha_2 = \frac{n}{2}$ and $\beta = 2$.] Then $u_1 = \frac{x_1}{x_1+x_2}$ is a real scalar type-1 beta random variable whose density is as specified in* (2.1.5) *with the parameters $(\alpha_1, \alpha_2)$, and $u_2 = \frac{x_1}{x_2}$ is a real scalar type-2 beta random variable whose density is as given in* (2.1.6) *with the parameters $(\alpha_1, \alpha_2)$.*
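A simulation along the lines of Example 2.1.3 can corroborate Theorem 2.1.5. The sketch below (the parameter values $\alpha_1 = 2$, $\alpha_2 = 3$, $\beta = 1.5$ are arbitrary illustrative choices) checks that $u_1$ has the type-1 beta mean $\alpha_1/(\alpha_1+\alpha_2)$, that $u_3$ has the gamma mean $(\alpha_1+\alpha_2)\beta$, and that the sample covariance of $u_1$ and $u_3$ is near zero, consistent with their independence:

```python
import random

random.seed(42)
a1, a2, beta = 2.0, 3.0, 1.5  # illustrative parameters
N = 100000
u1s, u3s = [], []
for _ in range(N):
    x1 = random.gammavariate(a1, beta)
    x2 = random.gammavariate(a2, beta)
    u1s.append(x1 / (x1 + x2))
    u3s.append(x1 + x2)

mean_u1 = sum(u1s) / N
mean_u3 = sum(u3s) / N
# Type-1 beta(a1, a2) mean is a1/(a1+a2); gamma(a1+a2, beta) mean is (a1+a2)*beta
print(abs(mean_u1 - a1 / (a1 + a2)) < 0.01)
print(abs(mean_u3 - (a1 + a2) * beta) < 0.05)
# Independence of u1 and u3 implies zero covariance
cov = sum((a - mean_u1) * (b - mean_u3) for a, b in zip(u1s, u3s)) / N
print(abs(cov) < 0.02)
```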

#### **2.1a.3. The type-2 beta and** *F* **distributions in the complex domain**

It follows that, in the complex domain, if $\tilde\chi^2_m$ and $\tilde\chi^2_n$ are independently distributed, then their sum is a chisquare with $m+n$ degrees of freedom, that is, $\tilde\chi^2_m + \tilde\chi^2_n = \tilde\chi^2_{m+n}$. We now look into type-2 beta variables and $F$-variables and their connection to chisquare variables in the complex domain. Since, in the complex domain, the chisquares are actually real variables, the density of the ratio of two independently distributed chisquares with $m$ and $n$ degrees of freedom in the complex domain remains the same as the density given in (2.1.3), with $\frac{m}{2}$ and $\frac{n}{2}$ replaced by $m$ and $n$, respectively. Thus, letting $\tilde u = \tilde\chi^2_m/\tilde\chi^2_n$ where the two chisquares in the complex domain are independently distributed, the density of $\tilde u$, denoted by $\tilde g_1(u)$, is

$$\tilde g_1(u) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\, u^{m-1} (1+u)^{-(m+n)} \tag{2.1a.3}$$

for $0 \le u < \infty$, $m, n = 1, 2, \ldots$, and $\tilde g_1(u) = 0$ elsewhere.

**Theorem 2.1a.3.** *Let $\tilde y_1 \sim \tilde\chi^2_m$ and $\tilde y_2 \sim \tilde\chi^2_n$ be independently distributed, where $\tilde y_1$ and $\tilde y_2$ are in the complex domain; then $\tilde u = \frac{\tilde y_1}{\tilde y_2}$ is a real type-2 beta random variable whose density is given in (2.1a.3).*
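Since the parameters in (2.1a.3) are positive integers, the normalization can be confirmed directly. The sketch below (with arbitrary illustrative values $m=3$, $n=4$) again uses the substitution $u = s/(1-s)$:

```python
import math

def g1_tilde(u, m, n):
    # Density (2.1a.3): type-2 beta with integer parameters m, n
    c = math.gamma(m + n) / (math.gamma(m) * math.gamma(n))
    return c * u**(m - 1) * (1 + u)**(-(m + n))

m, n, steps = 3, 4, 100000
total = 0.0
for i in range(steps):
    s = (i + 0.5) / steps
    # Jacobian of u = s/(1-s) is 1/(1-s)^2
    total += g1_tilde(s / (1 - s), m, n) / (1 - s)**2 / steps
print(abs(total - 1.0) < 1e-4)
```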

If the $F$ random variable in the complex domain is defined as $\tilde F_{m,n} = \frac{\tilde\chi^2_m/m}{\tilde\chi^2_n/n}$ where the two chisquares in the complex domain are independently distributed, then the density of $\tilde F$ is that of the real $F$-density in (2.1.4) with $m$ and $n$ replaced by $2m$ and $2n$, respectively.

**Theorem 2.1a.4.** *Let $\tilde F = \tilde F_{m,n} = \frac{\tilde\chi^2_m/m}{\tilde\chi^2_n/n}$ where the two chisquares in the complex domain are independently distributed; then $\tilde F$ is referred to as an $F$ random variable in the complex domain and it has a real $F$-density with the parameters $m$ and $n$, which is given by*

$$\tilde{\text{g}}\_2(F) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)} \left(\frac{m}{n}\right)^m F^{m-1} (1 + \frac{m}{n}F)^{-(m+n)} \tag{2.1a.4}$$

*for $0 \le F < \infty$, $m, n = 1, 2, \ldots$, and $\tilde g_2(F) = 0$ elsewhere.*

A type-1 beta representation in the complex domain can similarly be obtained from Theorem 2.1a.3. This will be stated as a theorem.

**Theorem 2.1a.5.** *Let $\tilde x_1 \sim \tilde\chi^2_m$ and $\tilde x_2 \sim \tilde\chi^2_n$ be independently distributed scalar chisquare variables in the complex domain with $m$ and $n$ degrees of freedom, respectively. Let $\tilde u_1 = \frac{\tilde x_1}{\tilde x_1 + \tilde x_2}$, which is a real variable that we will call $u_1$. Then $\tilde u_1$ is a scalar type-1 beta random variable in the complex domain with the parameters $m, n$, whose real scalar density is*

$$\tilde f_1(\tilde u_1) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\, u_1^{m-1} (1-u_1)^{n-1}, \ 0 \le u_1 \le 1,\tag{2.1a.5}$$

*and zero elsewhere, for $m, n = 1, 2, \ldots$.*

#### **2.1.4. Power transformation of type-1 and type-2 beta random variables**

Let us make a power transformation of the type $u_1 = ay^{\delta}$, $a > 0$, $\delta > 0$. Then $\mathrm{d}u_1 = a\delta y^{\delta-1}\mathrm{d}y$. For convenience, let the parameters in (2.1.5) be $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Then, the density given in (2.1.5) becomes

$$f_{11}(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \delta a^{\alpha} y^{\alpha\delta-1} (1-ay^{\delta})^{\beta-1}, \ 0 \le y \le \frac{1}{a^{\frac{1}{\delta}}},\tag{2.1.7}$$

and zero elsewhere, for $a > 0$, $\delta > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$. We can extend the support to $-a^{-\frac{1}{\delta}} \le y \le a^{-\frac{1}{\delta}}$ by replacing $y$ by $|y|$ and multiplying the normalizing constant by $\frac{1}{2}$. Such power-transformed models are useful in practical applications. Observe that a power transformation has the following effect: for $y < 1$, the density is reduced if $\delta > 1$ or raised if $\delta < 1$, whereas for $y > 1$, the density increases if $\delta > 1$ or diminishes if $\delta < 1$. For instance, the particular case $\alpha = 1$ is highly useful in reliability theory and stress-strength analysis. Thus, letting $\alpha = 1$ in the real scalar type-1 beta density (2.1.7) and denoting the resulting density by $f_{12}(y)$, one has

$$f_{12}(y) = a\delta\beta\, y^{\delta-1} (1-ay^{\delta})^{\beta-1}, \ 0 \le y \le a^{-\frac{1}{\delta}},\tag{2.1.8}$$

for $a > 0$, $\delta > 0$, $\Re(\beta) > 0$, and zero elsewhere. In the model (2.1.8), the reliability, that is, $Pr\{y \ge t\}$ for some $t$, can easily be determined. As well, the hazard function $\frac{f_{12}(y=t)}{Pr\{y \ge t\}}$ is readily available. Actually, the reliability or survival function is

$$Pr\{y \ge t\} = (1-at^{\delta})^{\beta}, \ a > 0, \ \delta > 0, \ t > 0, \ \beta > 0,\tag{i}$$

and the hazard function is

$$\frac{f_{12}(y=t)}{Pr\{y \ge t\}} = \frac{a\delta\beta t^{\delta-1}}{1-at^{\delta}}.\tag{ii}$$
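The survival function $(i)$ can be checked against a direct quadrature of the density (2.1.8). In the sketch below, the parameter values $a = 0.5$, $\delta = 2$, $\beta = 3$ and the point $t = 0.8$ are arbitrary illustrative choices:

```python
import math

a, delta, beta = 0.5, 2.0, 3.0  # illustrative parameters for model (2.1.8)
t = 0.8
upper = a ** (-1 / delta)       # right endpoint of the support

def f12(y):
    return a * delta * beta * y**(delta - 1) * (1 - a * y**delta)**(beta - 1)

steps = 200000
h = (upper - t) / steps
integral = sum(f12(t + (i + 0.5) * h) for i in range(steps)) * h
closed_form = (1 - a * t**delta)**beta  # reliability formula (i)
print(abs(integral - closed_form) < 1e-5)
```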

Observe that the free parameters $a$, $\delta$ and $\beta$ allow for much versatility in model-building situations. If $\beta = 1$, $a = 1$ and $\delta = 1$ in the real scalar type-1 beta model (2.1.7), then the density reduces to $\alpha y^{\alpha-1}$, $0 \le y \le 1$, $\alpha > 0$, which is a simple power function. The most popular power function model in the statistical literature is the Weibull model, which is a power-transformed exponential density. Consider the real scalar exponential density

$$g(x) = \theta \mathrm{e}^{-\theta x}, \ \theta > 0, \ x \ge 0,\tag{iii}$$

and zero elsewhere, and let $x = y^{\delta}$, $\delta > 0$. Then the model in $(iii)$ becomes the real scalar Weibull density, denoted by $g_1(y)$:

$$g_1(y) = \theta\delta y^{\delta-1} \mathrm{e}^{-\theta y^{\delta}}, \ \theta > 0, \ \delta > 0, \ y \ge 0,\tag{iv}$$

and zero elsewhere.
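That the Weibull model arises as a power-transformed exponential can be illustrated by simulation: if $x$ is exponential with rate $\theta$, then $y = x^{1/\delta}$ should follow $(iv)$, whose distribution function is $1 - \mathrm{e}^{-\theta t^{\delta}}$. The parameter values below are arbitrary illustrative choices:

```python
import math
import random

random.seed(1)
theta, delta = 2.0, 1.7  # illustrative rate and power parameters
N, t = 200000, 1.0
# If x ~ exponential(theta) and x = y^delta, then y = x^(1/delta) is Weibull (iv)
count = sum(1 for _ in range(N)
            if random.expovariate(theta) ** (1 / delta) <= t)
emp = count / N
theo = 1 - math.exp(-theta * t ** delta)  # Weibull distribution function
print(abs(emp - theo) < 0.01)
```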

Now, let us consider power transformations of a real scalar type-2 beta random variable whose density is given in (2.1.6). For convenience, let $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Letting $u_2 = ay^{\delta}$, $a > 0$, $\delta > 0$, the model specified by (2.1.6) then becomes

$$f_{21}(y) = a^{\alpha}\delta\, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y^{\alpha\delta-1} (1+ay^{\delta})^{-(\alpha+\beta)} \tag{v}$$

for $a > 0$, $\delta > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. As in the type-1 beta case, the most interesting special case occurs when $\alpha = 1$. Denoting the resulting density by $f_{22}(y)$, we have

$$f_{22}(y) = a\delta\beta\, y^{\delta-1} (1+ay^{\delta})^{-(\beta+1)}, \ 0 \le y < \infty,\tag{2.1.9}$$

for $a > 0$, $\delta > 0$, $\Re(\beta) > 0$, $\alpha = 1$, and zero elsewhere. In this case as well, the reliability and hazard functions can easily be determined:

$$\text{Reliability function } = Pr\{y \ge t\} = (1+at^{\delta})^{-\beta},\tag{vi}$$

$$\text{Hazard function } = \frac{f_{22}(y=t)}{Pr\{y \ge t\}} = \frac{a\delta\beta t^{\delta-1}}{1+at^{\delta}}.\tag{vii}$$

Again, for application purposes, the forms in $(vi)$ and $(vii)$ are very versatile owing to the presence of the free parameters $a$, $\delta$ and $\beta$.
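Both $(vi)$ and $(vii)$ can be verified numerically from (2.1.9); the parameter values in this sketch are arbitrary illustrative choices:

```python
import math

a, delta, beta = 1.2, 1.5, 2.5  # illustrative parameters for model (2.1.9)
t = 0.7

def f22(y):
    return a * delta * beta * y**(delta - 1) * (1 + a * y**delta)**(-(beta + 1))

# Reliability (vi): integrate the density over [t, B] for a large bound B
B, steps = 200.0, 200000
h = (B - t) / steps
tail = sum(f22(t + (i + 0.5) * h) for i in range(steps)) * h
rel = (1 + a * t**delta)**(-beta)
print(abs(tail - rel) < 1e-3)
# Hazard (vii): the ratio f22(t)/rel should simplify as stated
haz = f22(t) / rel
print(abs(haz - a * delta * beta * t**(delta - 1) / (1 + a * t**delta)) < 1e-12)
```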

#### **2.1.5. Exponentiation of real scalar type-1 and type-2 beta variables**

Let us consider the real scalar type-1 beta model in (2.1.5) where, for convenience, we let $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Letting $u_1 = a\mathrm{e}^{-by}$, we denote the resulting density by $f_{13}(y)$, where

$$f_{13}(y) = a^{\alpha} b\, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \mathrm{e}^{-b\alpha y} (1-a\mathrm{e}^{-by})^{\beta-1}, \ y \ge \ln a^{\frac{1}{b}},\tag{2.1.10}$$

for $a > 0$, $b > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. Again, for practical applications, the special case $\alpha = 1$ is the most useful one. Letting the density corresponding to this special case be denoted by $f_{14}(y)$, we have

$$f_{14}(y) = ab\beta\, \mathrm{e}^{-by} (1-a\mathrm{e}^{-by})^{\beta-1}, \ y \ge \ln a^{\frac{1}{b}},\tag{i}$$

for $a > 0$, $b > 0$, $\beta > 0$, and zero elsewhere. In this case,

$$\text{Reliability function } = Pr\{y \ge t\} = (1-a\mathrm{e}^{-bt})^{\beta},\tag{ii}$$

$$\text{Hazard function } = \frac{f_{14}(y=t)}{Pr\{y \ge t\}} = \frac{ab\beta\mathrm{e}^{-bt}}{1-a\mathrm{e}^{-bt}}.\tag{iii}$$

Now, consider exponentiating a real scalar type-2 beta random variable whose density is given in (2.1.6). For convenience, we let the parameters be $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Letting $u_2 = a\mathrm{e}^{-by}$ in (2.1.6), we obtain the following density:

$$f_{21}(y) = a^{\alpha} b\, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \mathrm{e}^{-b\alpha y} (1+a\mathrm{e}^{-by})^{-(\alpha+\beta)}, \ -\infty < y < \infty,\tag{2.1.11}$$

for $a > 0$, $b > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. The model in (2.1.11) is in fact the *generalized logistic model* introduced by Mathai and Provost (2006). For the special case $\alpha = 1$, $\beta = 1$, $a = 1$, $b = 1$ in (2.1.11), we have the following density:

$$f_{22}(y) = \frac{\mathrm{e}^{-y}}{(1+\mathrm{e}^{-y})^2} = \frac{\mathrm{e}^{y}}{(1+\mathrm{e}^{y})^2}, \ \ -\infty < y < \infty.\tag{iv}$$

This is the famous *logistic model*, which is utilized in industrial applications.
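The density $(iv)$ is the standard logistic density, whose distribution function is $1/(1+\mathrm{e}^{-y})$. The short numerical check below verifies this antiderivative over an arbitrary interval and the symmetry of the density about zero:

```python
import math

def logistic_pdf(y):
    # Density (iv): e^{-y} / (1 + e^{-y})^2
    return math.exp(-y) / (1 + math.exp(-y)) ** 2

# The antiderivative is the logistic distribution function 1/(1 + e^{-y})
lo, hi, steps = -6.0, 2.0, 100000
h = (hi - lo) / steps
integral = sum(logistic_pdf(lo + (i + 0.5) * h) for i in range(steps)) * h
cdf_diff = 1 / (1 + math.exp(-hi)) - 1 / (1 + math.exp(-lo))
print(abs(integral - cdf_diff) < 1e-8)
print(abs(logistic_pdf(1.3) - logistic_pdf(-1.3)) < 1e-15)  # symmetry about 0
```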

#### **2.1.6. The Student-***t* **distribution in the real domain**

A real Student-$t$ variable with $\nu$ degrees of freedom, denoted by $t_{\nu}$, is defined as $t_{\nu} = \frac{z}{\sqrt{\chi^2_{\nu}/\nu}}$ where $z \sim N_1(0, 1)$ and $\chi^2_{\nu}$ is a real scalar chisquare with $\nu$ degrees of freedom, $z$ and $\chi^2_{\nu}$ being independently distributed. It follows from the definition of a real $F_{m,n}$ random variable that $t^2_{\nu} = \frac{z^2}{\chi^2_{\nu}/\nu} = F_{1,\nu}$, an $F$ random variable with 1 and $\nu$ degrees of freedom. Thus, the density of $t^2_{\nu}$ is available from that of an $F_{1,\nu}$. On substituting the values $m = 1$, $n = \nu$ in the $F$-density appearing in (2.1.4), we obtain the density of $t^2 = w$, denoted by $f_w(w)$, as

$$f_w(w) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi}\,\Gamma(\frac{\nu}{2})} \left(\frac{1}{\nu}\right)^{\frac{1}{2}} \frac{w^{\frac{1}{2}-1}}{(1+\frac{w}{\nu})^{\frac{\nu+1}{2}}}, \ 0 \le w < \infty,\tag{2.1.12}$$

for $w = t^2$, $\nu = 1, 2, \ldots$, and $f_w(w) = 0$ elsewhere. Since $w = t^2$, the part of the density for $t > 0$ is available from (2.1.12) by observing that $\frac{1}{2}w^{\frac{1}{2}-1}\mathrm{d}w = \mathrm{d}t$ for $t > 0$. Hence, for $t > 0$, that part of the Student-$t$ density is obtained from (2.1.12) as

$$f_{1t}(t) = 2\,\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\,\Gamma(\frac{\nu}{2})}\Big(1+\frac{t^2}{\nu}\Big)^{-(\frac{\nu+1}{2})}, \ 0 \le t < \infty,\tag{2.1.13}$$

and zero elsewhere. Since the distribution of $t$ is symmetric about zero, the density in (2.1.13), without the factor 2, extends over $(-\infty, \infty)$, and we thus obtain the real Student-$t$ density, denoted by $f_t(t)$. This is stated in the next theorem.

**Theorem 2.1.6.** *Consider a real scalar standard normal variable $z$ divided by the square root of a real chisquare variable with $\nu$ degrees of freedom, the latter divided by its number of degrees of freedom $\nu$, that is, $t = \frac{z}{\sqrt{\chi^2_{\nu}/\nu}}$, where $z$ and $\chi^2_{\nu}$ are independently distributed; then $t$ is known as the real scalar Student-$t$ variable and its density is given by*

$$f_t(t) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu}\right)^{-(\frac{\nu+1}{2})}, \ -\infty < t < \infty,\tag{2.1.14}$$

*for $\nu = 1, 2, \ldots$.*
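Two quick checks of (2.1.14): for $\nu = 1$ it reduces to the Cauchy density $1/(\pi(1+t^2))$ derived in Sect. 2.1.7, and for any $\nu$ it integrates to one. The sketch below uses the substitution $t = \tan s$ for the normalization check; $\nu = 5$ is an arbitrary illustrative choice:

```python
import math

def student_t_pdf(t, nu):
    # Density (2.1.14)
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

# With nu = 1 the density reduces to the Cauchy density 1/(pi(1+t^2))
print(abs(student_t_pdf(0.9, 1) - 1 / (math.pi * (1 + 0.81))) < 1e-12)

# Normalization check for nu = 5 via t = tan(s), s in (-pi/2, pi/2)
nu, steps = 5, 200000
total = 0.0
for i in range(steps):
    s = -math.pi / 2 + (i + 0.5) * math.pi / steps
    t = math.tan(s)
    total += student_t_pdf(t, nu) * (1 / math.cos(s) ** 2) * math.pi / steps
print(abs(total - 1.0) < 1e-4)
```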

#### **2.1a.4. The Student-***t* **distribution in the complex domain**

Let $\tilde z \sim \tilde N_1(0, 1)$ and $\tilde y \sim \tilde\chi^2_{\nu}$ in the complex domain, or equivalently let $\tilde y$ be distributed as a real gamma random variable with the parameters $(\alpha = \nu, \beta = 1)$, and let these random variables be independently distributed. Then, we define the Student-$t$ with $\nu$ degrees of freedom in the complex domain as follows:

$$\tilde{t} = \tilde{t}\_{\nu} = \frac{|\tilde{z}|}{\sqrt{\tilde{\chi}\_{\nu}^{2}/\nu}}, \ |\tilde{z}| = (z\_1^2 + z\_2^2)^{\frac{1}{2}}, \ \tilde{z} = z\_1 + iz\_2$$

with $z_1, z_2$ real and $i = \sqrt{-1}$. What then is the density of $\tilde t_{\nu}$? The joint density of $\tilde z$ and $\tilde y$, denoted by $\tilde f(\tilde z, \tilde y)$, is

$$
\tilde f(\tilde z, \tilde y)\, \mathrm{d}\tilde y \wedge \mathrm{d}\tilde z = \frac{1}{\pi\Gamma(\nu)}\, y^{\nu-1} \mathrm{e}^{-y-|\tilde z|^2}\, \mathrm{d}\tilde y \wedge \mathrm{d}\tilde z.
$$

Let $\tilde z = z_1 + iz_2$, $i = \sqrt{-1}$, where $z_1 = r\cos\theta$ and $z_2 = r\sin\theta$, $0 \le r < \infty$, $0 \le \theta \le 2\pi$. Then $\mathrm{d}z_1 \wedge \mathrm{d}z_2 = r\, \mathrm{d}r \wedge \mathrm{d}\theta$, and the joint density of $r$ and $\tilde y$, denoted by $f_1(r, y)$, is the following, after integrating out $\theta$ and observing that $y$ has a real gamma density:

$$f_1(r, y)\,\mathrm{d}r \wedge \mathrm{d}y = \frac{2}{\Gamma(\nu)}\, y^{\nu-1} \mathrm{e}^{-y-r^2}\, r\,\mathrm{d}r \wedge \mathrm{d}y.$$

Let $u = t^2 = \frac{\nu r^2}{y}$ and $y = w$. Then $\mathrm{d}u \wedge \mathrm{d}w = \frac{2\nu r}{w}\, \mathrm{d}r \wedge \mathrm{d}y$ and so $r\,\mathrm{d}r \wedge \mathrm{d}y = \frac{w}{2\nu}\, \mathrm{d}u \wedge \mathrm{d}w$. Letting the joint density of $u$ and $w$ be denoted by $f_2(u, w)$, we have

$$f_2(u, w) = \frac{1}{\nu\Gamma(\nu)}\, w^{\nu} \mathrm{e}^{-(w+\frac{uw}{\nu})}$$

and the marginal density of *u*, denoted by *g(u)*, is as follows:

$$g(u) = \int_{w=0}^{\infty} f_2(u, w)\,\mathrm{d}w = \int_0^{\infty} \frac{w^{\nu}}{\Gamma(\nu+1)}\, \mathrm{e}^{-w\left(1+\frac{u}{\nu}\right)}\, \mathrm{d}w = \left(1+\frac{u}{\nu}\right)^{-(\nu+1)}$$

for $0 \le u < \infty$, $\nu = 1, 2, \ldots$, $u = t^2$, and zero elsewhere. Thus, the part of the density of $t$ for $t > 0$, denoted by $f_{1t}(t)$, is as follows, observing that $\mathrm{d}u = 2t\,\mathrm{d}t$ for $t > 0$:

$$f_{1t}(t) = 2t\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)}, \ 0 \le t < \infty, \ \nu = 1, 2, \ldots.\tag{2.1a.6}$$

Extending this density over the real line, we obtain the following density of *t* ˜ in the complex case:

$$\tilde f_{\nu}(t) = |t|\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)}, \ -\infty < t < \infty, \ \nu = 1, 2, \ldots.\tag{2.1a.7}$$

Hence, we have the following result:

**Theorem 2.1a.6.** *Let $\tilde z \sim \tilde N_1(0, 1)$ and $\tilde y \sim \tilde\chi^2_{\nu}$, a scalar chisquare in the complex domain, and let $\tilde z$ and $\tilde y$ be independently distributed. Consider the real variable $t = t_{\nu} = \frac{|\tilde z|}{\sqrt{\tilde y/\nu}}$. Then this $t$ is called a Student-$t$ with $\nu$ degrees of freedom in the complex domain, and its density is given in (2.1a.7).*
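The density (2.1a.7) can be confirmed to integrate to one by truncating the real line at a large bound, since its tail decays like $|t|^{-(2\nu+1)}$; the choices $\nu = 3$ and bound $B = 100$ below are arbitrary:

```python
import math

def f_tilde(t, nu):
    # Density (2.1a.7): the complex-domain Student-t, a real density
    return abs(t) * (1 + t * t / nu) ** (-(nu + 1))

nu, steps, B = 3, 200000, 100.0
h = 2 * B / steps
total = sum(f_tilde(-B + (i + 0.5) * h, nu) for i in range(steps)) * h
print(abs(total - 1.0) < 1e-3)
```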

#### **2.1.7. The Cauchy distribution in the real domain**

We have already encountered ratio distributions in Sect. 2.1.3, namely the real type-2 beta distribution and, as particular cases, the real $F$-distribution and the real $t^2$ distribution. We now consider the ratio of two independently distributed real standard normal variables. Let $z_1 \sim N_1(0, 1)$ and $z_2 \sim N_1(0, 1)$ be independently distributed. The joint density of $z_1$ and $z_2$, denoted by $f(z_1, z_2)$, is given by

$$f(z\_1, z\_2) = \frac{1}{2\pi} \mathbf{e}^{-\frac{1}{2}(z\_1^2 + z\_2^2)}, \ -\infty < z\_j < \infty, j = 1, 2.$$

Consider the quadrant $z_1 > 0$, $z_2 > 0$ and the transformation $u = \frac{z_1}{z_2}$, $v = z_2$. Then $\mathrm{d}z_1 \wedge \mathrm{d}z_2 = v\,\mathrm{d}u \wedge \mathrm{d}v$; see Sect. 2.1.3. Note that $u > 0$ covers the quadrants $z_1 > 0, z_2 > 0$ and $z_1 < 0, z_2 < 0$. The part of the joint density in the quadrant $u > 0$, $v > 0$, denoted by $g(u, v)$, is given by

$$g(u, v) = \frac{v}{2\pi}\, \mathrm{e}^{-\frac{1}{2}v^2(1+u^2)}$$

and that part of the marginal density of $u$, denoted by $g_1(u)$, is

$$g_1(u) = \frac{1}{2\pi}\int_0^{\infty} v\,\mathrm{e}^{-v^2\frac{(1+u^2)}{2}}\,\mathrm{d}v = \frac{1}{2\pi(1+u^2)}.$$

The other two quadrants, $z_1 > 0, z_2 < 0$ and $z_1 < 0, z_2 > 0$, which correspond to $u < 0$, yield the same form as above. Accordingly, the density of the ratio $u = \frac{z_1}{z_2}$, known as the real Cauchy density, is as specified in the next theorem.

**Theorem 2.1.7.** *Consider the independently distributed real standard normal variables $z_1 \sim N_1(0, 1)$ and $z_2 \sim N_1(0, 1)$. Then the ratio $u = \frac{z_1}{z_2}$ has the real Cauchy distribution with the following density:*

$$g_u(u) = \frac{1}{\pi(1+u^2)}, \ -\infty < u < \infty.\tag{2.1.15}$$

By integrating over each of the intervals $(-\infty, 0)$ and $(0, \infty)$ with the help of a type-2 beta integral, it can be established that (2.1.15) is indeed a density. Since $g_u(u)$ is symmetric, $Pr\{u \le 0\} = Pr\{u \ge 0\} = \frac{1}{2}$, and one might posit that the mean value of $u$ is zero. However, observe that

$$\int\_0^\infty \frac{u}{1+u^2} \mathrm{d}u = \frac{1}{2} \ln(1+u^2) \Big|\_0^\infty \to \infty.$$

Thus, *E(u)*, the mean value of a real Cauchy random variable, does not exist, which implies that the higher moments do not exist either.
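Theorem 2.1.7 is easily illustrated by simulation: the empirical distribution function of a ratio of independent standard normal variates should match the Cauchy distribution function $\frac{1}{2}+\frac{1}{\pi}\arctan(x)$, which follows by integrating (2.1.15). The sample size and evaluation point below are arbitrary illustrative choices:

```python
import math
import random

random.seed(3)
N, x = 200000, 1.0
# Ratio of independent standard normals, compared with the Cauchy CDF at x
count = sum(1 for _ in range(N)
            if random.gauss(0, 1) / random.gauss(0, 1) <= x)
emp = count / N
theo = 0.5 + math.atan(x) / math.pi  # Cauchy distribution function
print(abs(emp - theo) < 0.01)
```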

#### **Exercises 2.1**

**2.1.1.** Consider someone throwing a dart at a board, aiming to hit a particular point on the board. Taking this target point as the origin, consider a rectangular coordinate system. If $(x, y)$ is a point of hit, compute the densities of $x$ and $y$ under the following assumptions: (1): There is no bias in the horizontal and vertical directions, that is, $x$ and $y$ are independently distributed; (2): The joint density is a function of the distance from the origin, $\sqrt{x^2+y^2}$. That is, if $f_1(x)$ and $f_2(y)$ are the densities of $x$ and $y$, then it is given that $f_1(x)f_2(y) = g(\sqrt{x^2+y^2})$ where $f_1, f_2, g$ are unknown functions. Show that $f_1$ and $f_2$ are identical real normal densities.

**2.1.2.** Generalize Exercise 2.1.1 to 3-dimensional Euclidean space, that is,

$$g(\sqrt{x^2+y^2+z^2}) = f_1(x)f_2(y)f_3(z).$$

**2.1.3.** Generalize Exercise 2.1.2 to *k*-space, *k* ≥ 3.

**2.1.4.** Let $f(x)$ be an arbitrary density. Then Shannon's measure of entropy or uncertainty is $S = -k\int_x f(x)\ln f(x)\,\mathrm{d}x$ where $k$ is a constant. Optimize $S$ subject to the conditions (a): $\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$; (b): the condition in (a) plus $\int_{-\infty}^{\infty} xf(x)\,\mathrm{d}x =$ a given quantity; (c): the conditions in (b) plus $\int_{-\infty}^{\infty} x^2 f(x)\,\mathrm{d}x =$ a given quantity. Show that under (a), $f$ is a uniform density; under (b), $f$ is an exponential density; and under (c), $f$ is a Gaussian density. Hint: Use the calculus of variations.

**2.1.5.** Let the error of measurement $\epsilon$ satisfy the following conditions: (1): $\epsilon = \epsilon_1 + \epsilon_2 + \cdots$, that is, $\epsilon$ is a sum of infinitely many infinitesimal contributions $\epsilon_j$ where the $\epsilon_j$'s are independently distributed; (2): suppose that $\epsilon_j$ can only take the two values $\delta$ with probability $\frac{1}{2}$ and $-\delta$ with probability $\frac{1}{2}$ for all $j$; (3): $\mathrm{Var}(\epsilon) = \sigma^2 < \infty$. Then show that this error density is real Gaussian. Hint: use the mgf. [This is Gauss' derivation of the normal law, hence it is also called the error curve or Gaussian density.]

**2.1.6.** The pathway model of Mathai (2005) has the following form in the case of real positive scalar variable *x*:

$$f_1(x) = c_1 x^{\gamma}[1 - a(1 - q)x^{\delta}]^{\frac{1}{1-q}},\ q < 1,\ 0 \le x \le [a(1-q)]^{-\frac{1}{\delta}},$$

for $\delta > 0$, $a > 0$, $\gamma > -1$, and $f_1(x) = 0$ elsewhere. Show that this generalized type-1 beta form changes to a generalized type-2 beta form for $q > 1$,

$$f_2(x) = c_2 x^{\gamma}[1 + a(q - 1)x^{\delta}]^{-\frac{1}{q-1}},\ q > 1,\ x \ge 0,\ \delta > 0,\ a > 0$$

and $f_2(x) = 0$ elsewhere, and that for $q \to 1$, the model goes into a generalized gamma form given by

$$f_3(x) = c_3 x^{\gamma} \mathrm{e}^{-ax^{\delta}},\ a > 0,\ \delta > 0,\ x \ge 0$$

and zero elsewhere. Evaluate the normalizing constants $c_1, c_2, c_3$. All of these models are available either from $f_1(x)$ or from $f_2(x)$, where $q$ is the pathway parameter.
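As a numerical sanity check on the pathway limit (a sketch with illustrative parameter values of our own choosing), one can verify pointwise that the type-1 beta kernel tends to the gamma kernel as $q \to 1^-$, and that the closed-form value of $1/c_3$, namely $\Gamma((\gamma+1)/\delta)/(\delta a^{(\gamma+1)/\delta})$, matches quadrature:

```python
import math

# Illustrative parameter values (our choice), with q close to 1 from below.
a, delta, q = 1.0, 2.0, 0.999

# Pointwise, [1 - a(1-q)x^delta]^(1/(1-q)) -> exp(-a x^delta) as q -> 1-.
for x in (0.3, 0.7, 1.2):
    f1_kernel = (1.0 - a * (1.0 - q) * x ** delta) ** (1.0 / (1.0 - q))
    f3_kernel = math.exp(-a * x ** delta)
    assert abs(f1_kernel - f3_kernel) < 1e-3

# For f3, 1/c3 = integral_0^inf x^gamma exp(-a x^delta) dx
#             = Gamma((gamma+1)/delta) / (delta * a^((gamma+1)/delta)).
g, d3, a3 = 2.0, 1.5, 0.8    # again illustrative values
n, upper = 200_000, 40.0     # the integrand is negligible beyond x = 40
h = upper / n
quad = sum(((j + 0.5) * h) ** g * math.exp(-a3 * ((j + 0.5) * h) ** d3)
           for j in range(n)) * h
closed = math.gamma((g + 1.0) / d3) / (d3 * a3 ** ((g + 1.0) / d3))
assert abs(quad - closed) < 1e-4
```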

**2.1.7.** Make the transformation $x = \mathrm{e}^{-t}$ in the generalized gamma model $f_3(x)$ of Exercise 2.1.6. Show that an extreme-value density for $t$ results.

**2.1.8.** Consider the type-2 beta model

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1 + x)^{-(\alpha+\beta)},\ x \ge 0,\ \Re(\alpha) > 0,\ \Re(\beta) > 0$$

and zero elsewhere. Make the transformation $x = \mathrm{e}^{y}$ and then show that $y$ has a generalized logistic distribution, of which the logistic density is a particular case.

**2.1.9.** Show that for $0 \le x < \infty$, $\beta > 0$, $f(x) = c[1 + \mathrm{e}^{\alpha + \beta x}]^{-1}$ is a density, known as the Fermi-Dirac density. Evaluate the normalizing constant $c$.

**2.1.10.** Let $f(x) = c[\mathrm{e}^{\alpha + \beta x} - 1]^{-1}$ for $0 \le x < \infty$, $\beta > 0$. Show that $f(x)$ is a density, known as the Bose-Einstein density. Evaluate the normalizing constant $c$.

**2.1.11.** Evaluate the incomplete gamma integral $\gamma(\alpha; b) = \int_0^b x^{\alpha-1}\mathrm{e}^{-x}\,\mathrm{d}x$ and show that it can be written in terms of the confluent hypergeometric series

$${}_1F_1(\beta; \delta; y) = \sum_{k=0}^{\infty} \frac{(\beta)_k}{(\delta)_k} \frac{y^k}{k!},$$

where $(\alpha)_k = \alpha(\alpha + 1)\cdots(\alpha + k - 1)$, $\alpha \ne 0$, $(\alpha)_0 = 1$, is the Pochhammer symbol. Evaluate the normalizing constant $c$ if $f(x) = c\,x^{\alpha-1}\mathrm{e}^{-x}$, $0 \le x \le a$, $\alpha > 0$, and zero elsewhere, is a density.
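For a numerical illustration of Exercise 2.1.11, one may use the standard identity $\gamma(\alpha; b) = \frac{b^{\alpha}}{\alpha}\,{}_1F_1(\alpha; \alpha+1; -b)$; the sketch below (the helper names `hyp1f1` and `lower_inc_gamma` are ours) checks it against direct quadrature:

```python
import math

def hyp1f1(beta, delta, y, terms=200):
    """Truncated confluent hypergeometric series 1F1(beta; delta; y)."""
    s, term = 1.0, 1.0
    for k in range(terms):
        term *= (beta + k) / (delta + k) * y / (k + 1)
        s += term
    return s

def lower_inc_gamma(alpha, b, n=200_000):
    """Midpoint-rule value of gamma(alpha; b) = integral_0^b x^(alpha-1) e^(-x) dx."""
    h = b / n
    return sum(((j + 0.5) * h) ** (alpha - 1.0) * math.exp(-(j + 0.5) * h)
               for j in range(n)) * h

alpha, b = 2.5, 1.3   # illustrative values (our choice)
series = (b ** alpha / alpha) * hyp1f1(alpha, alpha + 1.0, -b)
assert abs(lower_inc_gamma(alpha, b) - series) < 1e-6
```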

**2.1.12.** Evaluate the incomplete beta integral $b(\alpha; \beta; b) = \int_0^b x^{\alpha-1}(1 - x)^{\beta-1}\,\mathrm{d}x$, $\alpha > 0$, $\beta > 0$, $0 \le b \le 1$. Show that it is available in terms of a Gauss hypergeometric series of the form $${}_2F_1(a, b; c; z) = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k}{(c)_k} \frac{z^k}{k!},\ |z| < 1.$$

**2.1.13.** For the pathway model in Exercise 2.1.6, compute the reliability function $Pr\{x \ge t\}$ when $\gamma = 0$ in all the cases $q < 1$, $q > 1$, $q \to 1$.

**2.1.14.** *Weibull density*: In the generalized gamma density $f(x) = c\,x^{\gamma-1}\mathrm{e}^{-ax^{\delta}}$, $x \ge 0$, $\gamma > 0$, $a > 0$, $\delta > 0$, and zero elsewhere, if $\delta = \gamma$ then $f(x)$ is called a Weibull density. For a Weibull density, evaluate the hazard function $h(t) = f(t)/Pr\{x \ge t\}$.

**2.1.15.** Consider a type-1 beta density $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1 - x)^{\beta-1}$, $0 \le x \le 1$, $\alpha > 0$, $\beta > 0$, and zero elsewhere. Let $\alpha = 1$ and consider the power transformation $x = y^{\delta}$, $\delta > 0$. Let this model be $g(y)$. Compute the reliability function $Pr\{y \ge t\}$ and the hazard function $h(t) = g(t)/Pr\{y \ge t\}$.

**2.1.16.** Verify that if $z$ is a real standard normal variable, $E(\mathrm{e}^{tz^2}) = (1 - 2t)^{-1/2}$, $t < 1/2$, which is the mgf of a chisquare random variable having one degree of freedom. Owing to the uniqueness of the mgf, this result establishes that $z^2 \sim \chi^2_1$.
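A quick Monte Carlo check of this mgf (a sketch; the sample size and the value $t = 0.2$ are arbitrary choices of ours):

```python
import math
import random

random.seed(7)
t, n = 0.2, 200_000   # any t < 1/2 works; these values are our choice
sample_mean = sum(math.exp(t * random.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n
exact = (1.0 - 2.0 * t) ** -0.5   # mgf of a chisquare with one degree of freedom
assert abs(sample_mean - exact) < 0.02
```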

#### **2.2. Quadratic Forms, Chisquaredness and Independence in the Real Domain**

Let $x_1, \ldots, x_p$ be iid (independently and identically distributed) real scalar random variables distributed as $N_1(0, 1)$ and let $X$ be a $p \times 1$ vector whose components are $x_1, \ldots, x_p$, that is, $X' = (x_1, \ldots, x_p)$. Consider the real quadratic form $u_1 = X'AX$ for some $p \times p$ real constant symmetric matrix $A = A'$. Then, we have the following result:

**Theorem 2.2.1.** *The quadratic form $u_1 = X'AX$, $A = A'$, where the components of $X$ are iid $N_1(0, 1)$, is distributed as a real chisquare with $r$, $r \le p$, degrees of freedom if and only if $A$ is idempotent, that is, $A = A^2$, and $A$ is of rank $r$.*

*Proof***:** When $A = A'$ is real, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that $P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where the $\lambda_j$'s are the eigenvalues of $A$. Consider the transformation $X = PY$ or $Y = P'X$. Then

$$u_1 = X'AX = Y'P'APY = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_p y_p^2 \tag{i}$$

where $y_1, \ldots, y_p$ are the components of $Y$ and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $A$. We have already shown in Theorem 2.1.1 that all linear functions of independent real normal variables are also real normal; hence, all the $y_j$'s are normally distributed. The expectation of $Y$ is $E[Y] = E[P'X] = P'E(X) = P'O = O$ and the covariance matrix associated with $Y$ is

$$\mathrm{Cov}(Y) = E[Y - E(Y)][Y - E(Y)]' = E[YY'] = P'\,\mathrm{Cov}(X)\,P = P'IP = P'P = I,$$

which means that the $y_j$'s are real standard normal variables that are mutually independently distributed. Hence $y_j^2 \sim \chi^2_1$, that is, each $y_j^2$ is a real chisquare with one degree of freedom, and the $y_j$'s are all mutually independently distributed. If $A = A^2$ and the rank of $A$ is $r$, then $r$ of the eigenvalues of $A$ are unities and the remaining ones are equal to zero, since the eigenvalues of an idempotent matrix can only be equal to zero or one, the number of ones being equal to the rank of the idempotent matrix. Then the representation in (i) becomes a sum of $r$ independently distributed real chisquares with one degree of freedom each, and hence the sum is a real chisquare with $r$ degrees of freedom. This proves the sufficiency of the result. For the necessity, we assume that $X'AX \sim \chi^2_r$ and must prove that $A = A^2$ and that $A$ is of rank $r$. Note that it is assumed throughout that $A = A'$. If $X'AX$ is a real chisquare having $r$ degrees of freedom, then the mgf of $u_1 = X'AX$ is given by $M_{u_1}(t) = (1 - 2t)^{-\frac{r}{2}}$. From the representation given in (i), the mgf's are as follows: $M_{y_j^2}(t) = (1 - 2t)^{-\frac{1}{2}} \Rightarrow M_{\lambda_j y_j^2}(t) = (1 - 2\lambda_j t)^{-\frac{1}{2}}$, $j = 1, \ldots, p$, the $y_j$'s being independently distributed. Thus, the mgf of the right-hand side of (i) is $M_{u_1}(t) = \prod_{j=1}^{p}(1 - 2\lambda_j t)^{-\frac{1}{2}}$. Hence, we have

$$(1 - 2t)^{-\frac{r}{2}} = \prod_{j=1}^{p} (1 - 2\lambda_j t)^{-\frac{1}{2}},\ 1 - 2t > 0,\ 1 - 2\lambda_j t > 0,\ j = 1, \ldots, p. \tag{ii}$$

Taking the natural logarithm of each side of (ii), expanding the terms, and then comparing the coefficients of $\frac{(2t)^n}{n}$ on both sides for $n = 1, 2, \ldots$, we obtain equations of the type

$$r = \sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} \lambda_j^2 = \sum_{j=1}^{p} \lambda_j^3 = \cdots \tag{iii}$$

The only solution resulting from (iii) is that $r$ of the $\lambda_j$'s are unities and the remaining ones are zeros. This result, combined with the property that $A = A'$, guarantees that $A$ is idempotent of rank $r$.

Observe that the eigenvalues of a matrix being ones and zeros need not imply that the matrix is idempotent; take for instance triangular matrices whose diagonal elements are unities and zeros. However, this property combined with the symmetry assumption will guarantee that the matrix is idempotent.
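Theorem 2.2.1 can also be checked by simulation. The sketch below uses an illustrative symmetric idempotent matrix of rank 2 (our choice) and verifies that the sample mean and variance of $u = X'AX$ match those of a $\chi^2_2$ variable, namely 2 and 4:

```python
import random

random.seed(1)
# A symmetric idempotent matrix of rank 2 (an illustrative choice of ours).
A = [[0.5, 0.0, -0.5],
     [0.0, 1.0, 0.0],
     [-0.5, 0.0, 0.5]]

# Check the idempotency A = A^2.
A2 = [[sum(A[i][m] * A[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
assert all(abs(A2[i][j] - A[i][j]) < 1e-12 for i in range(3) for j in range(3))

# Simulate u = X'AX with X ~ N(O, I); by Theorem 2.2.1, u ~ chisquare with r = 2.
n = 100_000
total = total_sq = 0.0
for _ in range(n):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    u = sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))
    total += u
    total_sq += u * u
mean = total / n
var = total_sq / n - mean * mean
assert abs(mean - 2.0) < 0.05   # E[chi^2_2] = 2
assert abs(var - 4.0) < 0.2     # Var[chi^2_2] = 4
```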

**Corollary 2.2.1.** *If the simple random sample or the iid variables came from a real $N_1(0, \sigma^2)$ distribution, then the modification needed in Theorem 2.2.1 is that $\frac{1}{\sigma^2}X'AX \sim \chi^2_r$, $A = A'$, if and only if $A = A^2$ and $A$ is of rank $r$.*

The above result, Theorem 2.2.1, coupled with another result on the independence of quadratic forms, is quite useful in the areas of design of experiments, analysis of variance and regression analysis, as well as in model building and hypothesis testing situations. This result on the independence of quadratic forms is stated next.

**Theorem 2.2.2.** *Let $x_1, \ldots, x_p$ be iid variables from a real $N_1(0, 1)$ population. Consider two real quadratic forms $u_1 = X'AX$, $A = A'$, and $u_2 = X'BX$, $B = B'$, where the components of the $p \times 1$ vector $X$ are $x_1, \ldots, x_p$. Then, $u_1$ and $u_2$ are independently distributed if and only if $AB = O$.*

*Proof***:** Let us assume that $AB = O$. Then $AB = O = O' = (AB)' = B'A' = BA$. When $AB = BA$, there exists a single orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that both quadratic forms are reduced to their canonical forms by the same $P$. Let

$$u_1 = X'AX = \lambda_1 y_1^2 + \cdots + \lambda_p y_p^2 \tag{i}$$

and

$$u_2 = X'BX = \nu_1 y_1^2 + \cdots + \nu_p y_p^2 \tag{ii}$$

where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $A$ and $\nu_1, \ldots, \nu_p$ are the eigenvalues of $B$. Since $A = A'$, the eigenvalues $\lambda_j$ are all real. Moreover,

$$AB = O \Rightarrow P'ABP = P'APP'BP = D_1 D_2 = \begin{bmatrix} \lambda_1 & 0 & \dots & 0\\ 0 & \lambda_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \lambda_p \end{bmatrix} \begin{bmatrix} \nu_1 & 0 & \dots & 0\\ 0 & \nu_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \nu_p \end{bmatrix} = O,$$

which means that $\lambda_j \nu_j = 0$ for all $j = 1, \ldots, p$. Thus, whenever a $\lambda_j$ is nonzero, the corresponding $\nu_j$ is zero and vice versa. Accordingly, the $\lambda_j$'s and $\nu_j$'s are separated in (i) and (ii); that is, the independent components are mathematically separated, and hence $u_1$ and $u_2$ are statistically independently distributed. The converse, which can be stated as follows: if $u_1$ and $u_2$ are independently distributed, $A = A'$, $B = B'$ and the $x_j$'s are real iid $N_1(0, 1)$, then $AB = O$, is more difficult to establish. The proof, which requires additional properties of matrices, will not be presented herein. Note that there are several incorrect or incomplete "proofs" in the literature. A correct derivation may be found in Mathai and Provost (1992).

When $x_1, \ldots, x_p$ are iid $N_1(0, \sigma^2)$, the above result on the independence of quadratic forms still holds since the independence is not altered by multiplying the quadratic forms by $\frac{1}{\sigma^2}$.

**Example 2.2.1.** Construct two $3 \times 3$ matrices $A$ and $B$ such that $A = A'$, $B = B'$ [both are symmetric], $A = A^2$ [$A$ is idempotent], $AB = O$ [$A$ and $B$ are orthogonal to each other], and $A$ has rank 2. Then (1): verify Theorem 2.2.1; (2): verify Theorem 2.2.2.

**Solution 2.2.1.** Consider the following matrices:

$$A = \begin{bmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 1 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{bmatrix}, \ B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Note that both $A$ and $B$ are symmetric, that is, $A = A'$, $B = B'$. Further, the rank of $A$ is 2 since the first and second row vectors are linearly independent and the third row is a multiple of the first one. Note that $A^2 = A$ and $AB = O$. Now, consider the quadratic forms $u = X'AX$ and $v = X'BX$. Then $u = \frac{1}{2}x_1^2 + x_2^2 + \frac{1}{2}x_3^2 - x_1x_3 = x_2^2 + [\frac{1}{\sqrt{2}}(x_1 - x_3)]^2$. Our initial assumption is that $x_j \sim N_1(0, 1)$, $j = 1, 2, 3$, and that the $x_j$'s are independently distributed. Let $y_1 = \frac{1}{\sqrt{2}}(x_1 - x_3)$. Then $E[y_1] = 0$ and $\mathrm{Var}(y_1) = \frac{1}{2}[\mathrm{Var}(x_1) + \mathrm{Var}(x_3)] = \frac{1}{2}[1 + 1] = 1$. Since $y_1$ is a linear function of normal variables, $y_1$ is normal with the parameters $E[y_1] = 0$ and $\mathrm{Var}(y_1) = 1$, that is, $y_1 \sim N_1(0, 1)$, and hence $y_1^2 \sim \chi^2_1$; as well, $x_2^2 \sim \chi^2_1$. Thus, $u \sim \chi^2_2$ since $x_2$ and $y_1$ are independently distributed, the variables being separated, as $y_1$ does not involve $x_2$. This verifies Theorem 2.2.1. Now, having already determined that $AB = O$, it remains to show that $u$ and $v$ are independently distributed, where $v = X'BX = x_1^2 + x_3^2 + 2x_1x_3 = (x_1 + x_3)^2$. Let $y_2 = \frac{1}{\sqrt{2}}(x_1 + x_3)$; then $y_2 \sim N_1(0, 1)$ as $y_2$ is a linear function of normal variables, and hence normal with parameters $E[y_2] = 0$ and $\mathrm{Var}(y_2) = 1$. On noting that $v$ does not contain $x_2$, we need only consider the parts of $u$ and $v$ containing $x_1$ and $x_3$. Thus, our question reduces to: are $y_1$ and $y_2$ independently distributed? Since both $y_1$ and $y_2$ are linear functions of normal variables, both are normal. Since $\mathrm{Cov}(y_1, y_2) = \frac{1}{2}\mathrm{Cov}(x_1 - x_3,\, x_1 + x_3) = \frac{1}{2}[\mathrm{Var}(x_1) - \mathrm{Var}(x_3)] = \frac{1}{2}[1 - 1] = 0$, the two normal variables are uncorrelated and hence independently distributed. That is, $y_1$ and $y_2$ are independently distributed, thereby implying that $u$ and $v$ are also independently distributed, which verifies Theorem 2.2.2.
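The matrix identities invoked in this solution can be confirmed exactly with rational arithmetic (a minimal sketch; the helper `matmul` is ours):

```python
from fractions import Fraction as F

half = F(1, 2)
A = [[half, 0, -half], [0, 1, 0], [-half, 0, half]]
B = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

assert A == [[A[j][i] for j in range(3)] for i in range(3)]   # A = A'
assert B == [[B[j][i] for j in range(3)] for i in range(3)]   # B = B'
assert matmul(A, A) == A                                      # A is idempotent
assert matmul(A, B) == [[0] * 3 for _ in range(3)]            # AB = O
assert A[2] == [-x for x in A[0]]   # third row = -(first row), so the rank is 2
```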

#### **2.2a. Hermitian Forms, Chisquaredness and Independence in the Complex Domain**

Let $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k$ be independently and identically distributed standard univariate Gaussian variables in the complex domain and let $\tilde{X}$ be a $k \times 1$ vector whose components are $\tilde{x}_1, \ldots, \tilde{x}_k$. Consider the Hermitian form $\tilde{X}^*A\tilde{X}$, $A = A^*$ (Hermitian), where $A$ is a $k \times k$ constant Hermitian matrix. Then, there exists a unitary matrix $Q$, $QQ^* = I$, $Q^*Q = I$, such that $Q^*AQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Note that the $\lambda_j$'s are real since $A$ is Hermitian. Consider the transformation $\tilde{X} = Q\tilde{Y}$. Then,

$$
\tilde{X}^\* A \tilde{X} = \lambda\_1 |\tilde{\mathbf{y}}\_1|^2 + \dots + \lambda\_k |\tilde{\mathbf{y}}\_k|^2,
$$

where the $\tilde{y}_j$'s are iid standard normal in the complex domain, $\tilde{y}_j \sim \tilde{N}_1(0, 1)$, $j = 1, \ldots, k$. Then, $\tilde{y}_j^*\tilde{y}_j = |\tilde{y}_j|^2 \sim \tilde{\chi}^2_1$, a chisquare having one degree of freedom in the complex domain or, equivalently, a real gamma random variable with the parameters $(\alpha = 1, \beta = 1)$, the $|\tilde{y}_j|^2$'s being independently distributed for $j = 1, \ldots, k$. Thus, we can state the following result, whose proof parallels that in the real case.

**Theorem 2.2a.1.** *Let $\tilde{x}_1, \ldots, \tilde{x}_k$ be iid $\tilde{N}_1(0, 1)$ variables in the complex domain. Consider the Hermitian form $u = \tilde{X}^*A\tilde{X}$, $A = A^*$, where $\tilde{X}$ is a $k \times 1$ vector whose components are $\tilde{x}_1, \ldots, \tilde{x}_k$. Then, $u$ is distributed as a chisquare in the complex domain with $r$ degrees of freedom, or a real gamma with the parameters $(\alpha = r, \beta = 1)$, if and only if $A$ is of rank $r$ and $A = A^2$.*

**Theorem 2.2a.2.** *Let the $\tilde{x}_j$'s and $\tilde{X}$ be as in Theorem 2.2a.1. Consider two Hermitian forms $u_1 = \tilde{X}^*A\tilde{X}$, $A = A^*$, and $u_2 = \tilde{X}^*B\tilde{X}$, $B = B^*$. Then, $u_1$ and $u_2$ are independently distributed if and only if $AB = O$ (null matrix).*

**Example 2.2a.1.** Construct two $3 \times 3$ Hermitian matrices $A$ and $B$, that is, $A = A^*$, $B = B^*$, such that $A = A^2$ [idempotent] and is of rank 2, with $AB = O$. Then (1): verify Theorem 2.2a.1; (2): verify Theorem 2.2a.2.

**Solution 2.2a.1.** Consider the following matrices

$$A = \begin{bmatrix} \frac{1}{2} & 0 & -\frac{(1+i)}{\sqrt{8}} \\ 0 & 1 & 0 \\ -\frac{(1-i)}{\sqrt{8}} & 0 & \frac{1}{2} \end{bmatrix}, \quad B = \begin{bmatrix} \frac{1}{2} & 0 & \frac{(1+i)}{\sqrt{8}} \\ 0 & 0 & 0 \\ \frac{(1-i)}{\sqrt{8}} & 0 & \frac{1}{2} \end{bmatrix}.$$

It can be readily verified that $A = A^*$, $B = B^*$, $A = A^2$, $AB = O$. Further, on multiplying the first row of $A$ by $-\frac{2(1-i)}{\sqrt{8}}$, we obtain the third row; since the third row is a multiple of the first one and the first and second rows are linearly independent, the rank of $A$ is 2. Our initial assumption is that $\tilde{x}_j \sim \tilde{N}_1(0, 1)$, $j = 1, 2, 3$, that is, they are univariate complex Gaussian, and they are independently distributed. Then, $\tilde{x}_j^*\tilde{x}_j \sim \tilde{\chi}^2_1$, a chisquare with one degree of freedom in the complex domain or a real gamma random variable with the parameters $(\alpha = 1, \beta = 1)$, for each $j = 1, 2, 3$. Let us consider the Hermitian forms $u = \tilde{X}^*A\tilde{X}$ and $v = \tilde{X}^*B\tilde{X}$, $\tilde{X}' = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3)$. Then

$$\begin{aligned} u &= \frac{1}{2}\tilde{x}_1^*\tilde{x}_1 - \frac{(1+i)}{\sqrt{8}}\tilde{x}_1^*\tilde{x}_3 - \frac{(1-i)}{\sqrt{8}}\tilde{x}_3^*\tilde{x}_1 + \frac{1}{2}\tilde{x}_3^*\tilde{x}_3 + \tilde{x}_2^*\tilde{x}_2 \\ &= \tilde{x}_2^*\tilde{x}_2 + \Big[\frac{1}{\sqrt{2}}(\tilde{y}_1 - \tilde{x}_3)\Big]^*\Big[\frac{1}{\sqrt{2}}(\tilde{y}_1 - \tilde{x}_3)\Big] \end{aligned} \tag{i}$$

where

$$\tilde{\mathbf{y}}\_{1} = 2\frac{(1+i)}{\sqrt{8}}\tilde{\mathbf{x}}\_{1} \Rightarrow E[\tilde{\mathbf{y}}\_{1}] = 0,\ \text{Var}(\tilde{\mathbf{y}}\_{1}) = \left| 2\frac{(1+i)}{\sqrt{8}} \right|^{2} \text{Var}(\tilde{\mathbf{x}}\_{1})$$

$$\text{Var}(\tilde{\mathbf{y}}\_{1}) = E\left\{ \left[ 2\frac{(1+i)}{\sqrt{8}} \right]^{\*} \left[ 2\frac{(1+i)}{\sqrt{8}} \right] \tilde{\mathbf{x}}\_{1}^{\*} \tilde{\mathbf{x}}\_{1} \right\} = E\{\tilde{\mathbf{x}}\_{1}^{\*} \tilde{\mathbf{x}}\_{1} \} = \text{Var}(\tilde{\mathbf{x}}\_{1}) = 1. \tag{ii}$$

Since $\tilde{y}_1$ is a linear function of $\tilde{x}_1$, it is a univariate normal in the complex domain with parameters 0 and 1, that is, $\tilde{y}_1 \sim \tilde{N}_1(0, 1)$. The part of (i) not containing the $\tilde{\chi}^2_1$ can be written as follows:

$$
\left[\frac{1}{\sqrt{2}}(\tilde{\mathbf{y}}\_1 - \tilde{\mathbf{x}}\_3)\right]^\* \left[\frac{1}{\sqrt{2}}(\tilde{\mathbf{y}}\_1 - \tilde{\mathbf{x}}\_3)\right] \sim \tilde{\chi}\_1^2 \tag{iii}
$$

since $\tilde{y}_1 - \tilde{x}_3 \sim \tilde{N}_1(0, 2)$, as $\tilde{y}_1 - \tilde{x}_3$ is a linear function of the normal variables $\tilde{y}_1$ and $\tilde{x}_3$. Therefore $u \sim \tilde{\chi}^2_1 + \tilde{\chi}^2_1 = \tilde{\chi}^2_2$, that is, a chisquare having two degrees of freedom in the complex domain or a real gamma with the parameters $(\alpha = 2, \beta = 1)$. Observe that the two chisquares are independently distributed because one of them contains only $\tilde{x}_2$ and the other, $\tilde{x}_1$ and $\tilde{x}_3$. This establishes (1). In order to verify (2), we first note that the Hermitian form $v$ can be expressed as follows:

$$v = \frac{1}{2}\tilde{\mathbf{x}}\_1^\*\tilde{\mathbf{x}}\_1 + \frac{(1+i)}{\sqrt{8}}\tilde{\mathbf{x}}\_1^\*\tilde{\mathbf{x}}\_3 + \frac{(1-i)}{\sqrt{8}}\tilde{\mathbf{x}}\_3^\*\tilde{\mathbf{x}}\_1 + \frac{1}{2}\tilde{\mathbf{x}}\_3^\*\tilde{\mathbf{x}}\_3$$

which can be written in the following form by making use of steps similar to those leading to *(iii)*:

$$v = \left[2\frac{(1+i)}{\sqrt{8}}\frac{\tilde{x}_1}{\sqrt{2}} + \frac{\tilde{x}_3}{\sqrt{2}}\right]^* \left[2\frac{(1+i)}{\sqrt{8}}\frac{\tilde{x}_1}{\sqrt{2}} + \frac{\tilde{x}_3}{\sqrt{2}}\right] \sim \tilde{\chi}_1^2 \tag{iv}$$

or $v$ is a chisquare with one degree of freedom in the complex domain. Observe that $\tilde{x}_2$ is absent from (iv), so that we need only compare the terms containing $\tilde{x}_1$ and $\tilde{x}_3$ in (iii) and (iv). These terms are $\tilde{y}_2 = \frac{2(1+i)}{\sqrt{8}}\tilde{x}_1 + \tilde{x}_3$ and $\tilde{y}_3 = \frac{2(1+i)}{\sqrt{8}}\tilde{x}_1 - \tilde{x}_3$. Noting that the covariance between $\tilde{y}_2$ and $\tilde{y}_3$ is zero:

$$\mathrm{Cov}(\tilde{y}_2, \tilde{y}_3) = \Big|2\frac{(1+i)}{\sqrt{8}}\Big|^2 \mathrm{Var}(\tilde{x}_1) - \mathrm{Var}(\tilde{x}_3) = 1 - 1 = 0,$$

and that *y*˜<sup>2</sup> and *y*˜<sup>3</sup> are linear functions of normal variables and hence normal, the fact that they are uncorrelated implies that they are independently distributed. Thus, *u* and *v* are indeed independently distributed, which establishes (2).
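The stated properties of these complex matrices can be confirmed numerically (a minimal sketch; the helpers `matmul`, `conj_t` and `close` are ours):

```python
s8 = 8 ** 0.5
A = [[0.5, 0.0, -(1 + 1j) / s8],
     [0.0, 1.0, 0.0],
     [-(1 - 1j) / s8, 0.0, 0.5]]
B = [[0.5, 0.0, (1 + 1j) / s8],
     [0.0, 0.0, 0.0],
     [(1 - 1j) / s8, 0.0, 0.5]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def conj_t(M):
    """Conjugate transpose M*."""
    return [[M[j][i].conjugate() for j in range(3)] for i in range(3)]

def close(M, N, tol=1e-12):
    return all(abs(M[i][j] - N[i][j]) < tol for i in range(3) for j in range(3))

assert close(A, conj_t(A))          # A = A* (Hermitian)
assert close(B, conj_t(B))          # B = B*
assert close(matmul(A, A), A)       # A is idempotent
assert close(matmul(A, B), [[0.0] * 3 for _ in range(3)])   # AB = O
```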

#### **2.2.1. Extensions of the results in the real domain**

Let $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently distributed. Then, $\frac{x_j}{\sigma_j} \sim N_1(\frac{\mu_j}{\sigma_j}, 1)$, $\sigma_j > 0$, $j = 1, \ldots, k$. Let

$$X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_k \end{bmatrix}, \; \mu = \begin{bmatrix} \mu\_1 \\ \vdots \\ \mu\_k \end{bmatrix}, \; \Sigma = \begin{bmatrix} \sigma\_1^2 & 0 & \dots & 0 \\ 0 & \sigma\_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma\_k^2 \end{bmatrix}, \; \Sigma^{\frac{1}{2}} = \begin{bmatrix} \sigma\_1 & 0 & \dots & 0 \\ 0 & \sigma\_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma\_k \end{bmatrix}.$$

Then, let

$$Y = \Sigma^{-\frac{1}{2}} X = \begin{bmatrix} \mathbb{y}\_1 \\ \vdots \\ \mathbb{y}\_k \end{bmatrix}, \ E[Y] = \Sigma^{-\frac{1}{2}} E[X] = \Sigma^{-\frac{1}{2}} \mu.$$

If $\mu = O$, it has already been shown that $Y'Y \sim \chi^2_k$. If $\mu \ne O$, then $Y'Y \sim \chi^2_k(\lambda)$, $\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu$. It is assumed that the noncentral chisquare distribution has already been discussed in a basic course in Statistics. It is defined for instance in Mathai and Haubold (2017a, 2017b) and will be briefly discussed in Sect. 2.3.1. If $\mu = O$, then for any $k \times k$ symmetric matrix $A = A'$, $Y'AY \sim \chi^2_r$ if and only if $A = A^2$ and $A$ is of rank $r$. This result has already been established. Now, if $\mu = O$, then $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y \sim \chi^2_r$ if and only if $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}} = \Sigma^{\frac{1}{2}}A\Sigma A\Sigma^{\frac{1}{2}} \Rightarrow A = A\Sigma A$, and $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ is of rank $r$ or, equivalently, $A$ is of rank $r$ since $\Sigma > O$. Hence, we have the following result:

**Theorem 2.2.3.** *Let the real scalars $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently distributed. Let*

$$X = \begin{bmatrix} \mathbf{x}\_1 \\ \vdots \\ \mathbf{x}\_k \end{bmatrix}, \ Y = \begin{bmatrix} \mathbf{y}\_1 \\ \vdots \\ \mathbf{y}\_k \end{bmatrix} = \Sigma^{-\frac{1}{2}} X, \ E(Y) = \Sigma^{-\frac{1}{2}} \boldsymbol{\mu}, \ \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}\_1 \\ \vdots \\ \boldsymbol{\mu}\_k \end{bmatrix}.$$

*Then for any k* × *k symmetric matrix A* = *A*- *,*

$$X^\prime AX = Y^\prime \Sigma^{\frac{1}{2}} A \Sigma^{\frac{1}{2}} Y \sim \begin{cases} \chi\_r^2 \text{ if } \mu = O\\ \chi\_k^2(\lambda) \text{ if } \mu \neq O, \ \lambda = \frac{1}{2} \mu^\prime \Sigma^{-\frac{1}{2}} A \Sigma^{-\frac{1}{2}} \mu \end{cases}$$

*if and only if A* = *AΣA and A is of rank r.*

Independence is not altered if the variables are relocated. Consider two quadratic forms $X'AX$ and $X'BX$, $A = A'$, $B = B'$. Then, $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$ and $X'BX = Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, and we have the following result:

**Theorem 2.2.4.** *Let $x_j$, $X$, $Y$, $\Sigma$ be as defined in Theorem 2.2.3. Then, the quadratic forms $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$ and $X'BX = Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, $A = A'$, $B = B'$, are independently distributed if and only if $A\Sigma B = O$.*

Let $X$, $A$ and $\Sigma$ be as defined in Theorem 2.2.3, let $Z$ be a standard normal vector whose components $z_i$, $i = 1, \ldots, k$, are iid $N_1(0, 1)$, and let $P$ be an orthonormal matrix such that $P'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}P = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$; then a general quadratic form $X'AX$ can be expressed as follows:

$$X'AX = (Z'\Sigma^{\frac{1}{2}} + \mu')A(\Sigma^{\frac{1}{2}}Z + \mu) = (Z + \Sigma^{-\frac{1}{2}}\mu)'P\,[P'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}P]\,P'(Z + \Sigma^{-\frac{1}{2}}\mu)$$

where *λ*1*,...,λk* are the eigenvalues of *Σ* 1 <sup>2</sup>*AΣ* 1 <sup>2</sup> . Hence, the following decomposition of the quadratic form:

$$X^\prime AX = \lambda\_1 (u\_1 + b\_1)^2 + \dots + \lambda\_k (u\_k + b\_k)^2,\tag{2.2.1}$$
 
$$\text{where } \begin{bmatrix} b\_1 \\ \vdots \\ b\_k \end{bmatrix} = P^\prime \Sigma^{-\frac{1}{2}} \mu, \begin{bmatrix} u\_1 \\ \vdots \\ u\_k \end{bmatrix} = P^\prime \begin{bmatrix} z\_1 \\ \vdots \\ z\_k \end{bmatrix}, \text{and hence the } u\_i \text{'s are iid } N\_1(0, 1).$$

Thus, $X'AX$ can be expressed as a linear combination of independently distributed noncentral chisquare random variables, each having one degree of freedom, whose noncentrality parameters are respectively $b_j^2/2$, $j = 1, \ldots, k$. Of course, the $k$ chisquares will be central when $\mu = O$.
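The decomposition (2.2.1) can be illustrated numerically for a $2 \times 2$ case in which the diagonalization is known in closed form (all matrices below are illustrative choices of ours): with $\Sigma = \mathrm{diag}(4, 1)$ and $A$ chosen so that $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ has eigenvalues 3 and 1, the identity $X'AX = \lambda_1(u_1 + b_1)^2 + \lambda_2(u_2 + b_2)^2$ holds for every sampled $Z$:

```python
import math
import random

random.seed(3)
r2 = math.sqrt(2.0)
# All matrices below are illustrative choices of ours.
Sig_h = [[2.0, 0.0], [0.0, 1.0]]   # Sigma^(1/2), so Sigma = diag(4, 1)
A = [[0.5, 0.5], [0.5, 2.0]]       # chosen so that Sigma^(1/2) A Sigma^(1/2) = [[2,1],[1,2]]
lam = [3.0, 1.0]                   # eigenvalues of [[2,1],[1,2]]
P = [[1 / r2, 1 / r2], [1 / r2, -1 / r2]]   # orthonormal eigenvectors as columns
mu = [1.0, -0.5]

# b = P' Sigma^(-1/2) mu, as in (2.2.1); Sigma^(-1/2) is diagonal here.
b = [sum(P[i][j] * mu[i] / Sig_h[i][i] for i in range(2)) for j in range(2)]

for _ in range(1000):
    Z = [random.gauss(0.0, 1.0) for _ in range(2)]
    X = [Sig_h[i][i] * Z[i] + mu[i] for i in range(2)]   # X = Sigma^(1/2) Z + mu
    qf = sum(X[i] * A[i][j] * X[j] for i in range(2) for j in range(2))
    decomp = sum(lam[j] * (sum(P[i][j] * Z[i] for i in range(2)) + b[j]) ** 2
                 for j in range(2))
    assert abs(qf - decomp) < 1e-9   # X'AX = lam_1 (u_1+b_1)^2 + lam_2 (u_2+b_2)^2
```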

#### **2.2a.1. Extensions of the results in the complex domain**

Let the complex scalar variables $\tilde{x}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently distributed and let $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2)$. As well, let

$$
\tilde{X} = \begin{bmatrix} \tilde{\boldsymbol{x}}\_1 \\ \vdots \\ \tilde{\boldsymbol{x}}\_k \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma\_1^2 & 0 & \dots & 0 \\ 0 & \sigma\_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma\_k^2 \end{bmatrix}, \quad \boldsymbol{\Sigma}^{-\frac{1}{2}} \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{Y}} = \begin{bmatrix} \tilde{\boldsymbol{y}}\_1 \\ \vdots \\ \tilde{\boldsymbol{y}}\_k \end{bmatrix}, \quad \tilde{\boldsymbol{\mu}} = \begin{bmatrix} \tilde{\mu}\_1 \\ \vdots \\ \tilde{\mu}\_k \end{bmatrix}
$$

where $\Sigma^{\frac{1}{2}}$ is the Hermitian positive definite square root of $\Sigma$. In this case, $\tilde{y}_j \sim \tilde{N}_1(\frac{\tilde{\mu}_j}{\sigma_j}, 1)$, $j = 1, \ldots, k$, and the $\tilde{y}_j$'s are independently distributed. Hence, for any Hermitian form $\tilde{X}^*A\tilde{X}$, $A = A^*$, we have $\tilde{X}^*A\tilde{X} = \tilde{Y}^*\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}\tilde{Y}$. Hence, if $\tilde{\mu} = O$ (null vector), then from the previous result on chisquaredness, we have:

**Theorem 2.2a.3.** *Let $\tilde{X}$, $\Sigma$, $\tilde{Y}$, $\tilde{\mu}$ be as defined above. Let $u = \tilde{X}^*A\tilde{X}$, $A = A^*$, be a Hermitian form. Then $u \sim \tilde{\chi}^2_r$ in the complex domain if and only if $A$ is of rank $r$, $\tilde{\mu} = O$ and $A = A\Sigma A$.* [*A chisquare with $r$ degrees of freedom in the complex domain is a real gamma with parameters $(\alpha = r, \beta = 1)$.*]

If $\tilde{\mu} \ne O$, then we have a noncentral chisquare in the complex domain. A result on the independence of Hermitian forms can be obtained as well.

**Theorem 2.2a.4.** *Let $\tilde{X}$, $\tilde{Y}$, $\Sigma$ be as defined above. Consider the Hermitian forms $u_1 = \tilde{X}^*A\tilde{X}$, $A = A^*$, and $u_2 = \tilde{X}^*B\tilde{X}$, $B = B^*$. Then $u_1$ and $u_2$ are independently distributed if and only if $A\Sigma B = O$.*

The proofs of Theorems 2.2a.3 and 2.2a.4 parallel those presented in the real case and are hence omitted.
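Theorem 2.2a.3 can be illustrated empirically. The following simulation sketch (not from the text) takes $\Sigma = I$, $\tilde{\mu} = O$ and an idempotent Hermitian $A$ of rank $r = 2$, assumes the book's convention that a standard complex normal $\tilde{z}$ satisfies $E|\tilde{z}|^2 = 1$ (obtained as $(u + iv)/\sqrt{2}$ with $u, v$ real standard normal), and checks that $u = \tilde{X}^*A\tilde{X}$ has the mean and variance of a real gamma with $\alpha = r$, $\beta = 1$:

```python
import numpy as np

# Empirical sketch of Theorem 2.2a.3 under the assumptions Sigma = I and
# mu = 0; a standard complex normal is simulated as (u + iv)/sqrt(2).
rng = np.random.default_rng(1)
n, r, N = 4, 2, 200_000

# Hermitian idempotent A of rank r: an orthogonal projector onto r coordinates
A = np.diag([1.0, 1.0, 0.0, 0.0]).astype(complex)

Z = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
u = np.einsum('ij,jk,ik->i', Z.conj(), A, Z).real   # X* A X for each sample

# A chisquare with r degrees of freedom in the complex domain is a real
# gamma(alpha = r, beta = 1): mean r and variance r.
print(u.mean(), u.var())
```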

#### **Exercises 2.2**

**2.2.1.** Prove the second part of Theorem 2.2.2, namely: given that $X'AX$, $A = A'$, and $X'BX$, $B = B'$, are independently distributed, where the components of the $p \times 1$ vector $X$ are mutually independently distributed real standard normal variables, show that $AB = O$.

**2.2.2.** Let the real scalars $x_j \sim N_1(0, \sigma^2)$, $\sigma^2 > 0$, $j = 1, 2,\dots,k$, be independently distributed. Let $X' = (x_1,\dots,x_k)$, that is, $X$ is the $k \times 1$ vector whose elements are $x_1,\dots,x_k$. Then the joint density of the real scalar variables $x_1,\dots,x_k$, denoted by $f(X)$, is

$$f(X) = \frac{1}{(\sigma\sqrt{2\pi})^k}\, \mathbf{e}^{-\frac{1}{2\sigma^2}X'X},\ -\infty < x_j < \infty,\ j = 1,\dots,k.$$

Consider the quadratic form $u = X'AX$, $A = A'$, with $X$ as defined above. (1): Compute the mgf of $u$; (2): Compute the density of $u$ if $A$ is of rank $r$ and all its nonzero eigenvalues are equal to $\lambda > 0$; (3): If $m$ of the eigenvalues are $\lambda > 0$ and the remaining $n$ of them are $\lambda < 0$, $m + n = r$, compute the density of $u$.

**2.2.3.** In Exercise 2.2.2, compute the density of $u$ if $r_1$ of the eigenvalues are each equal to $\lambda_1$ and $r_2$ of them are each equal to $\lambda_2$, $r_1 + r_2 = r$. Consider all sign combinations, such as $\lambda_1 > 0,\ \lambda_2 > 0$, etc.

**2.2.4.** In Exercise 2.2.2 compute the density of *u* for the general case with no restrictions on the eigenvalues.

**2.2.5.** Let $x_j \sim N_1(0, \sigma^2)$, $j = 1, 2$, be independently distributed, and let $X' = (x_1, x_2)$ and $u = X'AX$ where $A = A'$. Compute the density of $u$ if the eigenvalues of $A$ are (1): 2 and 1; (2): 2 and −1. (3): Construct a real $2 \times 2$ matrix $A = A'$ whose eigenvalues are 2 and 1.

**2.2.6.** Show that the results on chisquaredness and independence in the real or complex domain need not hold if $A \ne A^*$ or $B \ne B^*$.

**2.2.7.** Construct a $2 \times 2$ Hermitian matrix $A = A^*$ such that $A = A^2$, and verify Theorem 2.2a.3. Then construct $2 \times 2$ Hermitian matrices $A$ and $B$ such that $AB = O$, and verify Theorem 2.2a.4.

**2.2.8.** Let $\tilde{x}_1,\dots,\tilde{x}_m$ be a simple random sample of size $m$ from a complex normal population $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$, and let $\tilde{y}_1,\dots,\tilde{y}_n$ be iid $\tilde{N}_1(\tilde{\mu}_2, \sigma_2^2)$, the two complex normal populations being independent. Let

$$\begin{aligned} s\_1^2 &= \sum\_{j=1}^m (\tilde{\mathbf{x}}\_j - \bar{\tilde{\mathbf{x}}})^\* (\tilde{\mathbf{x}}\_j - \bar{\tilde{\mathbf{x}}}) / \sigma\_1^2, s\_2^2 = \sum\_{j=1}^n (\tilde{\mathbf{y}}\_j - \bar{\tilde{\mathbf{y}}})^\* (\tilde{\mathbf{y}}\_j - \bar{\tilde{\mathbf{y}}}) / \sigma\_2^2, \\ s\_{11}^2 &= \frac{1}{\sigma\_1^2} \sum\_{j=1}^m (\tilde{\mathbf{x}}\_j - \tilde{\mu}\_1)^\* (\tilde{\mathbf{x}}\_j - \tilde{\mu}\_1), s\_{21}^2 = \frac{1}{\sigma\_2^2} \sum\_{j=1}^n (\tilde{\mathbf{y}}\_j - \tilde{\mu}\_2)^\* (\tilde{\mathbf{y}}\_j - \tilde{\mu}\_2) \end{aligned}$$

Then, show that

$$\frac{s\_{11}^2/m}{s\_{21}^2/n} \sim \tilde{F}\_{m,n}, \frac{s\_1^2/(m-1)}{s\_2^2/(n-1)} \sim \tilde{F}\_{m-1,n-1}$$

for $\sigma_1^2 = \sigma_2^2$.

**2.2.9.** In Exercise 2.2.8, show that $s_{11}^2/s_{21}^2$ is a type-2 beta with parameters $m$ and $n$, and that $s_1^2/s_2^2$ is a type-2 beta with parameters $m - 1$ and $n - 1$, for $\sigma_1^2 = \sigma_2^2$.

**2.2.10.** In Exercise 2.2.8, if $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then show that

$$\frac{1}{\sigma^2} \Big[ \sum_{j=1}^m (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}}) + \sum_{j=1}^n (\tilde{y}_j - \bar{\tilde{y}})^*(\tilde{y}_j - \bar{\tilde{y}}) \Big] \sim \tilde{\chi}^2_{m+n-2}.$$

**2.2.11.** In Exercise 2.2.10, if $\bar{\tilde{x}}$ and $\bar{\tilde{y}}$ are replaced by $\tilde{\mu}_1$ and $\tilde{\mu}_2$, respectively, then show that the number of degrees of freedom of the chisquare is $m + n$.

**2.2.12.** Derive the representation of the general quadratic form $X'AX$ given in (2.2.1).

#### **2.3. Simple Random Samples from Real Populations and Sampling Distributions**

For practical applications, an important result is the independence of the sample mean and sample variance when the sample comes from a normal (Gaussian) population. Let $x_1,\dots,x_n$ be a simple random sample of size $n$ from a real $N_1(\mu_1, \sigma_1^2)$ or, equivalently, let $x_1,\dots,x_n$ be iid $N_1(\mu_1, \sigma_1^2)$. Recall that we have established that any linear function $L'X = X'L$, $L' = (a_1,\dots,a_n)$, $X' = (x_1,\dots,x_n)$, remains normally distributed (Theorem 2.1.1). Now, consider two linear forms $y_1 = L_1'X$ and $y_2 = L_2'X$ with $L_1' = (a_1,\dots,a_n)$ and $L_2' = (b_1,\dots,b_n)$, where $a_1,\dots,a_n, b_1,\dots,b_n$ are real scalar constants. Let us examine the conditions required for the independence of the linear forms $y_1$ and $y_2$. Since $x_1,\dots,x_n$ are iid, we can determine their joint mgf. Take an $n \times 1$ parameter vector $T$, $T' = (t_1,\dots,t_n)$, where the $t_j$'s are scalar parameters. Then, by definition, the joint mgf is given by

$$E[\mathbf{e}^{T'X}] = \prod\_{j=1}^{n} M\_{x\_j}(t\_j) = \prod\_{j=1}^{n} \mathbf{e}^{t\_j\mu\_1 + \frac{1}{2}t\_j^2\sigma\_1^2} = \mathbf{e}^{\mu\_1 T' J + \frac{\sigma\_1^2}{2}T'T} \tag{2.3.1}$$

since the $x_j$'s are iid, with $J' = (1,\dots,1)$. Since every linear function of $x_1,\dots,x_n$ is univariate normal, we have $y_1 \sim N_1(\mu_1 L_1'J,\ \sigma_1^2 L_1'L_1)$, and hence the mgf of $y_1$, taking $t_1$ as the parameter, is $M_{y_1}(t_1) = \mathbf{e}^{t_1\mu_1 L_1'J + \frac{\sigma_1^2 t_1^2}{2}L_1'L_1}$. Now, let us consider the joint mgf of $y_1$ and $y_2$, taking $t_1$ and $t_2$ as the respective parameters, and denote it by $M_{y_1,y_2}(t_1, t_2)$. Then,

$$M\_{\mathbb{Y}\_1,\mathbb{Y}\_2}(t\_1, t\_2) = E[\mathbf{e}^{t\_1\mathbb{Y}\_1 + t\_2\mathbb{Y}\_2}] = E[\mathbf{e}^{(t\_1L\_1' + t\_2L\_2')X}]$$

$$= \mathbf{e}^{\mu\_1(t\_1L\_1' + t\_2L\_2')J + \frac{\sigma\_1^2}{2}(t\_1L\_1 + t\_2L\_2)'(t\_1L\_1 + t\_2L\_2)}$$

$$= \mathbf{e}^{\mu_1(t_1L_1' + t_2L_2')J + \frac{\sigma_1^2}{2}(t_1^2L_1'L_1 + t_2^2L_2'L_2 + 2t_1t_2L_1'L_2)}$$

$$= M\_{\mathbb{Y}\_1}(t\_1)M\_{\mathbb{Y}\_2}(t\_2)\mathbf{e}^{\sigma\_1^2 t\_1 t\_2 L\_1' L\_2}.$$

Hence, the last factor on the right-hand side has to equal 1 for $y_1$ and $y_2$ to be independently distributed, and this can happen if and only if $L_1'L_2 = L_2'L_1 = 0$ since $t_1$ and $t_2$ are arbitrary. Thus, we have the following result:

**Theorem 2.3.1.** *Let $x_1,\dots,x_n$ be iid $N_1(\mu_1, \sigma_1^2)$. Let $y_1 = L_1'X$ and $y_2 = L_2'X$ where $X' = (x_1,\dots,x_n)$, $L_1' = (a_1,\dots,a_n)$ and $L_2' = (b_1,\dots,b_n)$, the $a_j$'s and $b_j$'s being scalar constants. Then, $y_1$ and $y_2$ are independently distributed if and only if $L_1'L_2 = L_2'L_1 = 0$.*

**Example 2.3.1.** Let $x_1, x_2, x_3, x_4$ be a simple random sample of size 4 from a real normal population $N_1(\mu = 1, \sigma^2 = 2)$. Consider the statistics (1): $u_1, v_1, w_1$ and (2): $u_2, v_2, w_2$, and check the various pairs in (1) and (2) for independence, where

$$\begin{aligned} u_1 &= \bar{x} = \tfrac{1}{4}(x_1 + x_2 + x_3 + x_4), & v_1 &= 2x_1 - 3x_2 + x_3 + x_4, & w_1 &= x_1 - x_2 + x_3 - x_4; \\ u_2 &= \bar{x} = \tfrac{1}{4}(x_1 + x_2 + x_3 + x_4), & v_2 &= x_1 - x_2 + x_3 - x_4, & w_2 &= x_1 - x_2 - x_3 + x_4. \end{aligned}$$

**Solution 2.3.1.** Let $X' = (x_1, x_2, x_3, x_4)$, and let the coefficient vectors in (1) be denoted by $L_1, L_2, L_3$ and those in (2) by $M_1, M_2, M_3$. They are as follows:

$$L_1 = \frac{1}{4} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \ L_2 = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 1 \end{bmatrix}, \ L_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \Rightarrow L_1'L_2 = \frac{1}{4}, \ L_1'L_3 = 0, \ L_2'L_3 = 5.$$

This means that *u*<sup>1</sup> and *w*<sup>1</sup> are independently distributed and that the other pairs are not independently distributed. The coefficient vectors in (2) are

$$M_1 = \frac{1}{4} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \ M_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}, \ M_3 = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} \Rightarrow M_1'M_2 = 0, \ M_1'M_3 = 0, \ M_2'M_3 = 0.$$

This means that all the pairs are independently distributed, that is, *u*2*, v*<sup>2</sup> and *w*<sup>2</sup> are mutually independently distributed.
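The inner products in Solution 2.3.1 are easily re-checked numerically (a sketch using NumPy, with $M_1 = L_1$):

```python
import numpy as np

# Re-checking the inner products from Example 2.3.1.
L1 = np.array([1, 1, 1, 1]) / 4
L2 = np.array([2, -3, 1, 1])
L3 = np.array([1, -1, 1, -1])
M2 = np.array([1, -1, 1, -1])
M3 = np.array([1, -1, -1, 1])

print(L1 @ L2, L1 @ L3, L2 @ L3)   # 0.25, 0.0, 5 -> only u1, w1 independent
print(L1 @ M2, L1 @ M3, M2 @ M3)   # all zero -> u2, v2, w2 mutually independent
```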

We can extend Theorem 2.3.1 to sets of linear functions. Let $Y_1 = A_1X$ and $Y_2 = A_2X$ where $A_1$, of dimension $m_1 \times n$, $m_1 \le n$, and $A_2$, of dimension $m_2 \times n$, $m_2 \le n$, are constant matrices, and $X' = (x_1,\dots,x_n)$ where the $x_j$'s are iid $N_1(\mu_1, \sigma_1^2)$. Let the parameter vectors $T_1$ and $T_2$ be of dimensions $m_1 \times 1$ and $m_2 \times 1$, respectively. Then, the mgf of $Y_1$ is $M_{Y_1}(T_1) = E[\mathbf{e}^{T_1'Y_1}] = E[\mathbf{e}^{T_1'A_1X}]$, which can be evaluated by integrating over the joint density of $x_1,\dots,x_n$, either individually or over the vector $X' = (x_1,\dots,x_n)$, with $E[X'] = [\mu_1,\dots,\mu_1] = \mu_1[1,\dots,1] = \mu_1 J'$, $J' = [1,\dots,1]$, so that $E[Y_1] = \mu_1 A_1 J$. The mgf of $Y_1$ is then

$$M\_{Y\_1}(T\_1) = E[\mathbf{e}^{\mu\_1 T\_1' A\_1 J + T\_1' A\_1 [X - E(X)]}] = E[\mathbf{e}^{\mu\_1 T\_1' A\_1 J + T\_1' A\_1 Z}], \ Z = X - E(X), \quad \text{(i)}$$

and the exponent in the expected value, not containing *μ*1, simplifies to

$$-\frac{1}{2\sigma_1^2}\big[Z'Z - 2\sigma_1^2\, T_1'A_1Z\big] = -\frac{1}{2\sigma_1^2}\big[(Z' - \sigma_1^2\, T_1'A_1)(Z - \sigma_1^2\, A_1'T_1) - \sigma_1^4\, T_1'A_1A_1'T_1\big].$$

Integration over *Z* or individually over the elements of *Z*, that is, *z*1*,...,zn*, yields 1 since the total probability is 1, which leaves the factor not containing *Z*. Thus,

$$M_{Y_1}(T_1) = \mathbf{e}^{\mu_1 T_1'A_1J + \frac{\sigma_1^2}{2}T_1'A_1A_1'T_1},\tag{ii}$$

and similarly,

$$M_{Y_2}(T_2) = \mathbf{e}^{\mu_1 T_2'A_2J + \frac{\sigma_1^2}{2}T_2'A_2A_2'T_2}.\tag{iii}$$

The joint mgf of *Y*<sup>1</sup> and *Y*<sup>2</sup> is then

$$M_{Y_1,Y_2}(T_1, T_2) = \mathbf{e}^{\mu_1(T_1'A_1J + T_2'A_2J) + \frac{\sigma_1^2}{2}(T_1'A_1 + T_2'A_2)(T_1'A_1 + T_2'A_2)'}$$

$$= M_{Y_1}(T_1)M_{Y_2}(T_2)\,\mathbf{e}^{\sigma_1^2\, T_1'A_1A_2'T_2}. \tag{iv}$$

Accordingly, $Y_1$ and $Y_2$ will be independent if and only if $A_1A_2' = O \Rightarrow A_2A_1' = O$, since $T_1$ and $T_2$ are arbitrary parameter vectors, the two null matrices having different orders. Then, we have:

**Theorem 2.3.2.** *Let $Y_1 = A_1X$ and $Y_2 = A_2X$, with $X' = (x_1,\dots,x_n)$, the $x_j$'s being iid $N_1(\mu_1, \sigma_1^2)$, $j = 1,\dots,n$, be two sets of linear forms where $A_1$ is an $m_1 \times n$ and $A_2$ an $m_2 \times n$ constant matrix, $m_1 \le n$, $m_2 \le n$. Then, $Y_1$ and $Y_2$ are independently distributed if and only if $A_1A_2' = O$ or $A_2A_1' = O$.*

**Example 2.3.2.** Consider a simple random sample of size 4 from a real scalar normal population $N_1(\mu_1 = 0, \sigma_1^2 = 4)$. Let $X' = (x_1, x_2, x_3, x_4)$. Verify whether the sets of linear functions $U = A_1X$, $V = A_2X$, $W = A_3X$ are pairwise independent, where

$$A\_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix},\ A\_2 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & -1 & 1 & 3 \\ 1 & 2 & -1 & -2 \end{bmatrix},\ A\_3 = \begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}.$$

**Solution 2.3.2.** Taking the products, we have $A_1A_2' \ne O$, $A_1A_3' = O$, $A_2A_3' \ne O$. Hence, the pair $U$ and $W$ are independently distributed and the other pairs are not.
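The matrix products in Solution 2.3.2 can be re-checked numerically (a sketch):

```python
import numpy as np

# Verifying Solution 2.3.2: U = A1 X and W = A3 X are independent because
# A1 A3' = O; the other two products are nonzero matrices.
A1 = np.array([[1, 1, 1, 1], [1, -1, 1, -1]])
A2 = np.array([[1, 2, 3, 4], [2, -1, 1, 3], [1, 2, -1, -2]])
A3 = np.array([[1, -1, -1, 1], [-1, -1, 1, 1]])

print(A1 @ A3.T)                                # zero matrix
print(np.any(A1 @ A2.T), np.any(A2 @ A3.T))     # True, True: nonzero products
```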

We can apply Theorems 2.3.1 and 2.3.2 to prove several results involving sample statistics. For instance, let $x_1,\dots,x_n$ be iid $N_1(\mu_1, \sigma_1^2)$, that is, a simple random sample of size $n$ from a real $N_1(\mu_1, \sigma_1^2)$, and let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$. Consider the vectors

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \ \text{ and } \ \bar{X} = \begin{bmatrix} \bar{x} \\ \vdots \\ \bar{x} \end{bmatrix}.$$


Note that when the $x_j$'s are iid $N_1(\mu_1, \sigma_1^2)$, $x_j - \mu_1 \sim N_1(0, \sigma_1^2)$, and that since $X - \bar{X} = (X - \mu) - (\bar{X} - \mu)$, we may take the $x_j$'s as coming from $N_1(0, \sigma_1^2)$ for all operations involving $(X, \bar{X})$. Moreover, $\bar{x} = \frac{1}{n}J'X$, $J' = (1, 1,\dots, 1)$, where $J$ is an $n \times 1$ vector of unities. Then,

$$X - \bar{X} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} - \begin{bmatrix} \bar{x} \\ \vdots \\ \bar{x} \end{bmatrix} = \begin{bmatrix} x_1 - \frac{1}{n}J'X \\ x_2 - \frac{1}{n}J'X \\ \vdots \\ x_n - \frac{1}{n}J'X \end{bmatrix} = \Big(I - \frac{1}{n}JJ'\Big)X,\tag{i}$$

and on letting $A = \frac{1}{n}JJ'$, we have

$$A = A^2, \ I - A = (I - A)^2, \ A(I - A) = O. \tag{ii}$$

Also note that

$$(X - \bar{X})'(X - \bar{X}) = \sum\_{j=1}^{n} (x\_j - \bar{x})^2 \text{ and } s^2 = \frac{1}{n} \sum\_{j=1}^{n} (x\_j - \bar{x})^2 \tag{iii}$$

where $s^2$ is the sample variance and $\frac{1}{n}J'X = \bar{x}$ is the sample mean. Now, in light of Theorem 2.3.2, $Y_1 = (I - A)X$ and $Y_2 = AX$ are independently distributed, which implies that $X - \bar{X}$ and $\bar{X}$ are independently distributed. But $\bar{X}$ contains only $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$, and hence $X - \bar{X}$ and $\bar{x}$ are independently distributed. We now make use of the following result: if $w_1$ and $w_2$ are independently distributed real scalar random variables, then the pairs $(w_1, w_2^2)$, $(w_1^2, w_2)$, $(w_1^2, w_2^2)$ are independently distributed; the converses need not be true. For example, $w_1^2$ and $w_2^2$ being independently distributed need not imply the independence of $w_1$ and $w_2$. If $w_1$ and $w_2$ are real vectors or matrices that are independently distributed, then the following pairs are also independently distributed wherever the quantities are defined: $(w_1, w_2w_2')$, $(w_1, w_2'w_2)$, $(w_1w_1', w_2)$, $(w_1'w_1, w_2)$, $(w_1'w_1, w_2'w_2)$. It then follows from (iii) that $\bar{x}$ and $(X - \bar{X})'(X - \bar{X}) = \sum_{j=1}^n (x_j - \bar{x})^2$ are independently distributed. Hence, the following result:

**Theorem 2.3.3.** *Let $x_1,\dots,x_n$ be iid $N_1(\mu_1, \sigma_1^2)$, that is, a simple random sample of size $n$ from a univariate real normal population $N_1(\mu_1, \sigma_1^2)$. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ be the sample mean and $s^2 = \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2$ be the sample variance. Then $\bar{x}$ and $s^2$ are independently distributed.*
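Theorem 2.3.3 can be illustrated by simulation (a sketch; the sample size and population parameters below are arbitrary choices). Independence implies, in particular, that $\bar{x}$ and $s^2$ are uncorrelated, which shows up as a near-zero empirical correlation:

```python
import numpy as np

# Empirical illustration of Theorem 2.3.3: for normal samples, the sample
# mean and the sample variance are uncorrelated (indeed independent).
rng = np.random.default_rng(2)
N, n, mu, sigma = 100_000, 10, 1.0, np.sqrt(2.0)

x = mu + sigma * rng.standard_normal((N, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1)                 # (1/n) sum (x_j - xbar)^2, the book's s^2

corr = np.corrcoef(xbar, s2)[0, 1]
print(corr)                        # near 0
```

Zero correlation alone does not prove independence, of course; the point is only that the simulation is consistent with the theorem.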

This result has several corollaries. When $x_1,\dots,x_n$ are iid $N_1(\mu_1, \sigma_1^2)$, the sample sum of products, which is also referred to as the corrected sample sum of products (corrected in the sense that $\bar{x}$ is subtracted), is given by

$$\sum\_{j=1}^{n} (\mathbf{x}\_j - \bar{\mathbf{x}})^2 = X'(I - A)X, \ A = \frac{1}{n} \begin{bmatrix} 1 & 1 & \dots & 1 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & \dots & 1 & 1 \end{bmatrix}$$

where both *<sup>A</sup>* and *<sup>I</sup>* <sup>−</sup> *<sup>A</sup>* are idempotent. In this case, tr*(A)* <sup>=</sup> <sup>1</sup> *<sup>n</sup>(*1 +···+ 1*)* = 1 and tr*(I* − *A)* = *n* − 1 and hence, the ranks of *A* and *I* − *A* are 1 and *n* − 1, respectively. When a matrix is idempotent, its eigenvalues are either zero or one, the number of ones corresponding to its rank. As has already been pointed out, when *X* and *X*¯ are involved, it can be equivalently assumed that the sample is coming from a *N*1*(*0*, σ*<sup>2</sup> <sup>1</sup> *)* population. Hence

$$\frac{ns^2}{\sigma_1^2} = \frac{1}{\sigma_1^2} \sum_{j=1}^n (x_j - \bar{x})^2 \sim \chi^2_{n-1} \tag{2.3.2}$$

is a real chisquare with $n - 1$ degrees of freedom (the rank of the idempotent matrix of the quadratic form), as per Theorem 2.2.1. Observe that when the sample comes from a real $N_1(\mu_1, \sigma_1^2)$ distribution, we have $\bar{x} \sim N_1(\mu_1, \frac{\sigma_1^2}{n})$, so that $z = \frac{\sqrt{n}(\bar{x} - \mu_1)}{\sigma_1} \sim N_1(0, 1)$, that is, $z$ is a real standard normal, and that $\frac{(n-1)s_1^2}{\sigma_1^2} \sim \chi^2_{n-1}$ where $s_1^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}$. Recall that $\bar{x}$ and $s_1^2$ are independently distributed. Hence, the ratio

$$\frac{z}{s\_1/\sigma\_1} \sim t\_{n-1}$$

has a real Student-t distribution with $n - 1$ degrees of freedom, where $z = \frac{\sqrt{n}(\bar{x} - \mu_1)}{\sigma_1}$ and $\frac{z}{s_1/\sigma_1} = \frac{\sqrt{n}(\bar{x} - \mu_1)}{s_1}$. Hence, we have the following result:

**Theorem 2.3.4.** *Let $x_1,\dots,x_n$ be iid $N_1(\mu_1, \sigma_1^2)$. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $s_1^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}$. Then,*
$$\frac{\sqrt{n}(\bar{x} - \mu_1)}{s_1} \sim t_{n-1} \tag{2.3.3}$$

*where $t_{n-1}$ is a real Student-t with $n - 1$ degrees of freedom.*
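Theorem 2.3.4 can also be checked by simulation (a sketch with arbitrary parameters): the empirical variance of $\sqrt{n}(\bar{x} - \mu_1)/s_1$ should be close to $(n-1)/(n-3)$, the variance of a Student-t with $n - 1$ degrees of freedom:

```python
import numpy as np

# Empirical sketch of Theorem 2.3.4: sqrt(n)(xbar - mu)/s1 has mean 0 and
# variance (n-1)/(n-3), matching a Student-t with n-1 degrees of freedom.
rng = np.random.default_rng(3)
N, n, mu, sigma = 200_000, 8, 1.0, 2.0

x = mu + sigma * rng.standard_normal((N, n))
xbar = x.mean(axis=1)
s1 = x.std(axis=1, ddof=1)                  # divisor n-1, the book's s1
t = np.sqrt(n) * (xbar - mu) / s1

df = n - 1
print(t.mean(), t.var(), df / (df - 2))     # mean ~ 0, var ~ (n-1)/(n-3)
```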

It should also be noted that when $x_1,\dots,x_n$ are iid $N_1(\mu_1, \sigma_1^2)$, then

$$\frac{\sum_{j=1}^n (x_j - \mu_1)^2}{\sigma_1^2} \sim \chi^2_n, \quad \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sigma_1^2} \sim \chi^2_{n-1}, \quad \frac{n(\bar{x} - \mu_1)^2}{\sigma_1^2} \sim \chi^2_1,$$

wherefrom the following decomposition is obtained:

$$\frac{1}{\sigma\_1^2} \sum\_{j=1}^n (\mathbf{x}\_j - \boldsymbol{\mu}\_1)^2 = \frac{1}{\sigma\_1^2} \left[ \sum\_{j=1}^n (\mathbf{x}\_j - \bar{\mathbf{x}})^2 + n(\bar{\mathbf{x}} - \boldsymbol{\mu}\_1)^2 \right] \Rightarrow \boldsymbol{\chi}\_n^2 = \boldsymbol{\chi}\_{n-1}^2 + \boldsymbol{\chi}\_1^2,\quad(2.3.4)$$

the two chisquare random variables on the right-hand side of the last equation being independently distributed.
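The decomposition in (2.3.4) rests on the algebraic identity $\sum_j (x_j - \mu_1)^2 = \sum_j (x_j - \bar{x})^2 + n(\bar{x} - \mu_1)^2$, which holds exactly for any data, as the following sketch confirms:

```python
import numpy as np

# Numerical check of the algebraic identity behind (2.3.4):
# sum (x_j - mu)^2 = sum (x_j - xbar)^2 + n (xbar - mu)^2, exactly.
rng = np.random.default_rng(4)
n, mu = 12, 1.5
x = mu + rng.standard_normal(n)
xbar = x.mean()

lhs = np.sum((x - mu) ** 2)
rhs = np.sum((x - xbar) ** 2) + n * (xbar - mu) ** 2
print(abs(lhs - rhs))              # zero up to rounding
```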

#### **2.3a. Simple Random Samples from a Complex Gaussian Population**

The definition of a simple random sample from any population remains the same as in the real case. A set of complex scalar random variables $\tilde{x}_1,\dots,\tilde{x}_n$ which are iid $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$ is called a simple random sample from this complex Gaussian population. Let $\tilde{X}$ be the $n \times 1$ vector whose components are these sample variables, $\bar{\tilde{x}} = \frac{1}{n}(\tilde{x}_1 + \cdots + \tilde{x}_n)$ denote the sample average, and $\bar{\tilde{X}}' = (\bar{\tilde{x}},\dots,\bar{\tilde{x}})$ be the $1 \times n$ vector of sample means; then $\tilde{X}$, $\bar{\tilde{X}}$ and the sample sum of products $\tilde{s}$ are, respectively,

$$
\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{bmatrix}, \ \ \bar{\tilde{X}} = \begin{bmatrix} \bar{\tilde{x}} \\ \vdots \\ \bar{\tilde{x}} \end{bmatrix} \ \text{ and } \ \tilde{s} = (\tilde{X} - \bar{\tilde{X}})^*(\tilde{X} - \bar{\tilde{X}}).$$

These quantities can be simplified with the help of the vector of unities $J' = (1, 1,\dots, 1)$: $\bar{\tilde{x}} = \frac{1}{n}J'\tilde{X}$, $\tilde{X} - \bar{\tilde{X}} = [I - \frac{1}{n}JJ']\tilde{X}$, $\bar{\tilde{X}} = \frac{1}{n}JJ'\tilde{X}$, and $\tilde{s} = \tilde{X}^*[I - \frac{1}{n}JJ']\tilde{X}$. Consider the Hermitian form

$$\tilde{X}^*\Big[I - \frac{1}{n}JJ'\Big]\tilde{X} = \sum_{j=1}^n (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}}) = n\tilde{s}^2$$

where $\tilde{s}^2$ is the sample variance in the complex scalar case, given a simple random sample of size $n$.

Consider the linear forms $\tilde{u}_1 = L_1^*\tilde{X} = \bar{a}_1\tilde{x}_1 + \cdots + \bar{a}_n\tilde{x}_n$ and $\tilde{u}_2 = L_2^*\tilde{X} = \bar{b}_1\tilde{x}_1 + \cdots + \bar{b}_n\tilde{x}_n$, where the $a_j$'s and $b_j$'s are scalar constants that may be real or complex, $\bar{a}_j$ and $\bar{b}_j$ denoting the complex conjugates of $a_j$ and $b_j$, respectively. Note that

$$E[\tilde{X}]' = \tilde{\mu}_1[1, 1,\dots, 1] = \tilde{\mu}_1 J', \ J' = [1,\dots, 1],$$

since the $\tilde{x}_j$'s are iid $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$, $j = 1,\dots,n$. The mgfs of $\tilde{u}_1$ and $\tilde{u}_2$, denoted by $M_{\tilde{u}_j}(\tilde{t}_j)$, $j = 1, 2$, and the joint mgf of $\tilde{u}_1$ and $\tilde{u}_2$, denoted by $M_{\tilde{u}_1,\tilde{u}_2}(\tilde{t}_1, \tilde{t}_2)$, are the following, where $\Re(\cdot)$ denotes the real part of $(\cdot)$:

$$\begin{split} M_{\tilde{u}_1}(\tilde{t}_1) &= E[\mathbf{e}^{\Re(\tilde{t}_1^*\tilde{u}_1)}] = E[\mathbf{e}^{\Re(\tilde{t}_1^*L_1^*\tilde{X})}] \\ &= \mathbf{e}^{\Re(\tilde{\mu}_1\tilde{t}_1^*L_1^*J)}\, E[\mathbf{e}^{\Re(\tilde{t}_1^*L_1^*(\tilde{X} - E(\tilde{X})))}] = \mathbf{e}^{\Re(\tilde{\mu}_1\tilde{t}_1^*L_1^*J)}\, \mathbf{e}^{\frac{\sigma_1^2}{4}\tilde{t}_1^*L_1^*L_1\tilde{t}_1} \qquad (i) \end{split}$$

$$M\_{\tilde{u}\_2}(\tilde{t}\_2) = \mathbf{e}^{\Re(\tilde{\mu}\_1 \tilde{t}\_2^\* L\_2^\* J) + \frac{\sigma\_1^2}{4} \tilde{t}\_2^\* L\_2^\* L\_2 \tilde{t}\_2} \tag{ii}$$

$$M_{\tilde{u}_1,\tilde{u}_2}(\tilde{t}_1, \tilde{t}_2) = M_{\tilde{u}_1}(\tilde{t}_1)M_{\tilde{u}_2}(\tilde{t}_2)\,\mathbf{e}^{\frac{\sigma_1^2}{2}\tilde{t}_1^*L_1^*L_2\tilde{t}_2}. \tag{iii}$$

Consequently, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if the exponential factor equals 1 or, equivalently, $\tilde{t}_1^*L_1^*L_2\tilde{t}_2 = 0$. Since $\tilde{t}_1$ and $\tilde{t}_2$ are arbitrary, this means that $L_1^*L_2 = 0 \Rightarrow L_2^*L_1 = 0$. Then we have the following result:

**Theorem 2.3a.1.** *Let $\tilde{x}_1,\dots,\tilde{x}_n$ be a simple random sample of size $n$ from a univariate complex Gaussian population $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Consider the linear forms $\tilde{u}_1 = L_1^*\tilde{X}$ and $\tilde{u}_2 = L_2^*\tilde{X}$ where $L_1$, $L_2$ and $\tilde{X}$ are the previously defined $n \times 1$ vectors, a star denoting the conjugate transpose. Then, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if $L_1^*L_2 = 0$.*

**Example 2.3a.1.** Let $\tilde{x}_j$, $j = 1, 2, 3, 4$, be iid univariate complex normal $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Consider the linear forms

$$\begin{aligned} \tilde{u}_1 &= L_1^*\tilde{X} = (1+i)\tilde{x}_1 + 2i\tilde{x}_2 - (1-i)\tilde{x}_3 + 2\tilde{x}_4 \\ \tilde{u}_2 &= L_2^*\tilde{X} = (1+i)\tilde{x}_1 + (2+3i)\tilde{x}_2 + (1-i)\tilde{x}_3 - i\tilde{x}_4 \\ \tilde{u}_3 &= L_3^*\tilde{X} = -(1+i)\tilde{x}_1 + i\tilde{x}_2 + (1-i)\tilde{x}_3 + \tilde{x}_4. \end{aligned}$$

Verify whether the three linear forms are pairwise independent.

**Solution 2.3a.1.** With the usual notations, the coefficient vectors are as follows:

$$\begin{aligned} L\_1^\* &= [1+i, 2i, -1+i, 2] \Rightarrow L\_1' = [1-i, -2i, -1-i, 2] \\ L\_2^\* &= [1+i, 2+3i, 1-i, -i] \Rightarrow L\_2' = [1-i, 2-3i, 1+i, i] \\ L\_3^\* &= [-(1+i), i, 1-i, 1] \Rightarrow L\_3' = [-(1-i), -i, 1+i, 1]. \end{aligned}$$

Taking the products, we have $L_1^*L_2 = 6 + 6i \ne 0$, $L_1^*L_3 = 0$, and $L_2^*L_3 = 3 - 3i \ne 0$. Hence, only $\tilde{u}_1$ and $\tilde{u}_3$ are independently distributed.
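These products can be re-checked numerically (a sketch; the column vector $L_k$ is the conjugate transpose of the row vector $L_k^*$):

```python
import numpy as np

# Re-checking the products in Solution 2.3a.1.
L1s = np.array([1 + 1j, 2j, -1 + 1j, 2])          # L1^*
L2s = np.array([1 + 1j, 2 + 3j, 1 - 1j, -1j])     # L2^*
L3s = np.array([-(1 + 1j), 1j, 1 - 1j, 1])        # L3^*

L2 = L2s.conj()                                   # column L2 = conj(L2^*)'
L3 = L3s.conj()

print(L1s @ L2, L1s @ L3, L2s @ L3)   # 6+6i, 0, 3-3i
```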

We can extend the result stated in Theorem 2.3a.1 to sets of linear forms. Let $\tilde{U}_1 = A_1\tilde{X}$ and $\tilde{U}_2 = A_2\tilde{X}$ where $A_1$ and $A_2$ are constant matrices that may or may not be in the complex domain, $A_1$ being $m_1 \times n$ and $A_2$ being $m_2 \times n$, with $m_1 \le n$, $m_2 \le n$. As previously, $\tilde{X}' = (\tilde{x}_1,\dots,\tilde{x}_n)$, the $\tilde{x}_j$'s, $j = 1,\dots,n$, being iid $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Let $\tilde{T}_1$ and $\tilde{T}_2$ be parameter vectors of orders $m_1 \times 1$ and $m_2 \times 1$, respectively. Then, on following the steps leading to (iii), the mgfs of $\tilde{U}_1$ and $\tilde{U}_2$ and their joint mgf are obtained as follows:

$$M_{\tilde{U}_1}(\tilde{T}_1) = E[\mathbf{e}^{\Re(\tilde{T}_1^*A_1\tilde{X})}] = \mathbf{e}^{\Re(\tilde{\mu}_1\tilde{T}_1^*A_1J) + \frac{\sigma_1^2}{4}\tilde{T}_1^*A_1A_1^*\tilde{T}_1} \tag{iv}$$

$$M_{\tilde{U}_2}(\tilde{T}_2) = \mathbf{e}^{\Re(\tilde{\mu}_1\tilde{T}_2^*A_2J) + \frac{\sigma_1^2}{4}\tilde{T}_2^*A_2A_2^*\tilde{T}_2} \tag{v}$$

$$M_{\tilde{U}_1,\tilde{U}_2}(\tilde{T}_1, \tilde{T}_2) = M_{\tilde{U}_1}(\tilde{T}_1)M_{\tilde{U}_2}(\tilde{T}_2)\,\mathbf{e}^{\frac{\sigma_1^2}{2}\tilde{T}_1^*A_1A_2^*\tilde{T}_2}. \tag{vi}$$

Since $\tilde{T}_1$ and $\tilde{T}_2$ are arbitrary, the exponential factor in (vi) equals 1 if and only if $A_1A_2^* = O$ or $A_2A_1^* = O$, the two null matrices having different orders. Then, we have:

**Theorem 2.3a.2.** *Let the $\tilde{x}_j$'s and $\tilde{X}$ be as defined in Theorem 2.3a.1. Let $A_1$ be an $m_1 \times n$ constant matrix and $A_2$ an $m_2 \times n$ constant matrix, $m_1 \le n$, $m_2 \le n$, where the constant matrices may or may not be in the complex domain. Consider the general linear forms $\tilde{U}_1 = A_1\tilde{X}$ and $\tilde{U}_2 = A_2\tilde{X}$. Then $\tilde{U}_1$ and $\tilde{U}_2$ are independently distributed if and only if $A_1A_2^* = O$ or, equivalently, $A_2A_1^* = O$.*

**Example 2.3a.2.** Let $\tilde{x}_j$, $j = 1, 2, 3, 4$, be iid univariate complex Gaussian $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Consider the following sets of linear forms $\tilde{U}_1 = A_1\tilde{X}$, $\tilde{U}_2 = A_2\tilde{X}$, $\tilde{U}_3 = A_3\tilde{X}$ with $\tilde{X}' = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4)$, where

$$\begin{aligned} A\_1 &= \begin{bmatrix} 2+3i & 2+3i & 2+3i & 2+3i \\ 1+i & -(1+i) & -(1+i) & 1+i \end{bmatrix} \\ A\_2 &= \begin{bmatrix} 2i & -2i & 2i & -2i \\ 1-i & 1-i & -1+i & -1+i \\ -(1+2i) & 1+2i & -(1+2i) & 1+2i \end{bmatrix}, \ A\_3 = \begin{bmatrix} 1+2i & 0 & 1-2i & -2 \\ -2 & -1+i & -1+i & 2i \end{bmatrix}. \end{aligned}$$

Verify whether the pairs in $\tilde{U}_1, \tilde{U}_2, \tilde{U}_3$ are independently distributed.

**Solution 2.3a.2.** Since the products are $A_1A_2^* = O$, $A_1A_3^* \ne O$, $A_2A_3^* \ne O$, only $\tilde{U}_1$ and $\tilde{U}_2$ are independently distributed.
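The products in Solution 2.3a.2 can be re-checked numerically (a sketch):

```python
import numpy as np

# Verifying Solution 2.3a.2: A1 A2* = O while the products involving A3 are
# nonzero, so only U~1 and U~2 are independently distributed.
A1 = np.array([[2 + 3j] * 4,
               [1 + 1j, -(1 + 1j), -(1 + 1j), 1 + 1j]])
A2 = np.array([[2j, -2j, 2j, -2j],
               [1 - 1j, 1 - 1j, -1 + 1j, -1 + 1j],
               [-(1 + 2j), 1 + 2j, -(1 + 2j), 1 + 2j]])
A3 = np.array([[1 + 2j, 0, 1 - 2j, -2],
               [-2, -1 + 1j, -1 + 1j, 2j]])

print(np.max(np.abs(A1 @ A2.conj().T)))                    # 0: A1 A2* = O
print(np.any(A1 @ A3.conj().T), np.any(A2 @ A3.conj().T))  # True, True
```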

As a corollary of Theorem 2.3a.2, the sample mean $\bar{\tilde{x}}$ and the sample sum of products $\tilde{s}$ are also independently distributed in the complex Gaussian case, a result parallel to the corresponding one in the real case. This can be seen by taking $A_1 = \frac{1}{n}JJ'$ and $A_2 = I - \frac{1}{n}JJ'$. Then, since $A_1 = A_1^2$, $A_2 = A_2^2$ and $A_1A_2 = O$, we have

$\frac{1}{\sigma_1^2}\tilde{X}^{*}A_1\tilde{X}\sim\tilde{\chi}^2_1$ for $\tilde{\mu}_1=0$ and $\frac{1}{\sigma_1^2}\tilde{X}^{*}A_2\tilde{X}\sim\tilde{\chi}^2_{n-1}$, and both of these chisquares in the complex domain are independently distributed. Then,

$$\frac{1}{\sigma\_1^2} n\tilde{s} = \frac{1}{\sigma\_1^2} \sum\_{j=1}^n (\tilde{\mathbf{x}}\_j - \bar{\tilde{\mathbf{x}}})^\* (\tilde{\mathbf{x}}\_j - \bar{\tilde{\mathbf{x}}}) \sim \tilde{\chi}\_{n-1}^2. \tag{2.3a.1}$$

The Student-t with *n* − 1 degrees of freedom can be defined in terms of the standardized sample mean and sample variance in the complex case.

#### **2.3.1. Noncentral chisquare having** *n* **degrees of freedom in the real domain**

Let $x_j\sim N_1(\mu_j,\sigma_j^2),\ j=1,\dots,n,$ and let the $x_j$'s be independently distributed. Then $\frac{x_j-\mu_j}{\sigma_j}\sim N_1(0,1)$ and $\sum_{j=1}^n\frac{(x_j-\mu_j)^2}{\sigma_j^2}\sim\chi^2_n$, where $\chi^2_n$ is a real chisquare with $n$ degrees of freedom. When at least one of the $\mu_j$'s is nonzero, $\sum_{j=1}^n\frac{x_j^2}{\sigma_j^2}$ is referred to as a *real non-central chisquare with $n$ degrees of freedom and non-centrality parameter $\lambda$*, which is denoted $\chi^2_n(\lambda)$, where

$$
\lambda = \frac{1}{2} \mu' \Sigma^{-1} \mu, \ \mu = \begin{bmatrix} \mu\_1 \\ \vdots \\ \mu\_n \end{bmatrix}, \text{ and } \ \Sigma = \begin{bmatrix} \sigma\_1^2 & 0 & \dots & 0 \\ 0 & \sigma\_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma\_n^2 \end{bmatrix}.
$$

Let $u=\sum_{j=1}^n\frac{x_j^2}{\sigma_j^2}$. In order to derive the distribution of $u$, let us determine its mgf. Since $u$ is a function of the $x_j$'s where $x_j\sim N_1(\mu_j,\sigma_j^2),\ j=1,\dots,n,$ we can integrate over the joint density of the $x_j$'s. Then, with $t$ as the mgf parameter,

$$\begin{split} M\_{\boldsymbol{u}}(t) &= E[\mathbf{e}^{\boldsymbol{t}\boldsymbol{u}}] \\ &= \int\_{-\infty}^{\infty} \dots \int\_{-\infty}^{\infty} \frac{1}{(2\pi)^{\frac{n}{2}} |\boldsymbol{\Sigma}|^{\frac{1}{2}}} \mathbf{e}^{\boldsymbol{t}\sum\_{j=1}^{n} \frac{\boldsymbol{x}\_{j}^{2}}{\sigma\_{j}^{2}} - \frac{1}{2} \sum\_{j=1}^{n} \frac{(\boldsymbol{x}\_{j} - \boldsymbol{\mu}\_{j})^{2}}{\sigma\_{j}^{2}}} \mathbf{d}\mathbf{x}\_{1} \wedge \dots \wedge \mathbf{d}\mathbf{x}\_{n} .\end{split}$$

The exponent, excluding the factor $-\frac{1}{2}$, can be simplified as follows:

$$-2t\sum\_{j=1}^{n} \frac{\mathbf{x}\_j^2}{\sigma\_j^2} + \sum\_{j=1}^{n} \frac{(\mathbf{x}\_j - \boldsymbol{\mu}\_j)^2}{\sigma\_j^2} = (1 - 2t) \sum\_{j=1}^{n} \frac{\mathbf{x}\_j^2}{\sigma\_j^2} - 2\sum\_{j=1}^{n} \frac{\boldsymbol{\mu}\_j \mathbf{x}\_j}{\sigma\_j^2} + \sum\_{j=1}^{n} \frac{\boldsymbol{\mu}\_j^2}{\sigma\_j^2}.$$

Let $y_j=\sqrt{(1-2t)}\,x_j$. Then $(1-2t)^{-\frac{n}{2}}\mathrm{d}y_1\wedge\dots\wedge\mathrm{d}y_n=\mathrm{d}x_1\wedge\dots\wedge\mathrm{d}x_n$, and

$$\begin{split} (1-2t)\sum\_{j=1}^{n} \left(\frac{\mathbf{x}\_{j}^{2}}{\sigma\_{j}^{2}}\right) - 2\sum\_{j=1}^{n} \left(\frac{\mu\_{j}\mathbf{x}\_{j}}{\sigma\_{j}^{2}}\right) &= \sum\_{j=1}^{n} \frac{y\_{j}^{2}}{\sigma\_{j}^{2}} - 2\sum\_{j=1}^{n} \frac{\mu\_{j}y\_{j}}{\sigma\_{j}^{2}\sqrt{(1-2t)}} \\ &+ \sum\_{j=1}^{n} \left(\frac{\mu\_{j}}{\sigma\_{j}\sqrt{(1-2t)}}\right)^{2} - \sum\_{j=1}^{n} \frac{\mu\_{j}^{2}}{\sigma\_{j}^{2}(1-2t)} \\ &= \sum\_{j=1}^{n} \left(\frac{\left(\mathbf{y}\_{j} - \frac{\mu\_{j}}{\sqrt{(1-2t)}}\right)}{\sigma\_{j}}\right)^{2} - \sum\_{j=1}^{n} \frac{\mu\_{j}^{2}}{\sigma\_{j}^{2}(1-2t)}. \end{split}$$

But

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\mathrm{e}^{-\sum_{j=1}^{n}\frac{1}{2\sigma_j^2}\left(y_j-\frac{\mu_j}{\sqrt{(1-2t)}}\right)^2}\mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_n=1.$$

Hence, for $\lambda=\frac{1}{2}\sum_{j=1}^{n}\frac{\mu_j^2}{\sigma_j^2}=\frac{1}{2}\mu'\Sigma^{-1}\mu$,

$$\begin{split} M_{u}(t) &= \frac{1}{(1-2t)^{\frac{n}{2}}}\,\mathrm{e}^{-\lambda+\frac{\lambda}{(1-2t)}} \\ &= \sum_{k=0}^{\infty}\frac{\lambda^k\mathrm{e}^{-\lambda}}{k!}\frac{1}{(1-2t)^{\frac{n}{2}+k}}. \end{split}\tag{2.3.5}$$

However, $(1-2t)^{-(\frac{n}{2}+k)}$ is the mgf of a real scalar gamma random variable with parameters $(\alpha=\frac{n}{2}+k,\ \beta=2)$, that is, a real chisquare with $n+2k$ degrees of freedom, $\chi^2_{n+2k}$. Hence, the density of a non-central chisquare with $n$ degrees of freedom and non-centrality parameter $\lambda$, denoted by $g_{u,\lambda}(u)$, is obtained by term-by-term inversion as follows:

$$g\_{u,\lambda}(u) = \sum\_{k=0}^{\infty} \frac{\lambda^k \mathbf{e}^{-\lambda}}{k!} \frac{u^{\frac{n+2k}{2}-1} \mathbf{e}^{-\frac{u}{2}}}{2^{\frac{n+2k}{2}} \Gamma(\frac{n}{2} + k)}\tag{2.3.6}$$

$$=\frac{u^{\frac{n}{2}-1}\mathrm{e}^{-\frac{u}{2}}}{2^{\frac{n}{2}}\Gamma(\frac{n}{2})}\sum_{k=0}^{\infty}\frac{\lambda^{k}\mathrm{e}^{-\lambda}}{k!}\frac{(u/2)^{k}}{(\frac{n}{2})_{k}}\tag{2.3.7}$$

where $(\frac{n}{2})_k$ is the Pochhammer symbol given by

$$(a)\_k = a(a+1)\cdots(a+k-1), \ a \neq 0, \ (a)\_0 \text{ being equal to } 1,\tag{2.3.8}$$

and, in general, $\Gamma(\alpha+k)=\Gamma(\alpha)(\alpha)_k$ for $k=1,2,\dots,$ whenever the gamma functions are defined. Hence, provided $1-2t>0$, (2.3.6) can be looked upon as a weighted sum of chisquare densities whose weights are Poisson probabilities; that is, (2.3.6) is a Poisson mixture of chisquare densities. As well, we can view (2.3.7) as a chisquare density having $n$ degrees of freedom appended with a Bessel series. In general, a Bessel series is of the form

$$\,\_0F\_1(\cdot;b;\mathbf{x}) = \sum\_{k=0}^{\infty} \frac{1}{(b)\_k} \frac{\mathbf{x}^k}{k!}, \; b \neq 0, -1, -2, \dots,\tag{2.3.9}$$

which is convergent for all *x*.
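
The Poisson-mixture reading of (2.3.6) can be verified numerically. The sketch below is illustrative only and not part of the text; it assumes scipy is available, whose `ncx2` distribution uses the noncentrality parametrization $nc=2\lambda$:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def g_series(u, n, lam, K=60):
    # Truncated Poisson mixture (2.3.6): chisquare_{n+2k} densities with
    # Poisson(lambda) weights, computed on the log scale for stability.
    k = np.arange(K)
    logw = -lam + k * np.log(lam) - gammaln(k + 1)            # Poisson weights
    logf = ((n + 2*k)/2 - 1) * np.log(u) - u/2 \
           - ((n + 2*k)/2) * np.log(2) - gammaln(n/2 + k)     # chi2_{n+2k} pdf
    return np.exp(logw + logf).sum()

n, lam, u = 3, 17/12, 2.7
print(g_series(u, n, lam))                # truncated series
print(stats.ncx2.pdf(u, df=n, nc=2*lam))  # library value; the two agree
```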

#### **2.3.1.1. Mean value and variance, real central and non-central chisquare**

The mgf of a real $\chi^2_\nu$ is $(1-2t)^{-\frac{\nu}{2}},\ 1-2t>0$. Thus,

$$\begin{aligned} E[\chi^2_\nu] &= \frac{\mathrm{d}}{\mathrm{d}t}(1-2t)^{-\frac{\nu}{2}}\Big|_{t=0} = \left(-\frac{\nu}{2}\right)(-2)(1-2t)^{-\frac{\nu}{2}-1}\Big|_{t=0} = \nu \\ E[\chi^2_\nu]^2 &= \frac{\mathrm{d}^2}{\mathrm{d}t^2}(1-2t)^{-\frac{\nu}{2}}\Big|_{t=0} = \left(-\frac{\nu}{2}\right)(-2)\left(-\frac{\nu}{2}-1\right)(-2)(1-2t)^{-\frac{\nu}{2}-2}\Big|_{t=0} = \nu(\nu+2). \end{aligned}$$

That is,

$$E[\chi^2\_\nu] = \nu \text{ and } \text{Var}(\chi^2\_\nu) = \nu(\nu+2) - \nu^2 = 2\nu. \tag{2.3.10}$$

What, then, are the mean and the variance of a real non-central $\chi^2_\nu(\lambda)$? They can be derived either from the mgf or from the density. Making use of the density, we have

$$E[\chi^2_\nu(\lambda)] = \sum_{k=0}^\infty \frac{\lambda^k\mathrm{e}^{-\lambda}}{k!}\int_0^\infty u\,\frac{u^{\frac{\nu}{2}+k-1}\mathrm{e}^{-\frac{u}{2}}}{2^{\frac{\nu}{2}+k}\Gamma(\frac{\nu}{2}+k)}\mathrm{d}u,$$

the integral part being equal to

$$\frac{\Gamma(\frac{\nu}{2}+k+1)}{\Gamma(\frac{\nu}{2}+k)}\frac{2^{\frac{\nu}{2}+k+1}}{2^{\frac{\nu}{2}+k}}=2\left(\frac{\nu}{2}+k\right)=\nu+2k.$$

Now, the remaining summation over $k$ can be looked upon as the expected value of $\nu+2k$ in a Poisson distribution. In this case, we can write the expected value as the expected value of a conditional expectation: $E[u]=E[E(u|k)],\ u=\chi^2_\nu(\lambda)$. Thus,

$$E[\chi^2_\nu(\lambda)] = \nu + 2\sum_{k=0}^\infty k\,\frac{\lambda^k\mathrm{e}^{-\lambda}}{k!} = \nu + 2E[k] = \nu + 2\lambda.$$

Moreover,

$$E[\chi^2_\nu(\lambda)]^2 = \sum_{k=0}^\infty \frac{\lambda^k\mathrm{e}^{-\lambda}}{k!}\int_0^\infty u^2\,\frac{u^{\frac{\nu}{2}+k-1}\mathrm{e}^{-\frac{u}{2}}}{2^{\frac{\nu}{2}+k}\Gamma(\frac{\nu}{2}+k)}\mathrm{d}u,$$

the integral part being

$$\begin{aligned}\frac{\Gamma(\frac{\nu}{2}+k+2)}{\Gamma(\frac{\nu}{2}+k)}\frac{2^{\frac{\nu}{2}+k+2}}{2^{\frac{\nu}{2}+k}} &= 2^2\Big(\frac{\nu}{2}+k+1\Big)\Big(\frac{\nu}{2}+k\Big) \\ &= (\nu+2k+2)(\nu+2k)=\nu^2+2\nu k+2\nu(k+1)+4k(k+1).\end{aligned}$$

Since $E[k]=\lambda$ and $E[k^2]=\lambda^2+\lambda$ for a Poisson distribution,

$$E[\chi^2\_\nu(\lambda)]^2 = \nu^2 + 2\nu + 4\nu\lambda + 4(\lambda^2 + 2\lambda).$$

Thus,

$$\begin{split}\mathrm{Var}(\chi^2_\nu(\lambda)) &= E[\chi^2_\nu(\lambda)]^2-\left[E(\chi^2_\nu(\lambda))\right]^2 \\ &= \nu^2+2\nu+4\nu\lambda+4(\lambda^2+2\lambda)-(\nu+2\lambda)^2 \\ &= 2\nu+8\lambda.\end{split}$$

To summarize,

$$E[\chi^2\_\nu(\lambda)] = \nu + 2\lambda \text{ and } \text{Var}(\chi^2\_\nu(\lambda)) = 2\nu + 8\lambda. \tag{2.3.11}$$
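
These two moments can be cross-checked against a library implementation. A brief sketch, not from the text (scipy's `ncx2` takes the noncentrality $nc=2\lambda$; the parameter values are arbitrary):

```python
from scipy import stats

# Check E = nu + 2*lam and Var = 2*nu + 8*lam for one choice of parameters
nu, lam = 7, 1.25
mean, var = stats.ncx2.stats(df=nu, nc=2*lam, moments='mv')
print(float(mean), float(var))  # 9.5 24.0, i.e., nu + 2*lam and 2*nu + 8*lam
```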

**Example 2.3.3.** Let $x_1\sim N_1(-1,2),\ x_2\sim N_1(1,3)$ and $x_3\sim N_1(-2,2)$ be independently distributed and $u=\frac{x_1^2}{2}+\frac{x_2^2}{3}+\frac{x_3^2}{2}$. Provide explicit expressions for the density of $u$, $E[u]$ and $\mathrm{Var}(u)$.

**Solution 2.3.3.** This *u* has a noncentral chisquare distribution with non-centrality parameter *λ* where

$$
\lambda = \frac{1}{2} \Big[ \frac{\mu\_1^2}{\sigma\_1^2} + \frac{\mu\_2^2}{\sigma\_2^2} + \frac{\mu\_3^2}{\sigma\_3^2} \Big] = \frac{1}{2} \Big[ \frac{(-1)^2}{2} + \frac{(1)^2}{3} + \frac{(-2)^2}{2} \Big] = \frac{17}{12},
$$

and the number of degrees of freedom is $n=3=\nu$. Thus, $u\sim\chi^2_3(\lambda)$, a real non-central chisquare with $\nu=3$ degrees of freedom and non-centrality parameter $\frac{17}{12}$. Then $E[u]=E[\chi^2_3(\lambda)]=\nu+2\lambda=3+2(\frac{17}{12})=\frac{35}{6}$ and $\mathrm{Var}(u)=\mathrm{Var}(\chi^2_3(\lambda))=2\nu+8\lambda=(2)(3)+8(\frac{17}{12})=\frac{52}{3}$. Let the density of $u$ be denoted by $g(u)$. Then

$$\begin{split} g(u) &= \frac{u^{\frac{n}{2}-1}\mathrm{e}^{-\frac{u}{2}}}{2^{\frac{n}{2}}\Gamma(\frac{n}{2})}\sum_{k=0}^{\infty}\frac{\lambda^k\mathrm{e}^{-\lambda}}{k!}\frac{(u/2)^k}{(\frac{n}{2})_k} \\ &= \frac{u^{\frac{1}{2}}\mathrm{e}^{-\frac{u}{2}}}{\sqrt{2\pi}}\sum_{k=0}^{\infty}\frac{(17/12)^k\mathrm{e}^{-17/12}}{k!}\frac{(u/2)^k}{(\frac{3}{2})_k},\ 0\le u<\infty, \end{split}$$

and zero elsewhere.
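
A quick Monte Carlo check of Example 2.3.3 (an illustration, not part of the text; sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
x1 = rng.normal(-1.0, np.sqrt(2.0), N)  # N1(-1, 2): second argument is the std. dev.
x2 = rng.normal(1.0, np.sqrt(3.0), N)   # N1(1, 3)
x3 = rng.normal(-2.0, np.sqrt(2.0), N)  # N1(-2, 2)
u = x1**2/2 + x2**2/3 + x3**2/2
print(u.mean())  # close to 35/6 = 5.833...
print(u.var())   # close to 52/3 = 17.333...
```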

#### **2.3a.1. Noncentral chisquare having** *n* **degrees of freedom in the complex domain**

Let us now consider independently Gaussian distributed variables in the complex domain. Let the complex scalar variables $\tilde{x}_j\sim\tilde{N}_1(\tilde{\mu}_j,\sigma_j^2),\ j=1,\dots,n,$ be independently distributed. We have already established that $\sum_{j=1}^n\big(\frac{\tilde{x}_j-\tilde{\mu}_j}{\sigma_j}\big)^{*}\big(\frac{\tilde{x}_j-\tilde{\mu}_j}{\sigma_j}\big)\sim\tilde{\chi}^2_n$, a chisquare with $n$ degrees of freedom in the complex domain. If we let $\tilde{u}=\sum_{j=1}^n\frac{\tilde{x}_j^{*}\tilde{x}_j}{\sigma_j^2}$, then this $\tilde{u}$ will be said to have *a noncentral chisquare distribution with $n$ degrees of freedom and non-centrality parameter $\lambda$ in the complex domain*, where $\tilde{x}_j^{*}$ is simply the conjugate of $\tilde{x}_j$ since it is a scalar quantity. Since $\tilde{u}$ is real in this case, we may associate the mgf of $\tilde{u}$ with a real parameter $t$. Proceeding as in the real case, we obtain the mgf of $\tilde{u}$, denoted by $M_{\tilde{u}}(t)$, as follows:

$$M\_{\tilde{\mu}}(t) = E[\mathbf{e}^{t\tilde{\mu}}] = (1 - t)^{-n} \mathbf{e}^{-\lambda + \frac{\lambda}{1 - t}}, \ 1 - t > 0, \ \lambda = \sum\_{j = 1}^{n} \frac{\tilde{\mu}\_j^\* \tilde{\mu}\_j}{\sigma\_j^2} = \tilde{\mu}^\* \Sigma^{-1} \tilde{\mu}$$

$$= \sum\_{k = 0}^{\infty} \frac{\lambda^k}{k!} \mathbf{e}^{-\lambda} (1 - t)^{-(n + k)}. \tag{2.3a.2}$$

Note that the inverse corresponding to $(1-t)^{-(n+k)}$ is a chisquare density in the complex domain with $n+k$ degrees of freedom, and that part of the density, denoted by $f_1(u)$, is

$$f\_1(u) = \frac{1}{\Gamma(n+k)} u^{n+k-1} \mathbf{e}^{-u} = \frac{1}{\Gamma(n)(n)\_k} u^{n-1} u^k \mathbf{e}^{-u}.$$

Thus, the noncentral chisquare density with $n$ degrees of freedom in the complex domain, that is, the density of $u=\tilde{\chi}^2_n(\lambda)$, denoted by $f_{u,\lambda}(u)$, is

$$f\_{\mu,\lambda}(u) = \frac{u^{n-1}}{\Gamma(n)} \mathbf{e}^{-u} \sum\_{k=0}^{\infty} \frac{\lambda^k}{k!} \mathbf{e}^{-\lambda} \frac{u^k}{(n)\_k},\tag{2.3a.3}$$

which, referring to Eqs. (2.3.5)–(2.3.9) in connection with a non-central chisquare in the real domain, can also be represented in various ways.

**Example 2.3a.3.** Let *x*˜<sup>1</sup> ∼ *N*˜1*(*1 + *i,* 2*), x*˜<sup>2</sup> ∼ *N*˜1*(*2 + *i,* 4*)* and *x*˜<sup>3</sup> ∼ *N*˜1*(*1 − *i,* 2*)* be independently distributed univariate complex Gaussian random variables and

$$\tilde{u}=\frac{\tilde{x}_1^{*}\tilde{x}_1}{\sigma_1^2}+\frac{\tilde{x}_2^{*}\tilde{x}_2}{\sigma_2^2}+\frac{\tilde{x}_3^{*}\tilde{x}_3}{\sigma_3^2}=\frac{\tilde{x}_1^{*}\tilde{x}_1}{2}+\frac{\tilde{x}_2^{*}\tilde{x}_2}{4}+\frac{\tilde{x}_3^{*}\tilde{x}_3}{2}.$$

Compute $E[\tilde{u}]$ and $\mathrm{Var}(\tilde{u})$ and provide an explicit representation of the density of $\tilde{u}$.

**Solution 2.3a.3.** In this case, *u*˜ has a noncentral chisquare distribution with degrees of freedom *ν* = *n* = 3 and non-centrality parameter *λ* given by

$$\lambda = \frac{\tilde{\mu}\_1^\* \tilde{\mu}\_1}{\sigma\_1^2} + \frac{\tilde{\mu}\_2^\* \tilde{\mu}\_2}{\sigma\_2^2} + \frac{\tilde{\mu}\_3^\* \tilde{\mu}\_3}{\sigma\_3^2} = \frac{[(1)^2 + (1)^2]}{2} + \frac{[(2)^2 + (1)^2]}{4} + \frac{[(1)^2 + (-1)^2]}{2} = 1 + \frac{5}{4} + 1 = \frac{13}{4}.$$

The density, denoted by $g_1(u)$, is given in $(i)$. In this case, $\tilde{u}$ is a real gamma with the parameters $(\alpha=n,\ \beta=1)$ to which a Poisson series is appended:

$$g_1(u)=\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\mathrm{e}^{-\lambda}\frac{u^{n+k-1}\mathrm{e}^{-u}}{\Gamma(n+k)},\ 0\le u<\infty,\tag{i}$$

and zero elsewhere. Then, the expected value of *u* and the variance of *u* are available from *(i)* by direct integration.

$$E[u]=\int_0^\infty u\,g_1(u)\mathrm{d}u=\sum_{k=0}^\infty\frac{\lambda^k}{k!}\mathrm{e}^{-\lambda}\frac{\Gamma(n+k+1)}{\Gamma(n+k)}.$$

But $\frac{\Gamma(n+k+1)}{\Gamma(n+k)}=n+k$, and the summation over $k$ can be taken as the expected value of $n+k$ in a Poisson distribution. Thus, $E[\tilde{\chi}^2_n(\lambda)]=n+E[k]=n+\lambda$. Now, in the expected value of

$$[\tilde{\chi}_n^2(\lambda)][\tilde{\chi}_n^2(\lambda)]^{*}=[\tilde{\chi}_n^2(\lambda)]^2=[u]^2,$$

which is real in this case, the integral part over *u* gives

$$\frac{\Gamma(n+k+2)}{\Gamma(n+k)} = (n+k+1)(n+k) = n^2 + 2nk + k^2 + n + k$$

with expected value $n^2+2n\lambda+n+\lambda+(\lambda^2+\lambda)$. Hence,

$$\text{Var}(\tilde{\chi}\_n^2(\lambda)) = E[u - E(u)][u - E(u)]^\* = \text{Var}(u) = n^2 + 2n\lambda + n + \lambda + (\lambda^2 + \lambda) - (n + \lambda)^2,$$

which simplifies to *n* + 2*λ*. Accordingly,

$$E[\tilde{\chi}\_n^2(\lambda)] = n + \lambda \text{ and } \text{Var}(\tilde{\chi}\_n^2(\lambda)) = n + 2\lambda,\tag{ii}$$

so that

$$E[u]=n+\lambda=3+\frac{13}{4}=\frac{25}{4}\ \text{ and }\ \mathrm{Var}(u)=n+2\lambda=3+\frac{13}{2}=\frac{19}{2}.\tag{iii}$$

The explicit form of the density is then

$$g_1(u)=\frac{u^2\mathrm{e}^{-u}}{2}\sum_{k=0}^{\infty}\frac{(13/4)^k\mathrm{e}^{-13/4}}{k!}\frac{u^k}{(3)_k},\ 0\le u<\infty,\tag{iv}$$

and zero elsewhere.
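
Example 2.3a.3 can likewise be checked by simulation. The sketch below (not part of the text) uses the convention that $\tilde{N}_1(\tilde{\mu},\sigma^2)$ has independent real and imaginary parts, each with variance $\sigma^2/2$, so that $E|\tilde{x}-\tilde{\mu}|^2=\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

def cnormal(mu, s2):
    # Univariate complex Gaussian: Re and Im parts each N(., s2/2)
    z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    return mu + np.sqrt(s2 / 2) * z

x1, x2, x3 = cnormal(1+1j, 2.0), cnormal(2+1j, 4.0), cnormal(1-1j, 2.0)
u = np.abs(x1)**2/2 + np.abs(x2)**2/4 + np.abs(x3)**2/2
print(u.mean())  # close to 25/4 = 6.25
print(u.var())   # close to 19/2 = 9.5
```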

#### **Exercises 2.3**

**2.3.1.** Let $x_1,\dots,x_n$ be iid variables whose common density is a real gamma density with the parameters $\alpha$ and $\beta$, that is, with mgf $(1-\beta t)^{-\alpha},\ 1-\beta t>0,\ \alpha>0,\ \beta>0$. Let $u_1=x_1+\dots+x_n,\ u_2=\frac{1}{n}(x_1+\dots+x_n),\ u_3=u_2-\alpha\beta,\ u_4=\frac{\sqrt{n}\,u_3}{\beta\sqrt{\alpha}}$. Evaluate the mgfs and thereby the densities of $u_1,u_2,u_3,u_4$. Show that they are all gamma densities, possibly relocated, for all finite $n$. Show that when $n\to\infty$, $u_4\to N_1(0,1)$, that is, $u_4$ goes to a real standard normal when $n$ goes to infinity.

**2.3.2.** Let $x_1,\dots,x_n$ be a simple random sample of size $n$ from a real population with mean value $\mu$ and variance $\sigma^2<\infty,\ \sigma>0$. Then the central limit theorem says that $\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma}\to N_1(0,1)$ as $n\to\infty$, where $\bar{x}=\frac{1}{n}(x_1+\dots+x_n)$. Translate this statement for (1): the binomial probability function $f_1(x)=\binom{n}{x}p^x(1-p)^{n-x},\ x=0,1,\dots,n,\ 0<p<1,$ and $f_1(x)=0$ elsewhere; (2): the negative binomial probability law $f_2(x)=\binom{x-1}{k-1}p^k(1-p)^{x-k},\ x=k,k+1,\dots,\ 0<p<1,$ and zero elsewhere; (3): the geometric probability law $f_3(x)=p(1-p)^{x-1},\ x=1,2,\dots,\ 0<p<1,$ and $f_3(x)=0$ elsewhere; (4): the Poisson probability law $f_4(x)=\frac{\lambda^x}{x!}\mathrm{e}^{-\lambda},\ x=0,1,\dots,\ \lambda>0,$ and $f_4(x)=0$ elsewhere.

**2.3.3.** Repeat Exercise 2.3.2 if the population is (1): $g_1(x)=c_1x^{\gamma-1}\mathrm{e}^{-ax^{\delta}},\ x\ge 0,\ \delta>0,\ a>0,\ \gamma>0,$ and $g_1(x)=0$ elsewhere; (2): the real pathway model $g_2(x)=c_qx^{\gamma}[1-a(1-q)x^{\delta}]^{\frac{1}{1-q}},\ a>0,\ \delta>0,\ 1-a(1-q)x^{\delta}>0,$ for the cases $q<1,\ q>1,\ q\to 1$, and $g_2(x)=0$ elsewhere.

**2.3.4.** Let $x\sim N_1(\mu_1,\sigma_1^2),\ y\sim N_1(\mu_2,\sigma_2^2)$ be real Gaussian and independently distributed. Let $x_1,\dots,x_{n_1}$, $y_1,\dots,y_{n_2}$ be simple random samples from $x$ and $y$, respectively. Let $u_1=\sum_{j=1}^{n_1}(x_j-\mu_1)^2,\ u_2=\sum_{j=1}^{n_2}(y_j-\mu_2)^2,\ u_3=2x_1-3x_2+y_1-y_2+2y_3,$

$$u_4=\frac{\sum_{j=1}^{n_1}(x_j-\mu_1)^2/\sigma_1^2}{\sum_{j=1}^{n_2}(y_j-\mu_2)^2/\sigma_2^2},\quad u_5=\frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2/\sigma_1^2}{\sum_{j=1}^{n_2}(y_j-\bar{y})^2/\sigma_2^2}.$$

Compute the densities of *u*1*, u*2*, u*3*, u*4*, u*5.

**2.3.5.** In Exercise 2.3.4, if $\sigma_1^2=\sigma_2^2=\sigma^2$, compute the densities of $u_3,u_4,u_5$ there, as well as that of $u_6=\sum_{j=1}^{n_1}(x_j-\bar{x})^2+\sum_{j=1}^{n_2}(y_j-\bar{y})^2$ if (1): $n_1=n_2$; (2): $n_1\ne n_2$.

**2.3.6.** For the noncentral chisquare in the complex case, discussed in (2.3a.5) evaluate the mean value and the variance.

**2.3.7.** For the complex case, starting with the mgf, derive the noncentral chisquare density and show that it agrees with that given in (2.3a.3).

**2.3.8.** Give the detailed proofs of the independence of linear forms and sets of linear forms in the complex Gaussian case.

#### **2.4. Distributions of Products and Ratios and Connection to Fractional Calculus**

Distributions of products and ratios of real scalar random variables are connected to numerous topics including Krätzel integrals and transforms, reaction-rate probability integrals in nuclear reaction-rate theory, the inverse Gaussian distribution, integrals occurring in fractional calculus, Kobayashi integrals and Bayesian structures. Let $x_1>0$ and $x_2>0$ be real scalar positive random variables that are independently distributed with density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. We respectively denote the product and ratio of these variables by $u_2=x_1x_2$ and $u_1=\frac{x_2}{x_1}$. What are then the densities of $u_1$ and $u_2$? We first consider the density of the product. Let $u_2=x_1x_2$ and $v=x_2$. Then $x_1=\frac{u_2}{v}$ and $x_2=v$, so that $\mathrm{d}x_1\wedge\mathrm{d}x_2=\frac{1}{v}\mathrm{d}u_2\wedge\mathrm{d}v$. Let the joint density of $u_2$ and $v$ be denoted by $g(u_2,v)$ and the marginal density of $u_2$ by $g_2(u_2)$. Then

$$g(u_2,v)=\frac{1}{v}f_1\left(\frac{u_2}{v}\right)f_2(v)\ \text{ and }\ g_2(u_2)=\int_{v}\frac{1}{v}f_1\left(\frac{u_2}{v}\right)f_2(v)\mathrm{d}v.\tag{2.4.1}$$

For example, let *f*<sup>1</sup> and *f*<sup>2</sup> be generalized gamma densities, in which case

$$f_j(x_j)=\frac{\delta_j\,a_j^{\frac{\gamma_j}{\delta_j}}}{\Gamma(\frac{\gamma_j}{\delta_j})}x_j^{\gamma_j-1}\mathrm{e}^{-a_jx_j^{\delta_j}},\ a_j>0,\ \delta_j>0,\ \gamma_j>0,\ x_j\ge 0,\ j=1,2,$$

and *fj (xj )* = 0 elsewhere. Then,

$$\begin{split} g_2(u_2) &= c\int_0^\infty\left(\frac{1}{v}\right)\left(\frac{u_2}{v}\right)^{\gamma_1-1}v^{\gamma_2-1}\,\mathrm{e}^{-a_2v^{\delta_2}-a_1\left(\frac{u_2}{v}\right)^{\delta_1}}\mathrm{d}v,\ \ c=\prod_{j=1}^2\frac{\delta_j\,a_j^{\frac{\gamma_j}{\delta_j}}}{\Gamma(\frac{\gamma_j}{\delta_j})} \\ &= c\,u_2^{\gamma_1-1}\int_0^\infty v^{\gamma_2-\gamma_1-1}\mathrm{e}^{-a_2v^{\delta_2}-a_1(u_2^{\delta_1}/v^{\delta_1})}\mathrm{d}v. \end{split}\tag{2.4.2}$$

The integral in (2.4.2) is connected to several topics. For $\delta_1=1,\ \delta_2=1,$ this integral is the basic Krätzel integral and Krätzel transform; see Mathai and Haubold (2020). When $\delta_2=1,\ \delta_1=\frac{1}{2},$ the integral in (2.4.2) is the basic reaction-rate probability integral in nuclear reaction-rate theory; see Mathai and Haubold (1988). For $\delta_1=1,\ \delta_2=1,$ the integrand in (2.4.2), once normalized, is the inverse Gaussian density for appropriate values of $\gamma_2-\gamma_1-1$. Observe that (2.4.2) is also connected to the Bayesian structure of unconditional densities when the conditional and marginal densities belong to the generalized gamma family of densities. When $\delta_2=1,$ the integral is an mgf of the remaining part of the integrand with $a_2$ as the mgf parameter (it is therefore the Laplace transform of the remaining part of the function).
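
As a numerical illustration of the inverse Gaussian case (not from the text; the parameter values are arbitrary), take $\delta_1=\delta_2=1$ and $\gamma_2-\gamma_1-1=-\frac{3}{2}$ in (2.4.2); the integral then has the classical Bessel-type closed form $\sqrt{\pi/b}\,\mathrm{e}^{-2\sqrt{ab}}$:

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.3, 0.7  # arbitrary positive constants
val, _ = quad(lambda v: v**-1.5 * np.exp(-a*v - b/v), 0, np.inf)
closed = np.sqrt(np.pi / b) * np.exp(-2 * np.sqrt(a * b))
print(val, closed)  # the numerical integral matches the closed form
```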

Now, let us consider different $f_1$ and $f_2$. Let $f_1(x_1)$ be a real type-1 beta density with the parameters $(\gamma+1,\ \alpha)$, $\Re(\alpha)>0,\ \Re(\gamma)>-1$ (in statistical problems, the parameters are real, but the results hold as well for complex parameters; accordingly, the conditions are stated for complex parameters), that is, the density of $x_1$ is

$$f\_1(\mathbf{x}\_1) = \frac{\Gamma(\gamma + 1 + \alpha)}{\Gamma(\gamma + 1)\Gamma(\alpha)} \mathbf{x}\_1^{\gamma} (1 - \mathbf{x}\_1)^{\alpha - 1}, \ 0 \le \mathbf{x}\_1 \le 1, \ \alpha > 0, \ \gamma > -1,$$

and *f*1*(x*1*)* = 0 elsewhere. Let *f*2*(x*2*)* = *f (x*2*)* where *f* is an arbitrary density. Then, the density of *u*<sup>2</sup> is given by

$$\begin{split} g_2(u_2) &= c\,\frac{1}{\Gamma(\alpha)}\int_v\frac{1}{v}\left(\frac{u_2}{v}\right)^{\gamma}\left(1-\frac{u_2}{v}\right)^{\alpha-1}f(v)\mathrm{d}v,\ \ c=\frac{\Gamma(\gamma+1+\alpha)}{\Gamma(\gamma+1)} \\ &= c\,\frac{u_2^{\gamma}}{\Gamma(\alpha)}\int_{v\ge u_2>0}v^{-\gamma-\alpha}(v-u_2)^{\alpha-1}f(v)\mathrm{d}v \\ &= c\,K_{2,u_2,\gamma}^{-\alpha}f \end{split}\tag{2.4.3}$$

where $K_{2,u_2,\gamma}^{-\alpha}f$ is the Erdélyi-Kober fractional integral of order $\alpha$ of the second kind, with parameter $\gamma$, in the real scalar variable case. Hence, if $f$ is an arbitrary density, then $K_{2,u_2,\gamma}^{-\alpha}f$ is $\frac{\Gamma(\gamma+1)}{\Gamma(\gamma+1+\alpha)}g_2(u_2)$, that is, a constant multiple of the density of a product of independently distributed real scalar positive random variables where one of them has a real type-1 beta density and the other has an arbitrary density. When $f_1$ and $f_2$ are densities, then $g_2(u_2)$ has the structure

$$g_2(u_2)=\int_{v}\frac{1}{v}f_1\Big(\frac{u_2}{v}\Big)f_2(v)\mathrm{d}v.\tag{i}$$

Whether or not *f*<sup>1</sup> and *f*<sup>2</sup> are densities, the structure in *(i)* is called *the Mellin convolution of a product* in the sense that if we take the Mellin transform of *g*2, with Mellin parameter *s*, then

$$M_{g_2}(s)=M_{f_1}(s)M_{f_2}(s)\tag{2.4.5}$$

where $M_{g_2}(s)=\int_0^\infty u_2^{s-1}g_2(u_2)\mathrm{d}u_2$,

$$M\_{f\_1}(\mathbf{s}) = \int\_0^\infty \mathbf{x}\_1^{s-1} f\_1(\mathbf{x}\_1) \mathbf{dx}\_1 \text{ and } \ M\_{f\_2}(\mathbf{s}) = \int\_0^\infty \mathbf{x}\_2^{s-1} f\_2(\mathbf{x}\_2) \mathbf{dx}\_2.$$

whenever the Mellin transforms exist. Here (2.4.5) is the Mellin convolution of a product property. In statistical terms, when *f*<sup>1</sup> and *f*<sup>2</sup> are densities and when *x*<sup>1</sup> and *x*<sup>2</sup> are independently distributed, we have

$$E[u_2^{s-1}]=E[x_1^{s-1}]E[x_2^{s-1}]\tag{2.4.6}$$

whenever the expected values exist. Taking different forms of $f_1$ and $f_2$, where $f_1$ has a factor $\frac{(1-x_1)^{\alpha-1}}{\Gamma(\alpha)}$ for $0\le x_1\le 1,\ \Re(\alpha)>0,$ it can be shown that the structure appearing in $(i)$ produces all the various fractional integrals of the second kind of order $\alpha$ available in the literature for the real scalar variable case, such as the Riemann-Liouville fractional integral, the Weyl fractional integral, etc. Connections of distributions of products and ratios to fractional integrals were established in a series of papers which appeared in Linear Algebra and its Applications; see Mathai (2013, 2014, 2015).
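
The moment identity (2.4.6) is easy to illustrate by simulation. The sketch below is not part of the text; the gamma shapes and the value of $s$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
x1 = rng.gamma(2.0, 1.0, N)   # independent gamma, shape 2.0, scale 1
x2 = rng.gamma(3.5, 1.0, N)   # independent gamma, shape 3.5, scale 1
s = 2.5
lhs = ((x1 * x2)**(s - 1)).mean()                  # E[u2^(s-1)] for u2 = x1*x2
rhs = (x1**(s - 1)).mean() * (x2**(s - 1)).mean()  # E[x1^(s-1)] E[x2^(s-1)]
print(lhs, rhs)  # both estimates are near Gamma(5) = 24 for these shapes
```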

Now, let us consider the density of a ratio. Again, let $x_1>0,\ x_2>0$ be independently distributed real scalar random variables with density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. Let $u_1=\frac{x_2}{x_1}$ and $v=x_2$. Then $\mathrm{d}x_1\wedge\mathrm{d}x_2=-\frac{v}{u_1^2}\mathrm{d}u_1\wedge\mathrm{d}v$. If we take $x_1=v$ instead, the Jacobian will be simply $v$ rather than $-\frac{v}{u_1^2}$, and the final structure will be different. However, the first transformation is required in order to establish connections to fractional integrals of the first kind. If $f_1$ and $f_2$ are generalized gamma densities as described earlier and if $x_1=v$, then the marginal density of $u_1$, denoted by $g_1(u_1)$, will be as follows:

$$\begin{split} g_1(u_1) &= c\int_v v\,v^{\gamma_1-1}(u_1v)^{\gamma_2-1}\mathrm{e}^{-a_1v^{\delta_1}-a_2(u_1v)^{\delta_2}}\mathrm{d}v,\ \ c=\prod_{j=1}^2\frac{\delta_j\,a_j^{\frac{\gamma_j}{\delta_j}}}{\Gamma\left(\frac{\gamma_j}{\delta_j}\right)} \\ &= c\,u_1^{\gamma_2-1}\int_{v=0}^{\infty}v^{\gamma_1+\gamma_2-1}\mathrm{e}^{-a_1v^{\delta_1}-a_2(u_1v)^{\delta_2}}\mathrm{d}v \end{split}\tag{2.4.7}$$

$$=\frac{c}{\delta}\,\Gamma\Big(\frac{\gamma_1+\gamma_2}{\delta}\Big)u_1^{\gamma_2-1}(a_1+a_2u_1^{\delta})^{-\frac{\gamma_1+\gamma_2}{\delta}},\ \text{ for }\delta_1=\delta_2=\delta.\tag{2.4.8}$$
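
A sketch of a normalization check on (2.4.8) (not part of the text; the parameter values are arbitrary, and the generalized gamma normalizer is taken as $c=\prod_j\delta_j\,a_j^{\gamma_j/\delta_j}/\Gamma(\gamma_j/\delta_j)$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as G

g1, g2, d, a1, a2 = 2.0, 1.5, 2.0, 1.0, 0.8  # gamma_1, gamma_2, delta, a_1, a_2
c = (d * a1**(g1/d) / G(g1/d)) * (d * a2**(g2/d) / G(g2/d))

def dens(u):
    # Density (2.4.8) of the ratio u1 = x2/x1 when delta_1 = delta_2 = delta
    return (c/d) * G((g1+g2)/d) * u**(g2-1) * (a1 + a2*u**d)**(-(g1+g2)/d)

total, _ = quad(dens, 0, np.inf)
print(total)  # integrates to 1, as a density must
```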

On the other hand, if $x_2=v$, then the Jacobian is $-\frac{v}{u_1^2}$ and the marginal density, again denoted by $g_1(u_1)$, will be as follows when $f_1$ and $f_2$ are generalized gamma densities:

$$\begin{split} g\_1(u\_1) &= c \int\_v \left( \frac{v}{u\_1^2} \right) \left( \frac{v}{u\_1} \right)^{\gamma\_1 - 1} v^{\gamma\_2 - 1} \mathbf{e}^{-a\_1(\frac{v}{u\_1})^{\delta\_1} - a\_2 v^{\delta\_2}} \mathbf{d}v \\ &= c \, u\_1^{-\gamma\_1 - 1} \int\_{v = 0}^{\infty} v^{\gamma\_1 + \gamma\_2 - 1} \mathbf{e}^{-a\_2 v^{\delta\_2} - a\_1(\frac{v}{u\_1})^{\delta\_1}} \mathbf{d}v. \end{split}$$

This is one of the representations of the density of a product discussed earlier, which is also connected to the Krätzel integral, the reaction-rate probability integral, and so on. Now, let us consider a type-1 beta density for $x\_1$ with the parameters $(\gamma,\alpha)$ having the following density:

$$f\_1(x\_1) = \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)\Gamma(\alpha)}\,x\_1^{\gamma-1}(1-x\_1)^{\alpha-1},\ 0\le x\_1\le 1,$$

for $\gamma>0,\ \alpha>0$ and $f\_1(x\_1)=0$ elsewhere. Let $f\_2(x\_2)=f(x\_2)$ where $f$ is an arbitrary density. Letting $u\_1=\frac{x\_2}{x\_1}$ and $x\_2=v$, the density of $u\_1$, again denoted by $g\_1(u\_1)$, is

$$g\_1(u\_1) = \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)\Gamma(\alpha)}\int\_v\frac{v}{u\_1^2}\Big(\frac{v}{u\_1}\Big)^{\gamma-1}\Big(1-\frac{v}{u\_1}\Big)^{\alpha-1}f(v)\mathrm{d}v$$

$$= \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)}\,\frac{u\_1^{-\gamma-\alpha}}{\Gamma(\alpha)}\int\_{v\le u\_1}v^{\gamma}(u\_1-v)^{\alpha-1}f(v)\mathrm{d}v \tag{2.4.9}$$

$$= \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)}\,K\_{1,u\_1,\gamma}^{-\alpha}f,\ \Re(\alpha)>0, \tag{2.4.10}$$

where $K\_{1,u\_1,\gamma}^{-\alpha}f$ is the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ and parameter $\gamma$. If $f\_1$ and $f\_2$ are densities, this Erdélyi-Kober fractional integral of the first kind is a constant multiple of the density of a ratio, $g\_1(u\_1)$. In statistical terms,

$$u\_1 = \frac{x\_2}{x\_1} \Rightarrow E[u\_1^{s-1}] = E[\mathbf{x}\_2^{s-1}]E[\mathbf{x}\_1^{-s+1}] \text{ with } E[\mathbf{x}\_1^{-s+1}] = E[\mathbf{x}\_1^{(2-s)-1}] \Rightarrow$$

$$M\_{\mathbf{g}\_1}(\mathbf{s}) = M\_{f\_1}(2-s)M\_{f\_2}(\mathbf{s}),\tag{2.4.11}$$

which is the Mellin convolution of a ratio. Whether or not $f\_1$ and $f\_2$ are densities, (2.4.11) is taken as the Mellin convolution of a ratio; it cannot be given statistical interpretations when $f\_1$ and $f\_2$ are not densities. For example, let $f\_1(x\_1)=\frac{x\_1^{-\alpha}(1-x\_1)^{\alpha-1}}{\Gamma(\alpha)}$ and $f\_2(x\_2)=x\_2^{\alpha}f(x\_2)$ where $f(x\_2)$ is an arbitrary function. Then the Mellin convolution of a ratio, as in (2.4.11), again denoted by $g\_1(u\_1)$, is given by

$$g\_1(u\_1) = \int\_{v\le u\_1}\frac{(u\_1-v)^{\alpha-1}}{\Gamma(\alpha)}f(v)\mathrm{d}v,\ \Re(\alpha)>0. \tag{2.4.12}$$

This is the *Riemann-Liouville fractional integral of the first kind of order α* if $v$ is bounded below; when $v$ is not bounded below, it is the *Weyl fractional integral of the first kind of order α*. An introduction to fractional calculus is presented in Mathai and Haubold (2018). The densities of $u\_1$ and $u\_2$ are connected to various problems in different areas for different functions $f\_1$ and $f\_2$.
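For instance, the Riemann-Liouville fractional integral of order $\alpha$ with lower limit 0 applied to $f(v)=v^{\delta}$ has the well-known closed form $\frac{\Gamma(\delta+1)}{\Gamma(\delta+\alpha+1)}u^{\delta+\alpha}$ (compare Exercise 2.4.5). A minimal numerical sketch, with arbitrarily chosen parameter values:

```python
import math

def rl_integral(f, u, alpha, n=20000):
    # Riemann-Liouville fractional integral of the first kind of order alpha,
    # lower limit 0: (1/Gamma(alpha)) * int_0^u (u - v)^(alpha - 1) f(v) dv,
    # approximated by the midpoint rule
    h = u / n
    s = sum((u - (k + 0.5) * h)**(alpha - 1) * f((k + 0.5) * h) for k in range(n))
    return s * h / math.gamma(alpha)

alpha, delta, u = 1.5, 2.0, 1.5
num = rl_integral(lambda v: v**delta, u, alpha)
exact = math.gamma(delta + 1) / math.gamma(delta + alpha + 1) * u**(delta + alpha)
```

For $\alpha>1$ the integrand is smooth and the midpoint rule converges quickly; for $0<\alpha<1$ the kernel has an integrable singularity at $v=u$ and a finer grid would be needed.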

In the *p* × *p* matrix case in the complex domain, we will assume that the matrix is Hermitian positive definite. Note that when *p* = 1*,* Hermitian positive definite means a real positive variable. Hence in the scalar case, we will not discuss ratios and products in the complex domain since densities must be real-valued functions.

#### **Exercises 2.4**

**2.4.1.** Derive the density of (1): a real non-central F, where the numerator chisquare is non-central and the denominator chisquare is central; (2): a real doubly non-central F, where both chisquares are non-central with non-centrality parameters $\lambda\_1$ and $\lambda\_2$, respectively.

**2.4.2.** Let $x\_1$ and $x\_2$ be real gamma random variables with parameters $(\alpha\_1,\beta)$ and $(\alpha\_2,\beta)$, with the same $\beta$, and be independently distributed. Let $u\_1=\frac{x\_1}{x\_1+x\_2},\ u\_2=\frac{x\_1}{x\_2},\ u\_3=x\_1+x\_2$. Compute the densities of $u\_1,u\_2,u\_3$. Hint: Use the transformation $x\_1=r\cos^2\theta,\ x\_2=r\sin^2\theta$.

**2.4.3.** Let $x\_1$ and $x\_2$ be as defined in Exercise 2.4.2. Let $u=x\_1x\_2$. Derive the density of $u$.

**2.4.4.** Let $x\_j$ have a real type-1 beta density with the parameters $(\alpha\_j,\beta\_j),\ j=1,2$, and be independently distributed. Let $u\_1=x\_1x\_2,\ u\_2=\frac{x\_1}{x\_2}$. Derive the densities of $u\_1$ and $u\_2$. State the conditions under which these densities reduce to simpler known densities.

**2.4.5.** Evaluate (1): the Weyl fractional integral of the second kind of order $\alpha$ if the arbitrary function is $f(v)=\mathrm{e}^{-v}$; (2): the Riemann-Liouville fractional integral of the first kind of order $\alpha$ if the lower limit is 0 and the arbitrary function is $f(v)=v^{\delta}$.

**2.4.6.** In Exercise 2.4.2, show that (1): $u\_1$ and $u\_3$ are independently distributed; (2): $u\_2$ and $u\_3$ are independently distributed.

**2.4.7.** In Exercise 2.4.2, show that for arbitrary $h$, $E\big[\big(\frac{x\_1}{x\_1+x\_2}\big)^h\big]=\frac{E(x\_1^h)}{E(x\_1+x\_2)^h}$, and state the conditions for the existence of the moments. [Observe that, in general, $E\big[\big(\frac{y\_1}{y\_2}\big)^h\big]\ne\frac{E(y\_1^h)}{E(y\_2^h)}$ even if $y\_1$ and $y\_2$ are independently distributed.]

**2.4.8.** Derive the corresponding densities in Exercise 2.4.1 for the complex domain by taking the chisquares in the complex domain.

**2.4.9.** Extend the results in Exercise 2.4.2 to the complex domain by taking chisquare variables in the complex domain instead of gamma variables.

#### **2.5. General Structures**

#### **2.5.1. Product of real scalar gamma variables**

Let $x\_1,\dots,x\_k$ be independently distributed real scalar gamma random variables, with $x\_j$ having the density $f\_j(x\_j)=c\_jx\_j^{\alpha\_j-1}\mathrm{e}^{-\frac{x\_j}{\beta\_j}},\ 0\le x\_j<\infty,\ \alpha\_j>0,\ \beta\_j>0$, where $c\_j=[\beta\_j^{\alpha\_j}\Gamma(\alpha\_j)]^{-1}$, and $f\_j(x\_j)=0$ elsewhere. Consider the product $u=x\_1x\_2\cdots x\_k$. Such structures appear in many situations, such as geometrical probability problems when we consider gamma distributed random points; see Mathai (1999). How can we determine the density of such a general structure? The transformation of variables technique is not a feasible procedure in this case. Since the $x\_j$'s are positive, we may determine the Mellin transforms of the $x\_j$'s with parameter $s$. Then, when $f\_j(x\_j)$ is a density, the Mellin transform $M\_{f\_j}(s)$, once expressed in terms of an expected value, is $M\_{f\_j}(s)=E[x\_j^{s-1}]$ whenever the expected value exists:

$$\begin{split} M\_{f\_j}(s) &= E[x\_j^{s-1}] = \frac{1}{\beta\_j^{\alpha\_j}\Gamma(\alpha\_j)}\int\_0^{\infty}x\_j^{s-1}x\_j^{\alpha\_j-1}\mathrm{e}^{-\frac{x\_j}{\beta\_j}}\mathrm{d}x\_j\\ &= \frac{\Gamma(\alpha\_j+s-1)}{\Gamma(\alpha\_j)}\,\beta\_j^{s-1},\ \Re(\alpha\_j+s-1)>0. \end{split}$$

Hence,

$$E[u^{s-1}] = \prod\_{j=1}^{k}E[x\_j^{s-1}] = \prod\_{j=1}^{k}\frac{\Gamma(\alpha\_j+s-1)}{\Gamma(\alpha\_j)}\,\beta\_j^{s-1},\ \Re(\alpha\_j+s-1)>0,\ j=1,\ldots,k,$$

and the density of *u* is available from the inverse Mellin transform. If *g(u)* is the density of *u*, then

$$g(u) = \frac{1}{2\pi i}\int\_{c-i\infty}^{c+i\infty}\Big\{\prod\_{j=1}^{k}\frac{\Gamma(\alpha\_j+s-1)}{\Gamma(\alpha\_j)}\,\beta\_j^{s-1}\Big\}u^{-s}\mathrm{d}s,\ i=\sqrt{(-1)}. \tag{2.5.1}$$

This is a contour integral where *c* is any real number such that *c >* −*(αj* − 1*), j* = 1*,...,k*. The integral in (2.5.1) is available in terms of a known special function, namely Meijer's G-function. The G-function can be defined as follows:

$$\begin{split} G(z) &= G\_{p,q}^{m,n}(z) = G\_{p,q}^{m,n} \left[ z \vert\_{b\_1, \dots, b\_q}^{a\_1, \dots, a\_p} \right] \\ &= \frac{1}{2\pi i} \int\_L \frac{\{\prod\_{j=1}^m \Gamma(b\_j + s)\} \{\prod\_{j=1}^n \Gamma(1 - a\_j - s)\}}{\{\prod\_{j=m+1}^q \Gamma(1 - b\_j - s)\} \{\prod\_{j=n+1}^p \Gamma(a\_j + s)\}} z^{-s} \mathrm{d}s, \; i = \sqrt{(-1)}. \end{split} \tag{2.5.2}$$

The existence conditions, different possible contours *L*, as well as properties and applications are discussed in Mathai (1993), Mathai and Saxena (1973, 1978), and Mathai et al. (2010). With the help of (2.5.2), we may now express (2.5.1) as follows in terms of a G-function:

$$g(u) = \Big\{\prod\_{j=1}^{k}\frac{1}{\beta\_j\Gamma(\alpha\_j)}\Big\}G\_{0,k}^{k,0}\Big[\frac{u}{\beta\_1\cdots\beta\_k}\Big|\_{\alpha\_j-1,\ j=1,\ldots,k}\Big] \tag{2.5.3}$$

for $0\le u<\infty$. Series and computable forms of a general G-function are provided in Mathai (1993). G-functions are built-in functions in the symbolic computational packages Mathematica and MAPLE.
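The Mellin moment formula above also lends itself to a direct Monte Carlo check: simulated products of independent gamma variables should reproduce $E[u^{s-1}]$. The following sketch (parameter values arbitrary) takes $s=2$, so that the moment is simply $E[u]$ and the simulation variance stays small:

```python
import math
import random

def mellin_moment(s, alphas, betas):
    # E[u^(s-1)] for u = x1*x2*...*xk with x_j ~ Gamma(shape alpha_j, scale beta_j),
    # per the product of Mellin transforms derived above
    m = 1.0
    for a, b in zip(alphas, betas):
        m *= math.gamma(a + s - 1) / math.gamma(a) * b**(s - 1)
    return m

random.seed(1)
alphas, betas = [2.0, 3.5, 1.2], [1.0, 0.5, 2.0]
s = 2.0  # E[u^(s-1)] = E[u]; keeps the Monte Carlo variance small
n = 200000
acc = 0.0
for _ in range(n):
    u = 1.0
    for a, b in zip(alphas, betas):
        u *= random.gammavariate(a, b)
    acc += u**(s - 1)
emp = acc / n
th = mellin_moment(s, alphas, betas)
```

For larger $s$ the integrand $u^{s-1}$ is heavier tailed and many more replications would be needed for comparable accuracy.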

#### **2.5.2. Product of real scalar type-1 beta variables**

Let $y\_1,\dots,y\_k$ be independently distributed real scalar type-1 beta random variables with the parameters $(\alpha\_j,\beta\_j),\ \alpha\_j>0,\ \beta\_j>0,\ j=1,\dots,k$. Consider the product $u\_1=y\_1\cdots y\_k$. Such a structure occurs in several contexts. It appears for instance in geometrical probability problems in connection with type-1 beta distributed random points. As well, when testing certain hypotheses on the parameters of one or more multivariate normal populations, the resulting likelihood ratio criteria, also known as $\lambda$-criteria, or one-to-one functions thereof, have the structure of a product of independently distributed real type-1 beta variables under the null hypothesis. The density of $u\_1$ can be obtained by proceeding as in the previous section. Since the moment of $u\_1$ of order $s-1$ is

$$\begin{aligned} \mathbb{E}\{\mathbf{u}\_1^{s-1}\} &= \prod\_{j=1}^k E\{\mathbf{y}\_j^{s-1}\} \\ &= \prod\_{j=1}^k \frac{\Gamma(\alpha\_j + s - 1)}{\Gamma(\alpha\_j)} \frac{\Gamma(\alpha\_j + \beta\_j)}{\Gamma(\alpha\_j + \beta\_j + s - 1)} \end{aligned} \tag{2.5.4}$$

for $\Re(\alpha\_j+s-1)>0,\ j=1,\dots,k$, the density of $u\_1$, denoted by $g\_1(u\_1)$, is given by

$$\begin{split} g\_{1}(u\_{1}) &= \frac{1}{2\pi i}\int\_{c-i\infty}^{c+i\infty}[E(u\_{1}^{s-1})]u\_{1}^{-s}\mathrm{d}s\\ &= \Big\{\prod\_{j=1}^{k}\frac{\Gamma(\alpha\_{j}+\beta\_{j})}{\Gamma(\alpha\_{j})}\Big\}\frac{1}{2\pi i}\int\_{c-i\infty}^{c+i\infty}\Big\{\prod\_{j=1}^{k}\frac{\Gamma(\alpha\_{j}+s-1)}{\Gamma(\alpha\_{j}+\beta\_{j}+s-1)}\Big\}u\_{1}^{-s}\mathrm{d}s\\ &= \Big\{\prod\_{j=1}^{k}\frac{\Gamma(\alpha\_{j}+\beta\_{j})}{\Gamma(\alpha\_{j})}\Big\}G\_{k,k}^{k,0}\Big[u\_{1}\Big|^{\alpha\_j+\beta\_j-1,\ j=1,\ldots,k}\_{\alpha\_j-1,\ j=1,\ldots,k}\Big] \end{split} \tag{2.5.5}$$

for $0\le u\_1\le 1,\ \Re(\alpha\_j+s-1)>0,\ j=1,\ldots,k$.
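Formula (2.5.4) can likewise be checked by simulation: products of independent type-1 beta variables lie in $(0,1)$, so their empirical moments converge quickly. A sketch with arbitrary parameter values:

```python
import math
import random

def beta1_product_moment(s, params):
    # Eq. (2.5.4): E[u1^(s-1)] for u1 = y1*...*yk with y_j ~ type-1 beta(alpha_j, beta_j)
    m = 1.0
    for a, b in params:
        m *= (math.gamma(a + s - 1) / math.gamma(a)) \
             * (math.gamma(a + b) / math.gamma(a + b + s - 1))
    return m

random.seed(3)
params = [(2.0, 3.0), (1.5, 2.5), (4.0, 1.0)]
s = 3.0
n = 200000
emp = sum(math.prod(random.betavariate(a, b) for a, b in params)**(s - 1)
          for _ in range(n)) / n
th = beta1_product_moment(s, params)
```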

#### **2.5.3. Product of real scalar type-2 beta variables**

Let $u\_2=z\_1z\_2\cdots z\_k$ where the $z\_j$'s are independently distributed real scalar type-2 beta random variables with the parameters $(\alpha\_j,\beta\_j),\ \alpha\_j>0,\ \beta\_j>0,\ j=1,\ldots,k$. Such products are encountered in several situations, including certain problems in geometrical probability that are discussed in Mathai (1999). Then,

$$E[z\_j^{s-1}] = \frac{\Gamma(\alpha\_j + s - 1)}{\Gamma(\alpha\_j)} \frac{\Gamma(\beta\_j - s + 1)}{\Gamma(\beta\_j)}, \ -\Re(\alpha\_j - 1) < \Re(s) < \Re(\beta\_j + 1),$$

and

$$E[u\_2^{s-1}] = \prod\_{j=1}^{k}E[z\_j^{s-1}] = \Big\{\prod\_{j=1}^{k}[\Gamma(\alpha\_j)\Gamma(\beta\_j)]^{-1}\Big\}\Big\{\prod\_{j=1}^{k}\Gamma(\alpha\_j+s-1)\Gamma(\beta\_j-s+1)\Big\}. \tag{2.5.6}$$

Hence, the density of *u*2, denoted by *g*2*(u*2*)*, is given by

$$\begin{split} g\_{2}(u\_{2}) &= \Big\{\prod\_{j=1}^{k}[\Gamma(\alpha\_{j})\Gamma(\beta\_{j})]^{-1}\Big\}\frac{1}{2\pi i}\int\_{c-i\infty}^{c+i\infty}\Big\{\prod\_{j=1}^{k}\Gamma(\alpha\_{j}+s-1)\Gamma(\beta\_{j}-s+1)\Big\}u\_{2}^{-s}\mathrm{d}s\\ &= \Big\{\prod\_{j=1}^{k}[\Gamma(\alpha\_{j})\Gamma(\beta\_{j})]^{-1}\Big\}G\_{k,k}^{k,k}\Big[u\_{2}\Big|^{-\beta\_j,\ j=1,\ldots,k}\_{\alpha\_j-1,\ j=1,\ldots,k}\Big],\ u\_{2}\ge 0. \end{split} \tag{2.5.7}$$

#### **2.5.4. General products and ratios**

Let us consider a structure of the following form:

$$u\_3 = \frac{t\_1\cdots t\_r}{t\_{r+1}\cdots t\_k}$$

where the $t\_j$'s are independently distributed real positive variables, such as real type-1 beta, real type-2 beta, and real gamma variables. The expected values $E[t\_j^{s-1}],\ j=1,\dots,k$, will produce various types of gamma products, some containing $+s$ and others $-s$, both in the numerator and in the denominator. Accordingly, we obtain a general structure such as that appearing in (2.5.2), and the density of $u\_3$, denoted by $g\_3(u\_3)$, will then be proportional to a general G-function.

#### **2.5.5. The H-function**

Let $u=v\_1v\_2\cdots v\_k$ where the $v\_j$'s are independently distributed generalized real gamma variables with densities

$$h\_j(v\_j) = \frac{\delta\_j\,a\_j^{\frac{\gamma\_j}{\delta\_j}}}{\Gamma\big(\frac{\gamma\_j}{\delta\_j}\big)}\,v\_j^{\gamma\_j-1}\mathrm{e}^{-a\_jv\_j^{\delta\_j}},\ v\_j\ge 0,$$

for *aj >* 0*, δj >* 0*, γj >* 0*,* and *hj (vj )* = 0 elsewhere for *j* = 1*,...,k*. Then,

$$E[v\_j^{s-1}] = \frac{\Gamma\big(\frac{\gamma\_j+s-1}{\delta\_j}\big)}{\Gamma\big(\frac{\gamma\_j}{\delta\_j}\big)}\,\frac{1}{a\_j^{\frac{s-1}{\delta\_j}}},$$

for $(\gamma\_j+s-1)>0,\ v\_j\ge 0,\ \delta\_j>0,\ a\_j>0,\ \gamma\_j>0,\ j=1,\ldots,k$, and

$$E[u^{s-1}] = \Big\{\prod\_{j=1}^{k}\frac{a\_j^{\frac{1}{\delta\_j}}}{\Gamma\big(\frac{\gamma\_j}{\delta\_j}\big)}\Big\}\Big\{\prod\_{j=1}^{k}\Gamma\Big(\frac{\gamma\_j-1}{\delta\_j}+\frac{s}{\delta\_j}\Big)a\_j^{-\frac{s}{\delta\_j}}\Big\}. \tag{2.5.8}$$

Thus, the density of *u*, denoted by *g*3*(u)*, is given by

$$\begin{split} g\_{3}(u) &= \Big\{\prod\_{j=1}^{k}\frac{a\_{j}^{\frac{1}{\delta\_{j}}}}{\Gamma\big(\frac{\gamma\_{j}}{\delta\_{j}}\big)}\Big\}\frac{1}{2\pi i}\int\_{L}\Big\{\prod\_{j=1}^{k}\Gamma\Big(\frac{\gamma\_{j}-1}{\delta\_{j}}+\frac{s}{\delta\_{j}}\Big)a\_{j}^{-\frac{s}{\delta\_{j}}}\Big\}u^{-s}\mathrm{d}s\\ &= \Big\{\prod\_{j=1}^{k}\frac{a\_{j}^{\frac{1}{\delta\_{j}}}}{\Gamma\big(\frac{\gamma\_{j}}{\delta\_{j}}\big)}\Big\}H\_{0,k}^{k,0}\Big[\Big\{\prod\_{j=1}^{k}a\_{j}^{\frac{1}{\delta\_{j}}}\Big\}u\Big|\_{\big(\frac{\gamma\_j-1}{\delta\_j},\frac{1}{\delta\_j}\big),\ j=1,\ldots,k}\Big] \end{split} \tag{2.5.9}$$

for *u* ≥ 0*, (γj* +*s* −1*) >* 0*, j* = 1*,... , k,* where *L* is a suitable contour and the general H-function is defined as follows:

$$\begin{split} H(z) &= H\_{p,q}^{m,n}(z) = H\_{p,q}^{m,n}\Big[z\Big|^{(a\_1,\alpha\_1),\dots,(a\_p,\alpha\_p)}\_{(b\_1,\beta\_1),\dots,(b\_q,\beta\_q)}\Big]\\ &= \frac{1}{2\pi i}\int\_L\frac{\{\prod\_{j=1}^{m}\Gamma(b\_j+\beta\_js)\}\{\prod\_{j=1}^{n}\Gamma(1-a\_j-\alpha\_js)\}}{\{\prod\_{j=m+1}^{q}\Gamma(1-b\_j-\beta\_js)\}\{\prod\_{j=n+1}^{p}\Gamma(a\_j+\alpha\_js)\}}z^{-s}\mathrm{d}s \end{split} \tag{2.5.10}$$

where *αj >* 0*, j* = 1*,...,p*; *βj >* 0*, j* = 1*,...,q* are real and positive, *bj* 's and *aj* 's are complex numbers, the contour *L* separates the poles of *Γ (bj* + *βj s), j* = 1*, . . . , m,* lying on one side of it and the poles of *Γ (*1 − *aj* − *αj s), j* = 1*, . . . , n,* which must lie on the other side. The existence conditions and the various types of possible contours are discussed in Mathai and Saxena (1978) and Mathai et al. (2010). Observe that we can consider arbitrary powers of the variables present in *u, u*1*, u*<sup>2</sup> and *u*<sup>3</sup> as introduced in Sects. 2.5.1–2.5.5; however, in this case, the densities of these various structures will be expressible in terms of H-functions rather than G-functions. In the G-function format as defined in (2.5.2), the complex variable *s* has ±1 as its coefficients, whereas the coefficients of *s* in the H-function, that is, ±*αj , αj >* 0 and ±*βj , βj >* 0, are not restricted to unities.
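Returning to the generalized gamma factors behind (2.5.8), the moment formula can be checked by simulation: if $w$ has a standard gamma density with shape $\gamma/\delta$, then $v=(w/a)^{1/\delta}$ has the generalized gamma density $h(v)$ above, so simulated powers of $v$ should match $E[v^{s-1}]$. A sketch with arbitrary parameter values:

```python
import math
import random

a, delta, gam, s = 2.0, 1.5, 2.5, 2.2
random.seed(5)
n = 200000
# v = (w/a)^(1/delta) with w ~ Gamma(shape gam/delta, scale 1) is generalized gamma;
# then v^(s-1) = (w/a)^((s-1)/delta)
emp = sum((random.gammavariate(gam / delta, 1.0) / a)**((s - 1) / delta)
          for _ in range(n)) / n
# E[v^(s-1)] = Gamma((gam+s-1)/delta) / Gamma(gam/delta) * a^(-(s-1)/delta)
th = math.gamma((gam + s - 1) / delta) / math.gamma(gam / delta) * a**(-(s - 1) / delta)
```

The power transform used here is also a convenient way to generate generalized gamma variates in general.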

We will give a simple illustrative example that requires the evaluation of an inverse Mellin transform. Let $f(x)=\mathrm{e}^{-x},\ x>0$. Then, the Mellin transform is

$$M\_f(\mathbf{s}) = \int\_0^\infty \mathbf{x}^{s-1} \mathbf{e}^{-\mathbf{x}} d\mathbf{x} = \boldsymbol{\varGamma}(\mathbf{s}), \ \mathfrak{R}(\mathbf{s}) > \mathbf{0},$$

and it follows from the inversion formula that

$$f(x) = \frac{1}{2\pi i}\int\_{c-i\infty}^{c+i\infty}\Gamma(s)x^{-s}\mathrm{d}s,\ c>0,\ i=\sqrt{(-1)}. \tag{2.5.11}$$

If *f (x)* is unknown and we are told that the Mellin transform of a certain function is *Γ (s)*, then are we going to retrieve *f (x)* as e−*<sup>x</sup>* from the inversion formula? Let us explore this problem. The poles of *Γ (s)* occur at *s* = 0*,* −1*,* −2*,...*. Thus, if we take *c* in the contour of integration as *c >* 0, this contour will enclose all the poles of *Γ (s)*. We may now apply Cauchy's residue theorem. By definition, the residue at *s* = −*ν*, denoted by *Rν*, is

$$R\_{\nu} = \lim\_{s\to-\nu}(s+\nu)\Gamma(s)x^{-s}.$$

We cannot substitute *s* = −*ν* to obtain the limit in this case. However, noting that

$$(s+\nu)\Gamma(s)x^{-s} = \frac{(s+\nu)(s+\nu-1)\cdots s\,\Gamma(s)x^{-s}}{(s+\nu-1)\cdots s} = \frac{\Gamma(s+\nu+1)x^{-s}}{(s+\nu-1)\cdots s},$$

which follows from the recursive relationship, *αΓ (α)* = *Γ (α* + 1*)*, the limit can be taken:

$$\begin{split} \lim\_{s\to-\nu}(s+\nu)\Gamma(s)x^{-s} &= \lim\_{s\to-\nu}\frac{\Gamma(s+\nu+1)x^{-s}}{(s+\nu-1)\cdots s}\\ &= \frac{\Gamma(1)x^{\nu}}{(-1)(-2)\cdots(-\nu)} = \frac{(-1)^{\nu}x^{\nu}}{\nu!}. \end{split} \tag{2.5.12}$$

Hence, the sum of the residues is

$$\sum\_{\nu=0}^{\infty} R\_{\nu} = \sum\_{\nu=0}^{\infty} \frac{(-1)^{\nu} x^{\nu}}{\nu!} = \mathbf{e}^{-\mathbf{x}},$$

and the function is recovered.
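The residue computation can be mirrored numerically: summing the residues $(-1)^{\nu}x^{\nu}/\nu!$ of (2.5.12) term by term reproduces $\mathrm{e}^{-x}$ essentially to machine precision. A minimal sketch:

```python
import math

def inverse_mellin_of_gamma(x, terms=80):
    # sum of the residues in (2.5.12): sum over nu of (-1)^nu x^nu / nu!
    total, term = 0.0, 1.0
    for nu in range(terms):
        total += term
        term *= -x / (nu + 1)  # updates (-1)^nu x^nu / nu! to the next term
    return total

vals = [0.5, 1.0, 3.0, 7.5]
recovered = [inverse_mellin_of_gamma(x) for x in vals]
expected = [math.exp(-x) for x in vals]
```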

**Note 2.5.1.** Distributions of products and ratios of random variables in the complex domain could as well be worked out. However, since they may not necessarily have practical applications, they will not be discussed herein. Certain product and ratio distributions for variables in the complex domain which reduce to real variables, such as a chisquare in the complex domain, have already been previously discussed.

#### **Exercises 2.5**

**2.5.1.** Evaluate the density of $u=x\_1x\_2$ where the $x\_j$'s are independently distributed real type-1 beta random variables with the parameters $(\alpha\_j,\beta\_j),\ \alpha\_j>0,\ \beta\_j>0,\ j=1,2$, by using the Mellin and inverse Mellin transform technique. Evaluate the density for the case $\alpha\_1-\alpha\_2\ne\pm\nu,\ \nu=0,1,\dots$, so that the poles are simple.

**2.5.2.** Repeat Exercise 2.5.1 if *xj* 's are (1): real type-2 beta random variables with parameters *(αj , βj )*, *αj >* 0*, βj >* 0 and (2): real gamma random variables with the parameters *(αj , βj ), αj >* 0*, βj >* 0*, j* = 1*,* 2.

**2.5.3.** Let $u=\frac{u\_1}{u\_2}$ where $u\_1$ and $u\_2$ are real positive random variables. Then the $h$-th moment, for arbitrary $h$, is $E\big[\big(\frac{u\_1}{u\_2}\big)^h\big]\ne\frac{E[u\_1^h]}{E[u\_2^h]}$ in general. Give two examples where $E\big[\big(\frac{u\_1}{u\_2}\big)^h\big]=\frac{E[u\_1^h]}{E[u\_2^h]}$.

**2.5.4.** $E\big[\big(\frac{1}{u}\big)^h\big]=E[u^{-h}]\ne\frac{1}{E[u^h]}$ in general. Give two examples where $E\big[\frac{1}{u}\big]=\frac{1}{E[u]}$.

**2.5.5.** Let $u=\frac{x\_1x\_2}{x\_3x\_4}$ where the $x\_j$'s are independently distributed. Let $x\_1,x\_3$ be type-1 beta random variables, $x\_2$ be a type-2 beta random variable, and $x\_4$ be a gamma random variable, with parameters $(\alpha\_j,\beta\_j),\ \alpha\_j>0,\ \beta\_j>0,\ j=1,2,3,4$. Determine the density of $u$.

#### **2.6. A Collection of Random Variables**

Let $x\_1,\dots,x\_n$ be iid (independently and identically distributed) real scalar random variables with a common density denoted by $f(x)$; that is, assume that the sample comes from the population specified by $f(x)$. Let the common mean value be $\mu$ and the common variance be $\sigma^2<\infty$, that is, $E(x\_j)=\mu$ and $\mathrm{Var}(x\_j)=\sigma^2,\ j=1,\dots,n$, where $E$ denotes the expected value. Denoting the sample average by $\bar{x}=\frac{1}{n}(x\_1+\cdots+x\_n)$, what can be said about $\bar{x}$ when $n\to\infty$? This is the type of question that will be investigated in this section.

#### **2.6.1. Chebyshev's inequality**

For some *k >* 0*,* let us examine the probability content of |*x* − *μ*| where *μ* = *E(x)* and the variance of *<sup>x</sup>* is *<sup>σ</sup>*<sup>2</sup> *<sup>&</sup>lt;* <sup>∞</sup>. Consider the probability that the random variable *<sup>x</sup>* lies outside the interval *μ* − *kσ < x < μ* + *kσ*, that is *k* times the standard deviation *σ* away from the mean value *μ*. From the definition of the variance *σ*<sup>2</sup> for a real scalar random variable *x*,

$$\begin{split} \sigma^{2} &= \int\_{-\infty}^{\infty} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} \\ &= \int\_{-\infty}^{\mu - k\sigma} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} + \int\_{\mu - k\sigma}^{\mu + k\sigma} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} + \int\_{\mu + k\sigma}^{\infty} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} \\ &\geq \int\_{-\infty}^{\mu - k\sigma} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} + \int\_{\mu + k\sigma}^{\infty} (\mathbf{x} - \mu)^{2} f(\mathbf{x}) \mathbf{dx} \end{split}$$

since the probability content over the interval $\mu-k\sigma<x<\mu+k\sigma$ is omitted. Over this interval, the probability is either positive or zero, hence the inequality. However, the intervals $(-\infty,\mu-k\sigma]$ and $[\mu+k\sigma,\infty)$, that is, $-\infty<x\le\mu-k\sigma$ and $\mu+k\sigma\le x<\infty$, or $-\infty<x-\mu\le-k\sigma$ and $k\sigma\le x-\mu<\infty$, can thus be described as the intervals for which $|x-\mu|\ge k\sigma$. In these intervals, the smallest value that $|x-\mu|$ can take on is $k\sigma,\ k>0$, or equivalently, the smallest value that $|x-\mu|^2$ can assume is $(k\sigma)^2=k^2\sigma^2$. Accordingly, the above inequality can be further sharpened as follows:

$$\begin{split} \sigma^{2} &\ge \int\_{|x-\mu|\ge k\sigma}(x-\mu)^{2}f(x)\mathrm{d}x \ge \int\_{|x-\mu|\ge k\sigma}(k\sigma)^{2}f(x)\mathrm{d}x \Rightarrow\\ \frac{\sigma^{2}}{k^{2}\sigma^{2}} &\ge \int\_{|x-\mu|\ge k\sigma}f(x)\mathrm{d}x \Rightarrow\\ \frac{1}{k^{2}} &\ge \int\_{|x-\mu|\ge k\sigma}f(x)\mathrm{d}x = Pr\{|x-\mu|\ge k\sigma\},\ \text{that is},\\ \frac{1}{k^{2}} &\ge Pr\{|x-\mu|\ge k\sigma\}, \end{split}$$

which can be written as

$$\Pr\{|\mathbf{x} - \boldsymbol{\mu}| \ge k\sigma\} \le \frac{1}{k^2} \text{ or } \Pr\{|\mathbf{x} - \boldsymbol{\mu}| < k\sigma\} \ge 1 - \frac{1}{k^2}.\tag{2.6.1}$$

If $k\sigma=k\_1$, then $k=\frac{k\_1}{\sigma}$, and with $k\_1$ renamed $k$, the above inequalities can be written as follows:

$$\Pr\{|\mathbf{x} - \boldsymbol{\mu}| \ge k\} \le \frac{\sigma^2}{k^2} \text{ or } \Pr\{|\mathbf{x} - \boldsymbol{\mu}| < k\} \ge 1 - \frac{\sigma^2}{k^2}.\tag{2.6.2}$$

The inequalities (2.6.1) and (2.6.2) are known as Chebyshev's inequalities (also referred to as Chebycheff's inequalities). For example, when $k=2$, Chebyshev's inequality states that $Pr\{|x-\mu|<2\sigma\}\ge 1-\frac{1}{4}=0.75$, which is not a very sharp probability limit. If $x\sim N\_1(\mu,\sigma^2)$, then we know that

$$\Pr\{|\mathbf{x} - \boldsymbol{\mu}| < 1.96\sigma\} \approx 0.95 \text{ and } \Pr\{|\mathbf{x} - \boldsymbol{\mu}| < 3\sigma\} \approx 0.99.$$

Note that the bound 0.75 resulting from Chebyshev's inequality seriously underestimates the actual probability for a Gaussian variable $x$. However, what is astonishing about this inequality is that the given probability bound holds for any distribution, whether it be continuous, discrete or mixed. Sharper bounds can of course be obtained for the probability content of the interval $[\mu-k\sigma,\mu+k\sigma]$ when the exact distribution of $x$ is known.
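A quick simulation makes the distribution-free nature of the bound concrete: for an exponential population with $\mu=\sigma=1$, the event $|x-\mu|\ge 2\sigma$ reduces to $x\ge 3$, whose probability $\mathrm{e}^{-3}\approx 0.05$ indeed sits well below the Chebyshev bound $1/k^2=0.25$. A sketch:

```python
import math
import random

# Chebyshev bound check for an exponential(1) population: mu = 1, sigma = 1, k = 2
random.seed(11)
n = 100000
mu, sigma, k = 1.0, 1.0, 2.0
tail = sum(1 for _ in range(n)
           if abs(random.expovariate(1.0) - mu) >= k * sigma) / n
bound = 1 / k**2          # Chebyshev bound: 0.25
exact = math.exp(-3.0)    # Pr{x >= 3} for the exponential, about 0.0498
```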

These inequalities can be expressed in terms of generalized moments. Let $\mu\_r^{\frac{1}{r}}=\{E|x-\mu|^r\}^{\frac{1}{r}},\ r=1,2,\dots$, which happens to be a measure of scatter in $x$ from the mean value $\mu$. Given that

$$
\mu\_r = \int\_{-\infty}^{\infty} |\mathbf{x} - \mu|^r f(\mathbf{x}) d\mathbf{x},
$$

consider the probability content of the intervals specified by $|x-\mu|\ge k\mu\_r^{\frac{1}{r}}$ for $k>0$. Paralleling the derivations of (2.6.1) and (2.6.2), we have

$$\mu\_r \ge \int\_{|x-\mu|\ge k\mu\_r^{\frac{1}{r}}}|x-\mu|^rf(x)\mathrm{d}x \ge \int\_{|x-\mu|\ge k\mu\_r^{\frac{1}{r}}}(k\mu\_r^{\frac{1}{r}})^rf(x)\mathrm{d}x \Rightarrow$$

$$Pr\{|x-\mu|\ge k\mu\_r^{\frac{1}{r}}\}\le\frac{1}{k^r}\ \text{ or }\ Pr\{|x-\mu|<k\mu\_r^{\frac{1}{r}}\}\ge 1-\frac{1}{k^r}, \tag{2.6.3}$$

which can also be written as

$$\Pr\{|\mathbf{x} - \boldsymbol{\mu}| \ge k\} \le \frac{\mu\_r}{k^r} \text{ or } \Pr\{|\mathbf{x} - \boldsymbol{\mu}| < k\} \ge 1 - \frac{\mu\_r}{k^r}, \ r = 1, 2, \dots \quad (2.6.4)$$

Note that when $r=2$, $\mu\_r=\sigma^2$, and Chebyshev's inequalities as specified in (2.6.1) and (2.6.2) are obtained from (2.6.3) and (2.6.4), respectively. If $x$ is a real scalar positive random variable with $f(x)=0$ for $x\le 0$, we can then obtain similar inequalities in terms of the first moment $\mu$. For $k>0$,

$$\begin{split} \mu &= E(x) = \int\_0^{\infty}xf(x)\mathrm{d}x\ \text{ since } f(x)=0 \text{ for } x\le 0\\ &= \int\_0^{k}xf(x)\mathrm{d}x + \int\_k^{\infty}xf(x)\mathrm{d}x \ge \int\_k^{\infty}xf(x)\mathrm{d}x \ge \int\_k^{\infty}kf(x)\mathrm{d}x \Rightarrow\\ \frac{\mu}{k} &\ge \int\_k^{\infty}f(x)\mathrm{d}x = Pr\{x\ge k\}. \end{split}$$

Accordingly, we have the following inequality for any real positive random variable *x*:

$$\Pr\{\mathbf{x} \ge k\} \le \frac{\mu}{k} \text{ for } \mathbf{x} > \mathbf{0}, k > \mathbf{0}. \tag{2.6.5}$$
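Inequality (2.6.5) is easily checked by simulation for a positive random variable; for a gamma variable with shape 2 and scale 1 (so $\mu=2$) and $k=5$, the actual tail probability $6\mathrm{e}^{-5}\approx 0.04$ is far below the bound $\mu/k=0.4$. A sketch:

```python
import random

# Markov-type bound (2.6.5): Pr{x >= k} <= mu/k for a positive random variable
random.seed(13)
n = 100000
k, mu = 5.0, 2.0  # x ~ Gamma(shape 2, scale 1), so E(x) = 2
tail = sum(1 for _ in range(n) if random.gammavariate(2.0, 1.0) >= k) / n
```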

Suppose that our variable is $\bar{x}=\frac{1}{n}(x\_1+\cdots+x\_n)$, where $x\_1,\dots,x\_n$ are iid variables with common mean value $\mu$ and common variance $\sigma^2<\infty$. Then, since $\mathrm{Var}(\bar{x})=\frac{\sigma^2}{n}$ and $E(\bar{x})=\mu$, Chebyshev's inequality states that

$$Pr\{|\bar{x}-\mu|<k\}\ge 1-\frac{\sigma^2}{nk^2}\to 1 \text{ as } n\to\infty \tag{2.6.6}$$

or $Pr\{|\bar{x}-\mu|\ge k\}\to 0$ as $n\to\infty$. Since a probability cannot exceed 1, it follows that $Pr\{|\bar{x}-\mu|<k\}\to 1$ as $n\to\infty$ for each fixed $k>0$. In other words, $\bar{x}$ converges to $\mu$ in probability as $n\to\infty$. This is referred to as the *Weak Law of Large Numbers*.

#### **The Weak Law of Large Numbers**

Let $x\_1,\dots,x\_n$ be iid with common mean value $\mu$ and common variance $\sigma^2<\infty$. Then, for every $\epsilon>0$,

$$Pr\{|\bar{x}-\mu|<\epsilon\}\to 1 \text{ as } n\to\infty. \tag{2.6.7}$$
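A simulation illustrates the statement: with uniform(0,1) observations ($\mu=1/2$), the proportion of sample means falling outside $(\mu-\epsilon,\mu+\epsilon)$ shrinks toward zero as $n$ grows. A sketch with arbitrary choices of $n$ and $\epsilon$:

```python
import random

random.seed(17)

def frac_far(n, reps=2000, eps=0.05):
    # fraction of replications with |xbar - mu| >= eps; mu = 1/2 for uniform(0,1)
    far = 0
    for _ in range(reps):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            far += 1
    return far / reps

f_small, f_large = frac_far(10), frac_far(1000)
```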

Another limiting property is known as the *Central Limit Theorem*. Let $x\_1,\dots,x\_n$ be iid real scalar random variables with common mean value $\mu$ and common variance $\sigma^2<\infty$. Letting $\bar{x}=\frac{1}{n}(x\_1+\cdots+x\_n)$ denote the sample mean, the standardized sample mean is

$$u = \frac{\bar{x} - E(\bar{x})}{\sqrt{\text{Var}(\bar{x})}} = \frac{\sqrt{n}}{\sigma} (\bar{x} - \mu) = \frac{1}{\sigma\sqrt{n}} [(\mathbf{x}\_1 - \mu) + \dots + (\mathbf{x}\_n - \mu)].\tag{i}$$

Consider the characteristic function of *x* − *μ*, that is,

$$\phi\_{\mathbf{x}-\boldsymbol{\mu}}(t) = E[\mathbf{e}^{it(\mathbf{x}-\boldsymbol{\mu})}] = 1 + \frac{it}{1!}E(\mathbf{x}-\boldsymbol{\mu}) + \frac{(it)^2}{2!}E(\mathbf{x}-\boldsymbol{\mu})^2 + \cdots$$

$$= 1 + 0 - \frac{t^2}{2!}E(\mathbf{x}-\boldsymbol{\mu})^2 + \cdots \ = 1 + \frac{t}{1!}\phi^{(1)}(0) + \frac{t^2}{2!}\phi^{(2)}(0) + \cdots \ \quad (\text{i}\mathbf{i})$$

where $\phi^{(r)}(0)$ is the $r$-th derivative of $\phi(t)$ with respect to $t$, evaluated at $t=0$. Let us consider the characteristic function of our standardized sample mean $u$.

Making use of the last representation of $u$ in $(i)$, we have $\phi\_{\sum\_{j=1}^{n}(x\_j-\mu)/(\sigma\sqrt{n})}(t)=[\phi\_{x\_j-\mu}(\frac{t}{\sigma\sqrt{n}})]^n$, so that $\phi\_u(t)=[\phi\_{x\_j-\mu}(\frac{t}{\sigma\sqrt{n}})]^n$ or $\ln\phi\_u(t)=n\ln\phi\_{x\_j-\mu}(\frac{t}{\sigma\sqrt{n}})$. It then follows from $(ii)$ that

$$[\phi\_{\mathbf{x}\_j-\mu}(\frac{t}{\sigma\sqrt{n}})] = 1 + 0 - \frac{t^2}{2!} \frac{\sigma^2}{n\sigma^2} - i\frac{t^3}{3!} \frac{E(\mathbf{x}\_j-\mu)^3}{(\sigma\sqrt{n})^3} + \dots$$

$$= 1 - \frac{t^2}{2n} + O\left(\frac{1}{n^{\frac{3}{2}}}\right). \tag{iii}$$

Now noting that $\ln(1-y)=-[y+\frac{y^2}{2}+\frac{y^3}{3}+\cdots]$ whenever $|y|<1$, we have

$$\ln \phi\_{\mathbf{x}\_j-\mu}(\frac{t}{\sigma\sqrt{n}}) = -\frac{t^2}{2n} + O\left(\frac{1}{n^{\frac{3}{2}}}\right) \Rightarrow n\ln \phi\_{\mathbf{x}\_j-\mu}(\frac{t}{\sigma\sqrt{n}}) = -\frac{t^2}{2} + O\left(\frac{1}{n^{\frac{1}{2}}}\right) \to -\frac{t^2}{2} \text{ as } n \to \infty.$$

Consequently, as *n* → ∞,

$$\phi\_{u}(t) = \mathbf{e}^{-\frac{t^2}{2}} \Rightarrow u \to N\_1(0, 1) \text{ as } n \to \infty.$$

This is known as the central limit theorem.

**The Central Limit Theorem.** *Let $x\_1,\ldots,x\_n$ be iid real scalar random variables having common mean value $\mu$ and common variance $\sigma^2<\infty$. Let the sample mean be $\bar{x}=\frac{1}{n}(x\_1+\cdots+x\_n)$ and $u$ denote the standardized sample mean. Then*

$$
u = \frac{\bar{\boldsymbol{x}} - E(\bar{\boldsymbol{x}})}{\sqrt{\text{Var}(\bar{\boldsymbol{x}})}} = \frac{\sqrt{n}}{\sigma} (\bar{\boldsymbol{x}} - \mu) \to N\_1(0, 1) \text{ as } n \to \infty. \tag{2.6.8}
$$

Generalizations, extensions and more rigorous statements of this theorem are available in the literature. We have focussed on the substance of the result, assuming that a simple random sample is available and that the variance of the population is finite.
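The convergence in (2.6.8) can also be checked by simulation. The following Python sketch is an illustration of our own (the choice of an exponential(1) population and the function name `standardized_means` are assumptions, not from the text): since $\mu=1$ and $\sigma=1$ for an exponential(1) variable, $u=\sqrt{n}(\bar{x}-1)$ should be approximately $N\_1(0,1)$ for moderately large $n$.

```python
import random

# Numerical illustration (not part of the text's derivation): for iid
# exponential(1) variables, mu = 1 and sigma = 1, so the standardized
# sample mean u = sqrt(n)*(xbar - 1) is approximately N(0, 1).
def standardized_means(n=100, trials=4000, seed=1):
    rng = random.Random(seed)
    u_vals = []
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        u_vals.append(n ** 0.5 * (xbar - 1.0))
    return u_vals

u = standardized_means()
mean_u = sum(u) / len(u)
var_u = sum((v - mean_u) ** 2 for v in u) / len(u)
# mean_u is close to 0 and var_u is close to 1, consistent with (2.6.8)
```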

#### **Exercises 2.6**

**2.6.1.** For a binomial random variable with the probability function $f(x)=\binom{n}{x}p^x(1-p)^{n-x},\ 0<p<1,\ x=0,1,\ldots,n,\ n=1,2,\ldots$ and zero elsewhere, show that the standardized binomial variable, namely $\frac{x-np}{\sqrt{np(1-p)}}$, goes to the standard normal when $n\to\infty$.

**2.6.2.** State the central limit theorem for the following real scalar populations by evaluating the mean value and variance there, assuming that a simple random sample is available: (1) Poisson random variable with parameter *λ*; (2) Geometric random variable with parameter *p*; (3) Negative binomial random variable with parameters *(p, k)*; (4) Discrete hypergeometric probability law with parameters *(a, b, n)*; (5) Uniform density over [*a, b*]; (6) Exponential density with parameter *θ*; (7) Gamma density with the parameters *(α, β)*; (8) Type-1 beta random variable with the parameters *(α, β)*; (9) Type-2 beta random variable with the parameters *(α, β)*.

**2.6.3.** State the central limit theorem for the following probability/density functions: (1):

$$f(\mathbf{x}) = \begin{cases} 0.5, \mathbf{x} = 2 \\ 0.5, \mathbf{x} = \mathbf{5} \end{cases}$$

and $f(x)=0$ elsewhere; (2): $f(x)=2\mathrm{e}^{-2x},\ 0\le x<\infty$ and zero elsewhere; (3): $f(x)=1,\ 0\le x\le 1$ and zero elsewhere. Assume that a simple random sample is available from each population.

**2.6.4.** Consider a real scalar gamma random variable $x$ with the parameters $(\alpha,\beta)$ and show that $E(x)=\alpha\beta$ and the variance of $x$ is $\alpha\beta^2$. Assume a simple random sample $x\_1,\ldots,x\_n$ from this population. Derive the densities of (1): $x\_1+\cdots+x\_n$; (2): $\bar{x}$; (3): $\bar{x}-\alpha\beta$; (4): the standardized sample mean. Show that the densities in all these cases are still gamma densities, possibly relocated, for all finite values of $n$, however large $n$ may be.

**2.6.5.** Consider the density $f(x)=\frac{c}{x^\alpha},\ 1\le x<\infty$ and zero elsewhere, where $c$ is the normalizing constant. Evaluate $c$, stating the relevant conditions. State the central limit theorem for this population, stating the relevant conditions.

#### **2.7. Parameter Estimation: Point Estimation**

There exist several methods for estimating the parameters of a given density/probability function, based on a simple random sample of size *n* (iid variables from the population designated by the density/probability function). The most popular methods of point estimation are the method of maximum likelihood and the method of moments.

#### **2.7.1. The method of moments and the method of maximum likelihood**

The likelihood function *L(θ )* is the joint density/probability function of the sample values, at an observed sample point, *x*1*,...,xn*. As a function of *θ , L(θ )*, or a one-to-one function thereof, is maximized in order to determine the most likely value of *θ* in terms of a function of the given sample. This estimation process is referred to as the method of maximum likelihood.

Let $m\_r=\frac{1}{n}\sum\_{j=1}^n x\_j^r$ denote the $r$-th integer moment of the sample, where $x\_1,\ldots,x\_n$ is the observed sample point, the corresponding population $r$-th moment being $E[x^r]$, where $E$ denotes the expected value. According to the method of moments, the estimates of the parameters are obtained by solving $m\_r=E[x^r],\ r=1,2,\ldots$.

For example, consider a *N*1*(μ, σ*2*)* population with density

$$f(\mathbf{x}) = \frac{1}{\sqrt{2\pi}\sigma} \mathbf{e}^{-\frac{1}{2\sigma^2}(\mathbf{x}-\mu)^2}, \ -\infty < \mathbf{x} < \infty, \ -\infty < \mu < \infty, \ \sigma > 0,\tag{2.7.1}$$

where *μ* and *σ*<sup>2</sup> are the parameters here. Let *x*1*,...,xn* be a simple random sample from this population. Then, the joint density of *x*1*,...,xn*, denoted by *L* = *L(x*1*, ...,xn*;*μ, σ*2*)*, is

$$L = \frac{1}{[\sqrt{2\pi}\sigma]^n} \mathbf{e}^{-\frac{1}{2\sigma^2} \sum\_{j=1}^n (\mathbf{x}\_j - \boldsymbol{\mu})^2} = \frac{1}{[\sqrt{2\pi}\sigma]^n} \mathbf{e}^{-\frac{1}{2\sigma^2} [\sum\_{j=1}^n (\mathbf{x}\_j - \bar{\mathbf{x}})^2 + n(\bar{\mathbf{x}} - \boldsymbol{\mu})^2]} \Rightarrow \tag{2.7.2}$$

$$\ln L = -n \ln(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2} \left[ \sum\_{j=1}^n (\mathbf{x}\_j - \bar{\mathbf{x}})^2 + n(\bar{\mathbf{x}} - \boldsymbol{\mu})^2 \right], \ \bar{\mathbf{x}} = \frac{1}{n} (\mathbf{x}\_1 + \dots + \mathbf{x}\_n).$$

Maximizing $L$ or $\ln L$ (since $L$ and $\ln L$ are in one-to-one correspondence) with respect to $\mu$ and $\theta=\sigma^2$, and solving for $\mu$ and $\sigma^2$, produces the maximum likelihood estimators (MLE's). An observed value of the estimator is the corresponding estimate. It follows from a basic result in Calculus that the extrema of $L$ can be determined by solving the equations

$$\frac{\partial}{\partial \mu} \ln L = 0 \tag{i}$$

and

$$\frac{\partial}{\partial \theta} \ln L = 0, \; \theta = \sigma^2. \tag{ii}$$

Equation $(i)$ produces the solution $\mu=\bar{x}$, so that $\bar{x}$ is the MLE of $\mu$. Note that $\bar{x}$ is a random variable and that $\bar{x}$ evaluated at a sample point or at a set of observations on $x\_1,\ldots,x\_n$ produces the corresponding estimate. We will denote both the estimator and the estimate of $\mu$ by $\hat{\mu}$. As well, we will utilize the same abbreviation, namely MLE, for the maximum likelihood estimator and the corresponding estimate. Solving $(ii)$ and substituting $\hat{\mu}$ for $\mu$, we have $\hat{\theta}=\hat{\sigma}^2=\frac{1}{n}\sum\_{j=1}^n(x\_j-\bar{x})^2=s^2$, the sample variance, as an estimate of $\theta=\sigma^2$. Does the point $(\bar{x},s^2)$ correspond to a local maximum, a local minimum or a saddle point? Since the matrix of second order partial derivatives at the point $(\bar{x},s^2)$ is negative definite, the critical point $(\bar{x},s^2)$ corresponds to a maximum. Thus, in this case, $\hat{\mu}=\bar{x}$ and $\hat{\sigma}^2=s^2$ are the maximum likelihood estimators/estimates of the parameters $\mu$ and $\sigma^2$, respectively. If we were to differentiate with respect to $\sigma$ instead of $\theta=\sigma^2$ in $(ii)$, we would obtain the same estimators since, for any differentiable function $g(t)$, $\frac{\mathrm{d}}{\mathrm{d}t}g(t)=0\Rightarrow\frac{\mathrm{d}}{\mathrm{d}\phi(t)}g(t)=0$ whenever $\frac{\mathrm{d}}{\mathrm{d}t}\phi(t)\ne 0$. In this instance, $\phi(\sigma)=\sigma^2$ and $\frac{\mathrm{d}}{\mathrm{d}\sigma}\sigma^2\ne 0$.

For obtaining the moment estimates, we equate the sample integer moments to the corresponding population moments, that is, we let $m\_r=E[x^r],\ r=1,2$, two equations being required to estimate $\mu$ and $\sigma^2$. Note that $m\_1=\bar{x}$ and $m\_2=\frac{1}{n}\sum\_{j=1}^n x\_j^2$. Then, consider the equations

$$\bar{x} = E[x] = \mu \text{ and } \frac{1}{n} \sum\_{j=1}^{n} x\_j^2 = E[x^2] \Rightarrow s^2 = \sigma^2.$$

Thus, the moment estimators/estimates of $\mu$ and $\sigma^2$, which are $\hat{\mu}=\bar{x}$ and $\hat{\sigma}^2=s^2$, happen to be identical to the MLE's in this case.
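These estimates are straightforward to compute. The following Python sketch (the function name `normal_estimates` is our own) returns $\hat{\mu}=\bar{x}$ and $\hat{\sigma}^2=s^2$, with the divisor $n$ as in the text:

```python
# Sketch: the MLE (equal here to the moment estimates) for a N(mu, sigma^2)
# population: the sample mean and the sample variance with divisor n.
def normal_estimates(x):
    n = len(x)
    mu_hat = sum(x) / n
    sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n
    return mu_hat, sigma2_hat

mu_hat, s2 = normal_estimates([2.0, 4.0, 6.0])
# mu_hat = 4.0 and s2 = 8/3 for this small sample
```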

Let us consider the type-1 beta population with parameters *(α, β)* whose density is

$$f\_1(\mathbf{x}) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \mathbf{x}^{\alpha - 1} (1 - \mathbf{x})^{\beta - 1}, \ 0 \le \mathbf{x} \le 1, \ \alpha > 0, \ \beta > 0,\tag{2.7.3}$$

and zero otherwise. In this case, the likelihood function contains gamma functions and the derivatives of gamma functions involve psi and zeta functions. Accordingly, the maximum likelihood approach is not very convenient here. However, we can determine moment estimates without much difficulty from (2.7.3). The first two population integer moments are obtained directly from a representation of the *h*-th moment:

$$\begin{aligned} E[\mathbf{x}^h] &= \frac{\Gamma(\alpha + h)}{\Gamma(\alpha)} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + h)}, \; \Re(\alpha + h) > 0 \Rightarrow \\\ E[\mathbf{x}] &= \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha)} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + 1)} = \frac{\alpha}{\alpha + \beta} \end{aligned} \tag{2.7.4}$$

$$E[\mathbf{x}^2] = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = E[\mathbf{x}] \frac{\alpha+1}{\alpha+\beta+1}.\tag{2.7.5}$$

Equating the sample moments to the corresponding population moments, that is, letting $m\_1=E[x]$ and $m\_2=E[x^2]$, it follows from (2.7.4) that

$$
\bar{x} = \frac{\alpha}{\alpha + \beta} \Rightarrow \frac{\beta}{\alpha} = \frac{1 - \bar{x}}{\bar{x}};\tag{iii}
$$

Then, from (2.7.5), we have

$$\frac{\frac{1}{n}\sum\_{j=1}^{n}\mathbf{x}\_{j}^{2}}{\bar{\mathbf{x}}} = \frac{\alpha+1}{\alpha+\beta+1} = \frac{1}{1+\frac{\beta}{\alpha+1}} \Rightarrow \frac{\bar{\mathbf{x}} - \sum\_{j=1}^{n}\mathbf{x}\_{j}^{2}/n}{\sum\_{j=1}^{n}\mathbf{x}\_{j}^{2}/n} = \frac{\beta}{\alpha+1}.\tag{\text{iv}}$$

The parameter $\beta$ can be eliminated from $(iii)$ and $(iv)$, which yields an estimate of $\alpha$; $\hat{\beta}$ is then obtained from $(iii)$. Thus, the moment estimates are available from the equations $m\_r=E[x^r],\ r=1,2$, even though these equations are nonlinear in the parameters $\alpha$ and $\beta$. The method of maximum likelihood or the method of moments can similarly yield parameter estimates for populations that are otherwise distributed.
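The elimination of $\beta$ between $(iii)$ and $(iv)$ can be carried out explicitly. The Python sketch below (the function name `beta_moment_estimates` and the sample values are our own) sets $r=\beta/(\alpha+1)$ from $(iv)$ and $s=\beta/\alpha$ from $(iii)$; equating the two expressions for $\beta$ gives $\hat{\alpha}=r/(s-r)$ and then $\hat{\beta}=\hat{\alpha}s$:

```python
# Sketch of the moment estimates for the type-1 beta parameters, solving
# equations (iii) and (iv) of the text with m1 = sample mean and
# m2 = (1/n) * sum of squares.
def beta_moment_estimates(x):
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(xi * xi for xi in x) / n
    r = (m1 - m2) / m2        # equals beta/(alpha + 1), equation (iv)
    s = (1 - m1) / m1         # equals beta/alpha, equation (iii)
    alpha_hat = r / (s - r)   # eliminate beta between (iii) and (iv)
    beta_hat = alpha_hat * s  # recover beta from (iii)
    return alpha_hat, beta_hat

a_hat, b_hat = beta_moment_estimates([0.2, 0.4, 0.6])
# here m1 = 0.4 and m2 = 0.56/3, giving a_hat = 3.2 and b_hat = 4.8
```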

#### **2.7.2. Bayes' estimates**

This procedure is more relevant when the parameters in a given statistical density/probability function have their own distributions. For example, let the real scalar variable $x$ be discrete, having a binomial probability law for the fixed (given) parameter $p$, that is, let $f(x|p)=\binom{n}{x}p^x(1-p)^{n-x},\ 0<p<1,\ x=0,1,\ldots,n,\ n=1,2,\ldots,$ and $f(x|p)=0$ elsewhere, be the conditional probability function. Let $p$ have a prior type-1 beta density with known parameters $\alpha$ and $\beta$, that is, let the prior density of $p$ be

$$g(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1}, \ 0 < p < 1, \ \alpha > 0, \ \beta > 0$$

and *g(p)* = 0 elsewhere. Then, the joint probability function *f (x, p)* = *f (x*|*p)g(p)* and the unconditional probability function of *x*, denoted by *f*1*(x)*, is as follows:

$$\begin{split} f\_{1}(\mathbf{x}) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \binom{n}{\mathbf{x}} \int\_{0}^{1} p^{\alpha+\mathbf{x}-1} (1-p)^{\beta+n-\mathbf{x}-1} \mathrm{d}p \\ &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \binom{n}{\mathbf{x}} \frac{\Gamma(\alpha+\mathbf{x})\Gamma(\beta+n-\mathbf{x})}{\Gamma(\alpha+\beta+n)}. \end{split}$$

Thus, the posterior density of *p*, given *x*, denoted by *g*1*(p*|*x)*, is

$$g\_1(p|\mathbf{x}) = \frac{f(\mathbf{x}, p)}{f\_1(\mathbf{x})} = \frac{\Gamma(\alpha + \beta + n)}{\Gamma(\alpha + \mathbf{x})\Gamma(\beta + n - \mathbf{x})} p^{\alpha + \mathbf{x} - 1}(1 - p)^{\beta + n - \mathbf{x} - 1}.$$

Accordingly, the expected value of *p* in this conditional distribution of *p* given *x*, which is called the posterior density of *p*, is known as the Bayes estimate of *p*:

$$\begin{split} E[p|\boldsymbol{x}] &= \frac{\Gamma(\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n})}{\Gamma(\boldsymbol{\alpha} + \boldsymbol{x})\Gamma(\boldsymbol{\beta} + \boldsymbol{n} - \boldsymbol{x})} \int\_{0}^{1} p \, p^{\boldsymbol{\alpha} + \boldsymbol{x} - 1} (1 - p)^{\boldsymbol{\beta} + \boldsymbol{n} - \boldsymbol{x} - 1} \mathrm{d}p \\ &= \frac{\Gamma(\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n})}{\Gamma(\boldsymbol{\alpha} + \boldsymbol{x})\Gamma(\boldsymbol{\beta} + \boldsymbol{n} - \boldsymbol{x})} \frac{\Gamma(\boldsymbol{\alpha} + \boldsymbol{x} + 1)\Gamma(\boldsymbol{\beta} + \boldsymbol{n} - \boldsymbol{x})}{\Gamma(\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n} + 1)} \\ &= \frac{\Gamma(\boldsymbol{\alpha} + \boldsymbol{x} + 1)}{\Gamma(\boldsymbol{\alpha} + \boldsymbol{x})} \frac{\Gamma(\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n})}{\Gamma(\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n} + 1)} = \frac{\boldsymbol{\alpha} + \boldsymbol{x}}{\boldsymbol{\alpha} + \boldsymbol{\beta} + \boldsymbol{n}}. \end{split}$$

The prior estimate/estimator of $p$ as obtained from the binomial distribution is $\frac{x}{n}$, and the posterior estimate, or the Bayes estimate, of $p$ is

$$E\{p|x\} = \frac{\alpha + x}{\alpha + \beta + n},$$

so that *<sup>x</sup> <sup>n</sup>* is revised to *<sup>α</sup>*+*<sup>x</sup> <sup>α</sup>*+*β*+*n*. In general, if the conditional density/probability function of *x* given *θ* is *f (x*|*θ )* and the prior density/probability function of *θ* is *g(θ )*, then the posterior density of *θ* is *g*1*(θ*|*x)* and *E*[*θ*|*x*] or *the expected value of θ in the conditional distribution of θ given x is the Bayes estimate of θ*.

#### **2.7.3. Interval estimation**

Before concluding this section, the concept of confidence intervals, or interval estimation of a parameter, will be briefly touched upon. For example, let $x\_1,\ldots,x\_n$ be iid $N\_1(\mu,\sigma^2)$ and let $\bar{x}=\frac{1}{n}(x\_1+\cdots+x\_n)$. Then, $\bar{x}\sim N\_1(\mu,\frac{\sigma^2}{n})$, $(\bar{x}-\mu)\sim N\_1(0,\frac{\sigma^2}{n})$ and $z=\frac{\sqrt{n}}{\sigma}(\bar{x}-\mu)\sim N\_1(0,1)$. Since the standard normal density $N\_1(0,1)$ is free of any parameter, one can select two percentiles, say $a$ and $b$, from a standard normal table and make a probability statement such as $\Pr\{a<z<b\}=1-\alpha$ for every given $\alpha$; for instance, $\Pr\{-1.96\le z\le 1.96\}\approx 0.95$ for $\alpha=0.05$. Let $\Pr\{-z\_{\frac{\alpha}{2}}\le z\le z\_{\frac{\alpha}{2}}\}=1-\alpha$ where $z\_{\frac{\alpha}{2}}$ is such that $\Pr(z>z\_{\frac{\alpha}{2}})=\frac{\alpha}{2}$. The following inequalities are mathematically equivalent and hence the probabilities associated with the corresponding intervals are equal:

$$\begin{split} -z\_{\frac{\alpha}{2}} \leq z \leq z\_{\frac{\alpha}{2}} &\Leftrightarrow -z\_{\frac{\alpha}{2}} \leq \frac{\sqrt{n}}{\sigma}(\bar{\mathbf{x}} - \mu) \leq z\_{\frac{\alpha}{2}}\\ &\Leftrightarrow \mu - z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \bar{\mathbf{x}} \leq \mu + z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\\ &\Leftrightarrow \bar{\mathbf{x}} - z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{\mathbf{x}} + z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}. \end{split}$$

Accordingly,

$$\Pr\{\mu - z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\} = 1 - \alpha \tag{ii}$$

⇔

$$\Pr\{\bar{x} - z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\} = 1 - \alpha. \tag{iii}$$

Note that $(iii)$ is not a usual probability statement, as opposed to $(ii)$, which is a probability statement on a random variable. In $(iii)$, the interval $[\bar{x}-z\_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}},\ \bar{x}+z\_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}]$ is random and $\mu$ is a constant. This can be given the interpretation that the random interval covers the parameter $\mu$ with probability $1-\alpha$, which means that we are $100(1-\alpha)\%$ confident that the random interval will cover the unknown parameter $\mu$, or that the random interval is an interval estimator and an observed value of the interval is the interval estimate of $\mu$. We could construct such an interval only because the distribution of $z$ is parameter-free, which enabled us to make a probability statement on $\mu$.

In general, if $u=u(x\_1,\ldots,x\_n,\theta)$ is a function of the sample values and the parameter $\theta$ (which may be a vector of parameters) and if the distribution of $u$ is free of all parameters, then such a quantity is referred to as *a pivotal quantity*. Since the distribution of the pivotal quantity is parameter-free, we can find two numbers $a$ and $b$ such that $\Pr\{a\le u\le b\}=1-\alpha$ for every given $\alpha$. If it is possible to convert the statement $a\le u\le b$ into a mathematically equivalent statement of the type $u\_1\le\theta\le u\_2$, so that $\Pr\{u\_1\le\theta\le u\_2\}=1-\alpha$ for every given $\alpha$, then $[u\_1,u\_2]$ is called a $100(1-\alpha)\%$ *confidence interval or interval estimate for $\theta$*, $u\_1$ and $u\_2$ being referred to as the *lower confidence limit* and the *upper confidence limit*, and $1-\alpha$ being called the *confidence coefficient*. Additional results on interval estimation and the construction of confidence intervals are, for instance, presented in Mathai and Haubold (2017b).
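The normal-mean interval discussed above can be sketched in a few lines of Python (the function name `z_confidence_interval` and the sample data are our own; $\sigma$ is assumed known, and $z\_{0.025}\approx 1.96$ is used as in the text):

```python
# Sketch of the z-interval from the text: for iid N(mu, sigma^2) data
# with sigma known, the pivotal quantity z = sqrt(n)(xbar - mu)/sigma
# yields the 100(1 - alpha)% interval xbar +/- z_{alpha/2} * sigma/sqrt(n).
def z_confidence_interval(x, sigma, z_half_alpha=1.96):
    n = len(x)
    xbar = sum(x) / n
    half_width = z_half_alpha * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

lo, hi = z_confidence_interval([4.0, 6.0, 5.0, 5.0], sigma=2.0)
# xbar = 5.0 and the half-width is 1.96 * 2/sqrt(4) = 1.96
```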

#### **Exercises 2.7**

**2.7.1.** Obtain the method of moments estimators for the parameters *(α, β)* in a real type-2 beta population. Assume that a simple random sample of size *n* is available.

**2.7.2.** Obtain the estimate/estimator of the parameters by the method of moments and the method of maximum likelihood in the real (1): exponential population with parameter *θ*, (2): Poisson population with parameter *λ*. Assume that a simple random sample of size *n* is available.

**2.7.3.** Let $x\_1,\ldots,x\_n$ be a simple random sample of size $n$ from a point Bernoulli population $f\_2(x)=p^x(1-p)^{1-x},\ x=0,1,\ 0<p<1$, and zero elsewhere. Obtain the MLE as well as the moment estimator for $p$. [Note: These will be the same estimators for $p$ in all the populations based on Bernoulli trials, such as the binomial, geometric and negative binomial populations.]

**2.7.4.** If possible, obtain moment estimators for the parameters of a real generalized gamma population, $f\_3(x)=c\,x^{\alpha-1}\mathrm{e}^{-bx^\delta},\ \alpha>0,\ b>0,\ \delta>0,\ x\ge 0$ and zero elsewhere, where $c$ is the normalizing constant.

**2.7.5.** If possible, obtain the MLE of the parameters $a,b$ in the following real uniform population: $f\_4(x)=\frac{1}{b-a},\ b>a,\ a\le x<b$ and zero elsewhere. What are the MLE if $a<x<b$? What are the moment estimators in these two situations?

**2.7.6.** Construct the Bayes' estimate/estimator of the parameter *λ* in a Poisson probability law if the prior density for *λ* is a gamma density with known parameters *(α, β)*.

**2.7.7.** By selecting the appropriate pivotal quantities, construct a 95% confidence interval for (1): the Poisson parameter $\lambda$; (2): the exponential parameter $\theta$; (3): the normal parameter $\sigma^2$; (4): $\theta$ in a uniform density $f(x)=\frac{1}{\theta},\ 0\le x\le\theta$ and zero elsewhere.

### **References**

A.M. Mathai (1993): *A Handbook of Generalized Special Functions for Statistical and Physical Sciences*, Oxford University Press, Oxford, 1993.

A.M. Mathai (1999): *Introduction to Geometrical Probability: Distributional Aspects and Applications*, Gordon and Breach, Amsterdam, 1999.

A.M. Mathai (2005): A pathway to matrix-variate gamma and normal densities, *Linear Algebra and its Applications*, **410**, 198-216.

A.M. Mathai (2013): Fractional integral operators in the complex matrix-variate case, *Linear Algebra and its Applications*, **439**, 2901-2913.

A.M. Mathai (2014): Fractional integral operators involving many matrix variables, *Linear Algebra and its Applications*, **446**, 196-215.

A.M. Mathai (2015): Fractional differential operators in the complex matrix-variate case, *Linear Algebra and its Applications*, **478**, 200-217.

A.M. Mathai and Hans J. Haubold (1988): *Modern Problems in Nuclear and Neutrino Astrophysics*, Akademie-Verlag, Berlin.

A.M. Mathai and Hans J. Haubold (2017a): *Linear Algebra, A course for Physicists and Engineers*, De Gruyter, Germany, 2017.

A.M. Mathai and Hans J. Haubold (2017b): *Probability and Statistics, A course for Physicists and Engineers*, De Gruyter, Germany, 2017.

A.M. Mathai and Hans J. Haubold (2018): *Erdélyi-Kober Fractional Calculus from a Statistical Perspective, Inspired by Solar Neutrino Physics*, Springer Briefs in Mathematical Physics, 31, Springer Nature, Singapore, 2018.

A.M. Mathai and Hans J. Haubold (2020): Mathematical aspects of Krätzel integral and Krätzel transform, *Mathematics*, **8(526)**; doi:10.3390/math8040526.

A.M. Mathai and Serge B. Provost (1992): *Quadratic Forms in Random Variables: Theory and Applications*, Marcel Dekker, New York, 1992.

A.M. Mathai and Serge B. Provost (2006): On q-logistic and related distributions, *IEEE Transactions on Reliability*, **55**(2), 237-244.

A.M. Mathai and R.K. Saxena (1973): *Generalized Hypergeometric Functions*, Springer Lecture Notes No. 348, Heidelberg, 1973.

A.M. Mathai and R. K. Saxena (1978): *The H-function with Applications in Statistics and Other Disciplines*, Wiley Halsted, New York, 1978.

A.M. Mathai, R.K. Saxena and Hans J. Haubold (2010): *The H-function: Theory and Applications*, Springer, New York, 2010.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Chapter 3 The Multivariate Gaussian and Related Distributions**

### **3.1. Introduction**

Real scalar mathematical as well as random variables will be denoted by lower-case letters such as $x, y, z$, and vector/matrix variables, whether mathematical or random, will be denoted by capital letters such as $X, Y, Z$, in the real case. Complex variables will be denoted with a tilde: $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$, for instance. Constant matrices will be denoted by $A, B, C$, and so on. A tilde will be placed above constant matrices *only* if one wishes to stress the point that the matrix is in the complex domain. Equations will be numbered chapter and section-wise. Local numbering will be done subsection-wise. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value of the determinant of $A$ will be denoted as $|\det(A)|$. Observe that in the complex domain, $\det(A)=a+ib$ where $a$ and $b$ are real scalar quantities, and then $|\det(A)|^2=a^2+b^2$.

*Multivariate* usually refers to a collection of scalar variables. Vector/matrix variable situations are also of the multivariate type but, in addition, the positions of the variables must also be taken into account. In a function involving a matrix, one cannot permute its elements since each permutation will produce a different matrix. For example,

$$X = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{21} & x\_{22} \end{bmatrix}, \ Y = \begin{bmatrix} \mathbf{y}\_{11} & \mathbf{y}\_{12} & \mathbf{y}\_{13} \\ \mathbf{y}\_{21} & \mathbf{y}\_{22} & \mathbf{y}\_{23} \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{\mathbf{x}}\_{11} & \tilde{\mathbf{x}}\_{12} \\ \tilde{\mathbf{x}}\_{21} & \tilde{\mathbf{x}}\_{22} \end{bmatrix}$$

are all multivariate cases but the elements or the individual variables must remain at the set positions in the matrices.

The definiteness of matrices will be needed in our discussion. Definiteness is defined and discussed only for symmetric matrices in the real domain and Hermitian matrices in the complex domain. Let $A=A'$ be a real $p\times p$ matrix and $Y$ be a $p\times 1$ real vector, $Y'$ denoting its transpose. Consider the quadratic form $Y'AY,\ A=A'$, for all possible $Y$ excluding the null vector, that is, $Y\ne O$. We say that the real quadratic form $Y'AY$ as well as the real matrix $A=A'$ are positive definite, which is denoted $A>O$, if $Y'AY>0$ for all possible non-null $Y$. Letting $A=A'$ be a real $p\times p$ matrix, if for all real $p\times 1$ vectors $Y\ne O$,

$$\begin{aligned} &Y'AY>0,\ A>O \text{ (positive definite)}\\ &Y'AY\ge 0,\ A\ge O \text{ (positive semi-definite)}\\ &Y'AY<0,\ A<O \text{ (negative definite)}\\ &Y'AY\le 0,\ A\le O \text{ (negative semi-definite)}, \end{aligned}\tag{3.1.1}$$

All the matrices that do not belong to any one of the above categories are said to be *indefinite matrices*, in which case $A$ will have both positive and negative eigenvalues. For example, for some $Y$, $Y'AY$ may be positive and for some other values of $Y$, $Y'AY$ may be negative. The definiteness of Hermitian matrices can be defined in a similar manner. A square matrix $A$ in the complex domain is called Hermitian if $A=A^\*$ where $A^\*$ denotes the conjugate transpose of $A$. Either the conjugates of all the elements of $A$ are taken and the matrix is then transposed, or the matrix $A$ is first transposed and the conjugate of each of its elements is then taken. If $\tilde{z}=a+ib,\ i=\sqrt{-1}$ and $a, b$ are real scalars, then the conjugate of $\tilde{z}$, the conjugate being denoted by a bar, is $\bar{\tilde{z}}=a-ib$, that is, $i$ is replaced by $-i$. For instance, since

$$B = \begin{bmatrix} 2 & 1+i \\ 1-i & 5 \end{bmatrix} \Rightarrow \bar{B} = \begin{bmatrix} 2 & 1-i \\ 1+i & 5 \end{bmatrix} \Rightarrow (\bar{B})' = \bar{B}' = \begin{bmatrix} 2 & 1+i \\ 1-i & 5 \end{bmatrix} = B^\*,$$

$B=B^\*$, and thus the matrix $B$ is Hermitian. In general, if $\tilde{X}$ is a $p\times p$ matrix, then $\tilde{X}$ can be written as $\tilde{X}=X\_1+iX\_2$ where $X\_1$ and $X\_2$ are real matrices and $i=\sqrt{-1}$. And if $\tilde{X}=\tilde{X}^\*$, then $\tilde{X}=X\_1+iX\_2=\tilde{X}^\*=X\_1'-iX\_2'$, so that $X\_1$ is symmetric and $X\_2$ is skew symmetric, whence all the diagonal elements of a Hermitian matrix are real. The definiteness of a Hermitian matrix can be defined parallel to that in the real case; in the complex domain, definiteness is defined only for Hermitian matrices. Let $A=A^\*$ be a Hermitian matrix, let $Y\ne O$ be a $p\times 1$ non-null vector and let $Y^\*$ be its conjugate transpose. Then, consider the Hermitian form $Y^\*AY,\ A=A^\*$. If $Y^\*AY>0$ for all possible non-null $Y\ne O$, the Hermitian form $Y^\*AY,\ A=A^\*$, as well as the Hermitian matrix $A$ are said to be positive definite, which is denoted $A>O$. Letting $A=A^\*$, if for all non-null $Y$,

$$\begin{aligned} &Y^\*AY>0,\ A>O \text{ (Hermitian positive definite)}\\ &Y^\*AY\ge 0,\ A\ge O \text{ (Hermitian positive semi-definite)}\\ &Y^\*AY<0,\ A<O \text{ (Hermitian negative definite)}\\ &Y^\*AY\le 0,\ A\le O \text{ (Hermitian negative semi-definite)}, \end{aligned}\tag{3.1.2}$$

and when none of the above cases applies, we have *indefinite matrices or indefinite Hermitian forms*.
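In practice, these cases are usually distinguished through the (real) eigenvalues of the Hermitian matrix: all positive means positive definite, mixed signs means indefinite. The following Python sketch (our own illustration, using NumPy; the function names are assumptions) checks the Hermitian example $B$ given above:

```python
import numpy as np

# Illustrative check (not from the text): a Hermitian matrix is positive
# definite exactly when all of its (necessarily real) eigenvalues are
# positive; mixed signs indicate an indefinite matrix.
def is_hermitian(A):
    return np.allclose(A, A.conj().T)

def is_positive_definite(A):
    return is_hermitian(A) and bool(np.all(np.linalg.eigvalsh(A) > 0))

B = np.array([[2.0, 1 + 1j],
              [1 - 1j, 5.0]])
# B is the Hermitian matrix from the example above; its eigenvalues are
# (7 +/- sqrt(17))/2, both positive, so B > O
```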

We will also make use of properties of the square root of matrices. If we were to define the square root of $A$ as $B$ such that $B^2=A$, there would then be several candidates for $B$. Since a matrix is being multiplied by itself, $A$ has to be a square matrix. Consider the following matrices

$$\begin{aligned} A\_1 &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \ A\_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \ A\_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \\\ A\_4 &= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \ A\_5 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \end{aligned}$$

whose squares are all equal to $I\_2$. Thus, there are clearly several candidates for the square root of this identity matrix. However, if we restrict ourselves to the class of positive definite matrices in the real domain and Hermitian positive definite matrices in the complex domain, then we can define a unique square root, denoted by $A^{\frac{1}{2}}>O$.
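One standard way to compute this unique square root, sketched below in Python with NumPy (our own illustration; the function name `pd_sqrt` is an assumption), uses the spectral decomposition $A=P\,\mathrm{diag}(\lambda\_1,\ldots,\lambda\_p)\,P'$ of a symmetric positive definite $A$ and takes $A^{\frac{1}{2}}=P\,\mathrm{diag}(\sqrt{\lambda\_1},\ldots,\sqrt{\lambda\_p})\,P'$:

```python
import numpy as np

# Sketch: the unique positive definite square root of a symmetric
# positive definite A via its spectral decomposition.  All eigenvalues
# of A are positive, so their square roots are real and positive.
def pd_sqrt(A):
    eigvals, P = np.linalg.eigh(A)   # A = P diag(eigvals) P'
    return P @ np.diag(np.sqrt(eigvals)) @ P.T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = pd_sqrt(A)
# B is symmetric positive definite and B @ B recovers A
```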

For the various Jacobians used in this chapter, the reader may refer to Chap. 1, further details being available from Mathai (1997).

#### **3.1a. The Multivariate Gaussian Density in the Complex Domain**

Consider the complex scalar random variables $\tilde{x}\_1,\ldots,\tilde{x}\_p$. Let $\tilde{x}\_j=x\_{j1}+ix\_{j2}$ where $x\_{j1}, x\_{j2}$ are real and $i=\sqrt{-1}$. Let $E[x\_{j1}]=\mu\_{j1}$, $E[x\_{j2}]=\mu\_{j2}$ and $E[\tilde{x}\_j]=\mu\_{j1}+i\mu\_{j2}\equiv\tilde{\mu}\_j$. Let the variances be as follows: $\text{Var}(x\_{j1})=\sigma\_{j1}^2,\ \text{Var}(x\_{j2})=\sigma\_{j2}^2$. For a complex variable, the variance is defined as follows:

$$\begin{split} \text{Var}(\tilde{\mathbf{x}}\_{j}) &= E[\tilde{\mathbf{x}}\_{j} - E(\tilde{\mathbf{x}}\_{j})][\tilde{\mathbf{x}}\_{j} - E(\tilde{\mathbf{x}}\_{j})]^{\*} \\ &= E[(\mathbf{x}\_{j1} - \mu\_{j1}) + i(\mathbf{x}\_{j2} - \mu\_{j2})][(\mathbf{x}\_{j1} - \mu\_{j1}) - i(\mathbf{x}\_{j2} - \mu\_{j2})] \\ &= E[(\mathbf{x}\_{j1} - \mu\_{j1})^{2} + (\mathbf{x}\_{j2} - \mu\_{j2})^{2}] = \text{Var}(\mathbf{x}\_{j1}) + \text{Var}(\mathbf{x}\_{j2}) = \sigma\_{j1}^{2} + \sigma\_{j2}^{2} \\ &\equiv \sigma\_{j}^{2}. \end{split}$$

A covariance matrix associated with the *p* × 1 vector *X*˜ = *(x*˜<sub>1</sub>*,..., x*˜<sub>*p*</sub>*)*<sup>′</sup> in the complex domain is defined as Cov*(X*˜*)* = *E*[*X*˜ − *E(X*˜*)*][*X*˜ − *E(X*˜*)*]<sup>∗</sup> ≡ *Σ* with *E(X*˜*)* ≡ *μ*˜ = *(μ*˜<sub>1</sub>*,...,μ*˜<sub>*p*</sub>*)*<sup>′</sup>. Then we have

$$
\Sigma = \text{Cov}(\tilde{X}) = \begin{bmatrix}
\sigma\_1^2 & \sigma\_{12} & \dots & \sigma\_{1p} \\
\sigma\_{21} & \sigma\_2^2 & \dots & \sigma\_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma\_{p1} & \sigma\_{p2} & \dots & \sigma\_p^2
\end{bmatrix},
$$

where the covariance between *x*˜<sub>*r*</sub> and *x*˜<sub>*s*</sub>, two distinct elements of *X*˜, requires explanation. Let *x*˜<sub>*r*</sub> = *x*<sub>*r*1</sub> + *ix*<sub>*r*2</sub> and *x*˜<sub>*s*</sub> = *x*<sub>*s*1</sub> + *ix*<sub>*s*2</sub> where *x*<sub>*r*1</sub>*, x*<sub>*r*2</sub>*, x*<sub>*s*1</sub>*, x*<sub>*s*2</sub> are all real. Then, the covariance between *x*˜<sub>*r*</sub> and *x*˜<sub>*s*</sub> is

$$\begin{split} \text{Cov}(\tilde{\mathbf{x}}\_{r}, \tilde{\mathbf{x}}\_{s}) &= E[\tilde{\mathbf{x}}\_{r} - E(\tilde{\mathbf{x}}\_{r})][\tilde{\mathbf{x}}\_{s} - E(\tilde{\mathbf{x}}\_{s})]^{\*} = \text{Cov}[(\mathbf{x}\_{r1} + i\mathbf{x}\_{r2}), (\mathbf{x}\_{s1} - i\mathbf{x}\_{s2})] \\ &= \text{Cov}(\mathbf{x}\_{r1}, \mathbf{x}\_{s1}) + \text{Cov}(\mathbf{x}\_{r2}, \mathbf{x}\_{s2}) + i[\text{Cov}(\mathbf{x}\_{r2}, \mathbf{x}\_{s1}) - \text{Cov}(\mathbf{x}\_{r1}, \mathbf{x}\_{s2})] = \sigma\_{rs}. \end{split}$$

Note that none of the individual covariances on the right-hand side need be equal to each other. Hence, *σrs* need not be equal to *σsr*. In terms of vectors, we have the following: Let *X*˜ = *X*<sup>1</sup> + *iX*<sup>2</sup> where *X*<sup>1</sup> and *X*<sup>2</sup> are real vectors. The covariance matrix associated with *X*˜ , which is denoted by Cov*(X)*˜ , is

$$\begin{aligned} \text{Cov}(\tilde{X}) &= E([\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]^\*) \\ &= E([(X\_1 - E(X\_1)) + i(X\_2 - E(X\_2))][(X\_1' - E(X\_1')) - i(X\_2' - E(X\_2'))]) \\ &= \text{Cov}(X\_1, X\_1) + \text{Cov}(X\_2, X\_2) + i[\text{Cov}(X\_2, X\_1) - \text{Cov}(X\_1, X\_2)] \\ &\equiv \Sigma\_{11} + \Sigma\_{22} + i[\Sigma\_{21} - \Sigma\_{12}] \end{aligned}$$

where *Σ*<sub>12</sub> need not be equal to *Σ*<sub>21</sub>. Hence, in general, Cov*(X*<sub>1</sub>*, X*<sub>2</sub>*)* need not be equal to Cov*(X*<sub>2</sub>*, X*<sub>1</sub>*)*. We will denote the whole configuration as Cov*(X*˜*)* = *Σ* and assume it to be Hermitian positive definite. We will define the *p*-variate Gaussian density in the complex domain as the following real-valued function:

$$f(\tilde{X}) = \frac{1}{\pi^p |\det(\Sigma)|} \mathbf{e}^{-(\tilde{X}-\tilde{\mu})^\* \Sigma^{-1} (\tilde{X}-\tilde{\mu})} \tag{3.1a.1}$$

where |det*(Σ)*| denotes the absolute value of the determinant of *Σ*. Let us verify that the normalizing constant is indeed 1*/*[*π<sup>p</sup>*|det*(Σ)*|]. Consider the transformation *Y*˜ = *Σ*<sup>−1/2</sup>*(X*˜ − *μ*˜*)* which gives d*X*˜ = [det*(ΣΣ*<sup>∗</sup>*)*]<sup>1/2</sup>d*Y*˜ = |det*(Σ)*|d*Y*˜ in light of (1.6a.1). Then |det*(Σ)*| is canceled and the exponent becomes −*Y*˜<sup>∗</sup>*Y*˜ = −[|*y*˜<sub>1</sub>|<sup>2</sup> + ··· + |*y*˜<sub>*p*</sub>|<sup>2</sup>]. But

$$\int\_{\tilde{\mathbf{y}}\_{j}} \mathbf{e}^{-|\tilde{\mathbf{y}}\_{j}|^{2}} \mathbf{d}\tilde{\mathbf{y}}\_{j} = \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathbf{e}^{-(\mathbf{y}\_{j1}^{2} + \mathbf{y}\_{j2}^{2})} \mathbf{d}\mathbf{y}\_{j1} \wedge \mathbf{d}\mathbf{y}\_{j2} = \pi,\ \tilde{\mathbf{y}}\_{j} = \mathbf{y}\_{j1} + i\mathbf{y}\_{j2},\tag{i}$$

which establishes the normalizing constant. Let us examine the mean value and the covariance matrix of *X*˜ in the complex case, utilizing the same transformation *Y*˜ = *Σ*<sup>−1/2</sup>*(X*˜ − *μ*˜*)*. Accordingly,

$$E[\tilde{X}] = \tilde{\mu} + E[(\tilde{X} - \tilde{\mu})] = \tilde{\mu} + \Sigma^{\frac{1}{2}}E[\tilde{Y}].$$

However,

$$E[\tilde{Y}] = \frac{1}{\pi^p} \int\_{\tilde{Y}} \tilde{Y} \mathbf{e}^{-\tilde{Y}^\* \tilde{Y}} \mathrm{d}\tilde{Y},$$

and the integrand has each element in *Y*˜ producing an odd function whose integral converges, so that the integral over *Y*˜ is null. Thus, *E*[*X*˜ ]= ˜*μ*, the first parameter appearing in the exponent of the density (3.1*a*.1). Now the covariance matrix in *X*˜ is the following:

$$\text{Cov}(\tilde{X}) = E([\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]^\*) = \Sigma^{\frac{1}{2}} E[\tilde{Y}\tilde{Y}^\*] \Sigma^{\frac{1}{2}}.$$

We consider the integrand in *E*[*Y*˜*Y*˜<sup>∗</sup>] and follow steps parallel to those used in the real case. It is a *p* × *p* matrix whose non-diagonal elements are odd functions with convergent integrals, and hence each of these elements integrates out to zero. The first diagonal element in *Y*˜*Y*˜<sup>∗</sup> is |*y*˜<sub>1</sub>|<sup>2</sup>. Its associated integral is

$$\begin{split} & \int \cdots \int |\tilde{y}\_1|^2 \mathbf{e}^{-(|\tilde{y}\_1|^2 + \cdots + |\tilde{y}\_p|^2)} \mathrm{d}\tilde{y}\_1 \wedge \cdots \wedge \mathrm{d}\tilde{y}\_p \\ &= \Big\{ \prod\_{j=2}^p \int\_{\tilde{y}\_j} \mathbf{e}^{-|\tilde{y}\_j|^2} \mathrm{d}\tilde{y}\_j \Big\} \int\_{\tilde{y}\_1} |\tilde{y}\_1|^2 \mathbf{e}^{-|\tilde{y}\_1|^2} \mathrm{d}\tilde{y}\_1. \end{split}$$

From *(i)*,

$$\int\_{\tilde{y}\_1} |\tilde{y}\_1|^2 \mathbf{e}^{-|\tilde{y}\_1|^2} \mathrm{d}\tilde{y}\_1 = \pi, \qquad \prod\_{j=2}^p \int\_{\tilde{y}\_j} \mathbf{e}^{-|\tilde{y}\_j|^2} \mathrm{d}\tilde{y}\_j = \pi^{p-1},$$

where |*y*˜<sub>1</sub>|<sup>2</sup> = *y*<sup>2</sup><sub>11</sub> + *y*<sup>2</sup><sub>12</sub>, *y*˜<sub>1</sub> = *y*<sub>11</sub> + *iy*<sub>12</sub>, *i* = √(−1), and *y*<sub>11</sub>*, y*<sub>12</sub> real. Let *y*<sub>11</sub> = *r* cos *θ*, *y*<sub>12</sub> = *r* sin *θ* ⇒ d*y*<sub>11</sub> ∧ d*y*<sub>12</sub> = *r* d*r* ∧ d*θ* and

$$\begin{split} \int\_{\tilde{\mathbb{Y}}\_{1}} |\tilde{\mathbb{y}}\_{1}|^{2} \mathbf{e}^{-|\tilde{\mathbb{y}}\_{1}|^{2}} \mathrm{d}\tilde{\mathbb{y}}\_{1} &= \left( \int\_{r=0}^{\infty} r(r^{2}) \mathbf{e}^{-r^{2}} \mathrm{d}r \right) \left( \int\_{\theta=0}^{2\pi} \mathrm{d}\theta \right), \text{ ( letting } u = r^{2} ) \\ &= (2\pi) \left( \frac{1}{2} \int\_{0}^{\infty} u \mathbf{e}^{-u} \mathrm{d}u \right) = (2\pi) \left( \frac{1}{2} \right) = \pi. \end{split}$$

Thus, the first diagonal element in *Y*˜*Y*˜<sup>∗</sup> integrates out to *π* · *π*<sup>*p*−1</sup> = *π<sup>p</sup>* and, similarly, each diagonal element will integrate out to *π<sup>p</sup>*, which is canceled by the *π<sup>p</sup>* present in the normalizing constant. Hence the integral over *Y*˜*Y*˜<sup>∗</sup> gives an identity matrix, and the covariance matrix of *X*˜ is *Σ*, the other parameter appearing in the density (3.1*a*.1). Hence the two parameters therein are the mean value vector and the covariance matrix of *X*˜.
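The polar-coordinate evaluations above lend themselves to a quick numerical check. The following sketch (Python with numpy; the truncation point and grid size are arbitrary numerical choices) approximates the two integrals over the complex plane used in *(i)* and in the diagonal-element computation:

```python
import numpy as np

# Left-Riemann approximations of 2*pi * int_0^inf r e^{-r^2} dr      (= pi)
# and of                         2*pi * int_0^inf r^3 e^{-r^2} dr    (= pi);
# truncation at r = 10 and the grid size are arbitrary numerical choices.
r = np.linspace(0.0, 10.0, 100_001)
dr = r[1] - r[0]
I0 = 2 * np.pi * np.sum(r * np.exp(-r**2)) * dr        # total-mass integral
I2 = 2 * np.pi * np.sum(r**3 * np.exp(-r**2)) * dr     # second-moment integral

print(round(I0, 4), round(I2, 4))   # both ≈ pi
```

Both values agree with the exact answer *π* obtained analytically above.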

**Example 3.1a.1.** Consider the matrix *Σ* and the vector *X*˜ with expected value *E*[*X*˜ ]= ˜*μ* as follows:

$$\Sigma = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{x}\_1 \\ \tilde{x}\_2 \end{bmatrix}, \ E[\tilde{X}] = \begin{bmatrix} 1+2i \\ 2-i \end{bmatrix} = \tilde{\mu}.$$

Show that *Σ* is Hermitian positive definite so that it can be the covariance matrix of *X*˜, that is, Cov*(X*˜*)* = *Σ*. If *X*˜ has a bivariate Gaussian distribution in the complex domain, *X*˜ ∼ *N*˜<sub>2</sub>*(μ*˜*, Σ), Σ > O*, then write down (1) the exponent in the density explicitly; (2) the density explicitly.

**Solution 3.1a.1.** The transpose and conjugate transpose of *Σ* are

$$
\Sigma' = \begin{bmatrix} 2 & 1-i \\ 1+i & 3 \end{bmatrix}, \ \Sigma^\* = \bar{\Sigma}' = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} = \Sigma
$$

and hence *Σ* is Hermitian. The eigenvalues of *Σ* are available from the equation

$$\begin{aligned} (2 - \lambda)(3 - \lambda) - (1 - i)(1 + i) &= 0 \Rightarrow \lambda^2 - 5\lambda + 4 = 0\\ \Rightarrow (\lambda - 4)(\lambda - 1) &= 0 \text{ or } \lambda\_1 = 4, \ \lambda\_2 = 1. \end{aligned}$$

Thus, the eigenvalues are positive [the eigenvalues of a Hermitian matrix are always real]. This property of the eigenvalues being positive, combined with *Σ* being Hermitian, proves that *Σ* is Hermitian positive definite. This can also be established from the leading minors of *Σ*: they are det*((*2*))* = 2 *>* 0 and det*(Σ)* = *(*2*)(*3*)* − *(*1 − *i)(*1 + *i)* = 4 *>* 0. Since *Σ* is Hermitian and its leading minors are all positive, *Σ* is positive definite. Let us evaluate the inverse by making use of the formula *Σ*<sup>−1</sup> = [1*/*det*(Σ)*][Cof*(Σ)*]<sup>′</sup>, where Cof*(Σ)* represents the matrix of cofactors of the elements of *Σ*. [These formulae hold whether the elements of the matrix are real or complex.] That is,

$$
\Sigma^{-1} = \frac{1}{4} \begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 2 \end{bmatrix}, \ \Sigma^{-1} \Sigma = I. \tag{ii}
$$

The exponent in a bivariate complex Gaussian density being <sup>−</sup>*(X*˜ − ˜*μ)*<sup>∗</sup> *<sup>Σ</sup>*−1*(X*˜ − ˜*μ)*, we have

$$\begin{split}-(\tilde{X}-\tilde{\mu})^{\*}\,\Sigma^{-1}(\tilde{X}-\tilde{\mu})&=-\frac{1}{4}\{3\,[\tilde{x}\_{1}-(1+2i)]^{\*}[\tilde{x}\_{1}-(1+2i)]\\ &-(1+i)\,[\tilde{x}\_{1}-(1+2i)]^{\*}[\tilde{x}\_{2}-(2-i)]\\ &-(1-i)\,[\tilde{x}\_{2}-(2-i)]^{\*}[\tilde{x}\_{1}-(1+2i)]\\ &+2\,[\tilde{x}\_{2}-(2-i)]^{\*}[\tilde{x}\_{2}-(2-i)]\}.\end{split}\tag{iii}$$

Thus, the density of the *N*˜<sub>2</sub>*(μ*˜*, Σ)* vector whose components can assume any complex value is

$$f(\tilde{X}) = \frac{\mathbf{e}^{-(\tilde{X}-\tilde{\mu})^\* \Sigma^{-1}(\tilde{X}-\tilde{\mu})}}{4\pi^2} \tag{3.1a.2}$$

where *Σ*−<sup>1</sup> is given in *(ii)* and the exponent, in *(iii)*.
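The claims of this example can be verified numerically. The following sketch (Python with numpy) checks the Hermitian property, the eigenvalues, and the inverse displayed in *(ii)*:

```python
import numpy as np

# A numerical check of Example 3.1a.1
Sigma = np.array([[2.0, 1.0 + 1.0j],
                  [1.0 - 1.0j, 3.0]])

# Sigma is Hermitian: it equals its conjugate transpose
is_hermitian = np.allclose(Sigma, Sigma.conj().T)

# The eigenvalues are 1 and 4, both positive, so Sigma is Hermitian positive definite
eigvals = np.linalg.eigvalsh(Sigma)

# The inverse agrees with (ii): (1/4)[[3, -(1+i)], [-(1-i), 2]]
Sigma_inv = np.linalg.inv(Sigma)
Sigma_inv_formula = 0.25 * np.array([[3.0, -(1.0 + 1.0j)],
                                     [-(1.0 - 1.0j), 2.0]])

print(is_hermitian, np.round(eigvals, 6), np.allclose(Sigma_inv, Sigma_inv_formula))
```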

#### **Exercises 3.1**

**3.1.1.** Construct a 2×2 Hermitian positive definite matrix *A* and write down a Hermitian form with this *A* as its matrix.

**3.1.2.** Construct a 2 × 2 Hermitian matrix *B* whose determinant is 4, trace is 5, and first row is *(*2*,* 1 + *i)*. Then write down explicitly the Hermitian form *X*<sup>∗</sup>*BX*.

**3.1.3.** Is *B* in Exercise 3.1.2 positive definite? Is the Hermitian form *X*∗*BX* positive definite? Establish the results.

**3.1.4.** Construct two 2 × 2 Hermitian matrices *A* and *B* such that *AB* = *O* (null), if that is possible.

**3.1.5.** Specify the eigenvalues of the matrix *B* in Exercise 3.1.2, obtain a unitary matrix *Q*, *QQ*<sup>∗</sup> = *I*, *Q*<sup>∗</sup>*Q* = *I*, such that *Q*<sup>∗</sup>*BQ* is diagonal, and write down the canonical form of the Hermitian form *X*<sup>∗</sup>*BX* = *λ*<sub>1</sub>|*y*<sub>1</sub>|<sup>2</sup> + *λ*<sub>2</sub>|*y*<sub>2</sub>|<sup>2</sup>.

#### **3.2. The Multivariate Normal or Gaussian Distribution, Real Case**

We may define a real *p*-variate Gaussian density via the following characterization: Let *x*<sub>1</sub>*,...,x*<sub>*p*</sub> be real scalar variables and *X* be a *p* × 1 vector with *x*<sub>1</sub>*,...,x*<sub>*p*</sub> as its elements, that is, *X*<sup>′</sup> = *(x*<sub>1</sub>*,...,x*<sub>*p*</sub>*)*. Let *L*<sup>′</sup> = *(a*<sub>1</sub>*,...,a*<sub>*p*</sub>*)* where *a*<sub>1</sub>*,...,a*<sub>*p*</sub> are arbitrary real scalar constants. Consider the linear function *u* = *L*<sup>′</sup>*X* = *X*<sup>′</sup>*L* = *a*<sub>1</sub>*x*<sub>1</sub> + ··· + *a*<sub>*p*</sub>*x*<sub>*p*</sub>. If, for all possible *L*, *u* = *L*<sup>′</sup>*X* has a real univariate Gaussian distribution, then the vector *X* is said to have a multivariate Gaussian distribution. For any linear function *u* = *L*<sup>′</sup>*X*, *E*[*u*] = *L*<sup>′</sup>*E*[*X*] = *L*<sup>′</sup>*μ*, *μ*<sup>′</sup> = *(μ*<sub>1</sub>*,...,μ*<sub>*p*</sub>*)*, *μ*<sub>*j*</sub> = *E*[*x*<sub>*j*</sub>], *j* = 1*,...,p*, and Var*(u)* = *L*<sup>′</sup>*ΣL*, *Σ* = Cov*(X)* = *E*[*X* − *E(X)*][*X* − *E(X)*]<sup>′</sup> in the real case. If *u* is univariate normal, then its mgf, with parameter *t*, is the following:

$$M\_{u}(t) = E[\mathbf{e}^{tu}] = \mathbf{e}^{tE(u) + \frac{t^2}{2}\mathrm{Var}(u)} = \mathbf{e}^{tL'\mu + \frac{t^2}{2}L'\Sigma L}.$$

Note that *tL*<sup>′</sup>*μ* + (*t*<sup>2</sup>*/*2)*L*<sup>′</sup>*ΣL* = *(tL)*<sup>′</sup>*μ* + (1*/*2)*(tL)*<sup>′</sup>*Σ(tL)* where there are *p* parameters *a*<sub>1</sub>*,...,a*<sub>*p*</sub> when the *a*<sub>*j*</sub>'s are arbitrary. As well, *tL* contains only *p* parameters since, for example, *ta*<sub>*j*</sub> is a single parameter when both *t* and *a*<sub>*j*</sub> are arbitrary. Then,

$$M\_{\mathbf{u}}(t) = M\_{\mathbf{X}}(tL) = \mathbf{e}^{(tL)'\mu + \frac{1}{2}(tL)'\Sigma(tL)} = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T} = M\_{\mathbf{X}}(T),\ T = tL. \tag{3.2.1}$$

Thus, when *L* is arbitrary, the mgf of *u* qualifies as the mgf of a *p*-vector *X*. The density corresponding to (3.2.1) is the following, when *Σ* *>* *O*:

$$f(X) = c \operatorname{e}^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}, \ -\infty < x\_j < \infty, \ -\infty < \mu\_j < \infty, \ \Sigma > O,$$

for *j* = 1*,...,p*. We can evaluate the normalizing constant *c* when *f (X)* is a density, in which case the total integral is unity. That is,

$$1 = \int\_X f(X)dX = \int\_X c \text{ e}^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)} \text{d}X.$$

Let *Y* = *Σ*<sup>−1/2</sup>*(X* − *μ)* ⇒ d*Y* = |*Σ*|<sup>−1/2</sup>d*(X* − *μ)* = |*Σ*|<sup>−1/2</sup>d*X* since *μ* is a constant. The Jacobian of the transformation may be obtained from Theorem 1.6.1. Now,

$$1 = c|\Sigma|^{\frac{1}{2}} \int\_Y \mathbf{e}^{-\frac{1}{2}Y'Y} \mathrm{d}Y.$$

But *Y*<sup>′</sup>*Y* = *y*<sup>2</sup><sub>1</sub> + ··· + *y*<sup>2</sup><sub>*p*</sub> where *y*<sub>1</sub>*,...,y*<sub>*p*</sub> are the real elements of *Y*, and

$$\int\_{-\infty}^{\infty} \mathbf{e}^{-\frac{1}{2}y\_j^2}\, \mathrm{d}y\_j = \sqrt{2\pi} \ \Rightarrow \ \int\_Y \mathbf{e}^{-\frac{1}{2}Y'Y}\, \mathrm{d}Y = (\sqrt{2\pi}\,)^p.$$

Then *c* = [|*Σ*|<sup>1/2</sup>*(*2*π)*<sup>*p/*2</sup>]<sup>−1</sup> and the *p*-variate real Gaussian or normal density is given by

$$f(X) = \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}} \mathbf{e}^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\tag{3.2.2}$$

for *Σ* *>* *O*, −∞ *< x*<sub>*j*</sub> *<* ∞, −∞ *< μ*<sub>*j*</sub> *<* ∞, *j* = 1*,...,p*. The density (3.2.2) is called the nonsingular normal density in the real case—nonsingular in the sense that *Σ* is nonsingular. In fact, *Σ* is also real positive definite in the nonsingular case. When *Σ* is singular, we have a singular normal distribution which does not have a density function. However, in the singular case, all the properties can be studied with the help of the associated mgf, which is of the form in (3.2.1), as the mgf exists whether *Σ* is nonsingular or singular.
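The role of the normalizing constant in (3.2.2) can be illustrated numerically. The sketch below (Python with numpy; the truncated grid, step size, and the particular *μ* and *Σ* are illustrative choices) integrates the bivariate density over a grid and recovers a total probability of approximately one:

```python
import numpy as np

# Illustrative parameters for p = 2; the grid and truncation are numerical choices
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)
const = 1.0 / (np.sqrt(np.linalg.det(Sigma)) * (2.0 * np.pi))  # 1/[|Sigma|^(1/2)(2 pi)^(p/2)]

x = np.linspace(-8.0, 8.0, 401)
h = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")
D = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)
quad_form = np.einsum("...i,ij,...j->...", D, Sigma_inv, D)    # (X-mu)' Sigma^(-1) (X-mu)
density = const * np.exp(-0.5 * quad_form)

total = density.sum() * h * h                                  # Riemann sum over the grid
print(round(total, 3))   # ≈ 1.0
```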

We will use the standard notation *X* ∼ *N*<sub>*p*</sub>*(μ, Σ)* to denote a *p*-variate real normal or Gaussian distribution with mean value vector *μ* and covariance matrix *Σ*. If it is nonsingular real Gaussian, we write *Σ* *>* *O*; if it is singular normal, then we specify |*Σ*| = 0. If we wish to combine the singular and nonsingular cases, we write *X* ∼ *N*<sub>*p*</sub>*(μ, Σ), Σ* ≥ *O*.

What are the mean value vector and the covariance matrix of a real *p*-Gaussian vector *X*?

$$\begin{split} E[X] &= E[X - \mu] + E[\mu] = \mu + \int\_{X} (X - \mu) f(X) \mathrm{d}X \\ &= \mu + \frac{1}{|\Sigma|^{\frac{1}{2}} (2\pi)^{\frac{p}{2}}} \int\_{X} (X - \mu) \, \mathrm{e}^{-\frac{1}{2} (X - \mu)' \Sigma^{-1} (X - \mu)} \mathrm{d}X \\ &= \mu + \frac{\Sigma^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \int\_{Y} Y \, \mathrm{e}^{-\frac{1}{2} Y'Y} \mathrm{d}Y, \; Y = \Sigma^{-\frac{1}{2}} (X - \mu). \end{split}$$

The expected value of a matrix is the matrix of the expected values of its elements. The expected value of the component *y*<sub>*j*</sub> of *Y*<sup>′</sup> = *(y*<sub>1</sub>*,...,y*<sub>*p*</sub>*)* is

$$E[\mathbf{y}\_j] = \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{\infty} \mathbf{y}\_j \mathbf{e}^{-\frac{1}{2}\mathbf{y}\_j^2} \mathbf{d} \mathbf{y}\_j \left\{ \prod\_{i \neq j=1}^p \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{\infty} \mathbf{e}^{-\frac{1}{2}\mathbf{y}\_i^2} \mathbf{d} \mathbf{y}\_i \right\}.$$

The product is equal to 1 and, the first integrand being an odd function of *y*<sub>*j*</sub> whose integral is convergent, it is equal to 0. Thus, *E*[*Y* ] = *O* (a null vector) and *E*[*X*] = *μ*, the first parameter appearing in the exponent of the density. Now, consider the covariance matrix of *X*. For a real vector *X*,

$$\begin{split} \text{Cov}(X) &= E[X - E(X)][X - E(X)]' = E[(X - \mu)(X - \mu)'] \\ &= \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}} \int\_X (X - \mu)(X - \mu)' \mathbf{e}^{-\frac{1}{2}(X - \mu)'\Sigma^{-1}(X - \mu)} \mathbf{d}X \\ &= \frac{1}{(2\pi)^{\frac{p}{2}}} \Sigma^{\frac{1}{2}} \Big[ \int\_Y YY' \mathbf{e}^{-\frac{1}{2}Y'Y} \mathbf{d}Y \Big] \Sigma^{\frac{1}{2}}, \ Y = \Sigma^{-\frac{1}{2}}(X - \mu). \end{split}$$

But

$$YY' = \begin{bmatrix} \mathbf{y\_1} \\ \vdots \\ \mathbf{y\_p} \end{bmatrix} \begin{bmatrix} \mathbf{y\_1}, \dots, \mathbf{y\_p} \end{bmatrix} = \begin{bmatrix} \mathbf{y\_1^2} & \mathbf{y\_1}\mathbf{y\_2} & \cdots & \mathbf{y\_1}\mathbf{y\_p} \\ \mathbf{y\_2}\mathbf{y\_1} & \mathbf{y\_2^2} & \cdots & \mathbf{y\_2}\mathbf{y\_p} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{y\_p}\mathbf{y\_1} & \mathbf{y\_p}\mathbf{y\_2} & \cdots & \mathbf{y\_p^2} \end{bmatrix}.$$

The non-diagonal elements are linear in each of the variables *y*<sub>*i*</sub> and *y*<sub>*j*</sub>, *i* ≠ *j*, and hence the integrals over the non-diagonal elements will be equal to zero owing to a property of convergent integrals over odd functions. Hence we only need to consider the diagonal elements. When considering *y*<sub>1</sub>, the integrals over *y*<sub>2</sub>*,...,y*<sub>*p*</sub> will give the following:

$$\int\_{-\infty}^{\infty} \mathbf{e}^{-\frac{1}{2}y\_j^2} \mathbf{d}y\_j = \sqrt{2\pi}, \ j = 2, \dots, p \Rightarrow (2\pi)^{\frac{p-1}{2}},$$

and hence we are left with

$$\frac{1}{\sqrt{2\pi}}\int\_{-\infty}^{\infty} y\_1^2 \mathbf{e}^{-\frac{1}{2}y\_1^2} \mathrm{d}y\_1 = \frac{2}{\sqrt{2\pi}}\int\_{0}^{\infty} y\_1^2 \mathbf{e}^{-\frac{1}{2}y\_1^2} \mathrm{d}y\_1$$

due to the evenness of the integrand, the integral being convergent. Let *u* = *y*<sup>2</sup><sub>1</sub> so that *y*<sub>1</sub> = *u*<sup>1/2</sup> since *y*<sub>1</sub> *>* 0, and d*y*<sub>1</sub> = (1*/*2)*u*<sup>−1/2</sup>d*u*. The integral then becomes

$$\frac{2}{\sqrt{2\pi}}\cdot\frac{1}{2}\int\_0^{\infty} u^{\frac{1}{2}}\mathbf{e}^{-\frac{u}{2}}\mathrm{d}u = \frac{1}{\sqrt{2\pi}}\,\Gamma\Big(\frac{3}{2}\Big)2^{\frac{3}{2}} = \frac{\sqrt{2\pi}}{\sqrt{2\pi}} = 1,$$

since *Γ (*3*/*2*)* = (1*/*2*)Γ (*1*/*2*)* and *Γ (*1*/*2*)* = √*π*. This shows that each diagonal element integrates out to 1, and hence the integral over *YY*<sup>′</sup> is the identity matrix after absorbing *(*2*π)*<sup>−*p/*2</sup>. Thus Cov*(X)* = *Σ*<sup>1/2</sup>*Σ*<sup>1/2</sup> = *Σ*, whose inverse is the other parameter appearing in the exponent of the density. Hence the two parameters are

$$
\mu = E\{X\} \text{ and } \Sigma = \text{Cov}(X). \tag{3.2.3}
$$
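These two identifications can also be illustrated by simulation. The following sketch (Python with numpy; sample size, seed, and parameter values are arbitrary choices) draws from a bivariate normal and compares the sample mean and sample covariance with *μ* and *Σ*:

```python
import numpy as np

rng = np.random.default_rng(42)                 # fixed seed for reproducibility
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Draw a large sample from N_2(mu, Sigma) and compare moments with the parameters
X = rng.multivariate_normal(mu, Sigma, size=200_000)
mean_hat = X.mean(axis=0)
cov_hat = np.cov(X.T)

print(np.allclose(mean_hat, mu, atol=0.05))     # sample mean ≈ mu
print(np.allclose(cov_hat, Sigma, atol=0.05))   # sample covariance ≈ Sigma
```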

#### **The bivariate case**

When *p* = 2*,* we obtain the bivariate real normal density from (3.2.2), which is denoted by *f (x*1*, x*2*)*. Note that when *p* = 2,

$$\begin{aligned} (X - \mu)' \Sigma^{-1} (X - \mu) &= (x\_1 - \mu\_1, x\_2 - \mu\_2) \, \Sigma^{-1} \begin{pmatrix} x\_1 - \mu\_1 \\ x\_2 - \mu\_2 \end{pmatrix}, \\ \Sigma &= \begin{pmatrix} \sigma\_{11} & \sigma\_{12} \\ \sigma\_{21} & \sigma\_{22} \end{pmatrix} = \begin{pmatrix} \sigma\_1^2 & \sigma\_1 \sigma\_2 \rho \\ \sigma\_1 \sigma\_2 \rho & \sigma\_2^2 \end{pmatrix}, \end{aligned}$$

where *σ*<sup>2</sup><sub>1</sub> = Var*(x*<sub>1</sub>*)* = *σ*<sub>11</sub>, *σ*<sup>2</sup><sub>2</sub> = Var*(x*<sub>2</sub>*)* = *σ*<sub>22</sub>, *σ*<sub>12</sub> = Cov*(x*<sub>1</sub>*, x*<sub>2</sub>*)* = *σ*<sub>1</sub>*σ*<sub>2</sub>*ρ*, where *ρ* is the correlation between *x*<sub>1</sub> and *x*<sub>2</sub>; *ρ*, in general, is defined as

$$\rho = \frac{\text{Cov}(\mathbf{x}\_1, \mathbf{x}\_2)}{\sqrt{\text{Var}(\mathbf{x}\_1)\text{Var}(\mathbf{x}\_2)}} = \frac{\sigma\_{12}}{\sigma\_1 \sigma\_2}, \ \sigma\_1 \neq 0, \ \sigma\_2 \neq 0,$$

which means that *ρ* is defined only for non-degenerate random variables or, equivalently, that the probability mass of either variable should not lie at a single point. This *ρ* is a scale-free covariance, the covariance measuring the joint variation in *(x*<sub>1</sub>*, x*<sub>2</sub>*)*, which corresponds to the square of scatter, Var*(x)*, in a real scalar random variable *x*. The covariance, in general, depends upon the units of measurement of *x*<sub>1</sub> and *x*<sub>2</sub>, whereas *ρ* is a scale-free pure coefficient. This *ρ* does not measure relationship between *x*<sub>1</sub> and *x*<sub>2</sub> for −1 *< ρ <* 1; only for *ρ* = ±1 does it measure a linear relationship. Oftentimes, *ρ* is misinterpreted as measuring any relationship between *x*<sub>1</sub> and *x*<sub>2</sub>, which is not the case, as can be seen from the counterexamples pointed out in Mathai and Haubold (2017). If *ρ*<sub>*x,y*</sub> is the correlation between two real scalar random variables *x* and *y* and if *u* = *a*<sub>1</sub>*x* + *b*<sub>1</sub> and *v* = *a*<sub>2</sub>*y* + *b*<sub>2</sub> where *a*<sub>1</sub> ≠ 0, *a*<sub>2</sub> ≠ 0 and *b*<sub>1</sub>*, b*<sub>2</sub> are constants, then *ρ*<sub>*u,v*</sub> = ±*ρ*<sub>*x,y*</sub>: it is positive when *a*<sub>1</sub> *>* 0*, a*<sub>2</sub> *>* 0 or *a*<sub>1</sub> *<* 0*, a*<sub>2</sub> *<* 0, and negative otherwise. Thus, *ρ* is both location and scale invariant.

The determinant of *Σ* in the bivariate case is

$$|\Sigma| = \left| \begin{array}{cc} \sigma\_1^2 & \rho \sigma\_1 \sigma\_2 \\ \rho \sigma\_1 \sigma\_2 & \sigma\_2^2 \end{array} \right| = \sigma\_1^2 \sigma\_2^2 (1 - \rho^2), \ -1 < \rho < 1, \ \sigma\_1 > 0, \ \sigma\_2 > 0.$$

The inverse is as follows, taking the inverse as the transpose of the matrix of cofactors divided by the determinant:

$$\Sigma^{-1} = \frac{1}{\sigma\_1^2 \sigma\_2^2 (1 - \rho^2)} \begin{bmatrix} \sigma\_2^2 & -\rho \sigma\_1 \sigma\_2 \\ -\rho \sigma\_1 \sigma\_2 & \sigma\_1^2 \end{bmatrix} = \frac{1}{1 - \rho^2} \begin{bmatrix} \frac{1}{\sigma\_1^2} & -\frac{\rho}{\sigma\_1 \sigma\_2} \\ -\frac{\rho}{\sigma\_1 \sigma\_2} & \frac{1}{\sigma\_2^2} \end{bmatrix}. \tag{3.2.4}$$

Then,

$$(X - \mu)' \Sigma^{-1} (X - \mu) = \frac{1}{1-\rho^2}\left[\left(\frac{x\_1 - \mu\_1}{\sigma\_1}\right)^2 + \left(\frac{x\_2 - \mu\_2}{\sigma\_2}\right)^2 - 2\rho \left(\frac{x\_1 - \mu\_1}{\sigma\_1}\right) \left(\frac{x\_2 - \mu\_2}{\sigma\_2}\right)\right] \equiv \frac{Q}{1-\rho^2}. \tag{3.2.5}$$

Hence, the real bivariate normal density is

$$f(\mathbf{x}\_1, \mathbf{x}\_2) = \frac{1}{2\pi\sigma\_1\sigma\_2\sqrt{(1-\rho^2)}}\mathbf{e}^{-\frac{\mathcal{Q}}{2(1-\rho^2)}}\tag{3.2.6}$$

where *Q* is given in (3.2.5). Observe that *Q* is a positive definite quadratic form, so that *Q >* 0 for all *X* ≠ *μ*. We can also obtain an interesting result on the standardized versions of *x*<sub>1</sub> and *x*<sub>2</sub>. Let the standardized *x*<sub>*j*</sub> be *y*<sub>*j*</sub> = *(x*<sub>*j*</sub> − *μ*<sub>*j*</sub>*)/σ*<sub>*j*</sub>, *j* = 1*,* 2, and *u* = *y*<sub>1</sub> − *y*<sub>2</sub>. Then

$$\text{Var}(u) = \text{Var}(y\_1) + \text{Var}(y\_2) - 2\text{Cov}(y\_1, y\_2) = 1 + 1 - 2\rho = 2(1 - \rho). \tag{3.2.7}$$

This shows that the larger *ρ* is, the smaller the variance of *u*, and vice versa, noting that −1 *< ρ <* 1 in the bivariate real normal case whereas, in general, −1 ≤ *ρ* ≤ 1. Observe that if *ρ* = 0 in the bivariate normal density given in (3.2.6), this joint density factorizes into the product of the marginal densities of *x*<sub>1</sub> and *x*<sub>2</sub>, which implies that *x*<sub>1</sub> and *x*<sub>2</sub> are independently distributed when *ρ* = 0. In general, for real scalar random variables *x* and *y*, *ρ* = 0 need not imply independence; however, in the bivariate normal case, *ρ* = 0 if and only if *x*<sub>1</sub> and *x*<sub>2</sub> are independently distributed. As well, the exponent in (3.2.6) has the following feature:

$$Q = (1-\rho^2)(X - \mu)' \Sigma^{-1} (X - \mu) = \left(\frac{x\_1 - \mu\_1}{\sigma\_1}\right)^2 + \left(\frac{x\_2 - \mu\_2}{\sigma\_2}\right)^2 - 2\rho \left(\frac{x\_1 - \mu\_1}{\sigma\_1}\right) \left(\frac{x\_2 - \mu\_2}{\sigma\_2}\right) = c, \tag{3.2.8}$$

where *c* is positive, describes an ellipse in two-dimensional Euclidean space and, for a general *p*,

$$(X - \mu)' \Sigma^{-1} (X - \mu) = c > 0, \ \Sigma > O,\tag{3.2.9}$$

describes the surface of an ellipsoid in the *p*-dimensional Euclidean space, observing that *Σ*<sup>−1</sup> *>* *O* when *Σ* *>* *O*.
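The bivariate formulas above are easy to verify numerically. The following sketch (Python with numpy; the values of *σ*<sub>1</sub>, *σ*<sub>2</sub>, *ρ* are arbitrary illustrative choices) checks the determinant formula and the inverse given in (3.2.4):

```python
import numpy as np

s1, s2, rho = 1.5, 0.7, -0.4                    # arbitrary illustrative values
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

# Determinant: sigma_1^2 sigma_2^2 (1 - rho^2)
det_formula = s1**2 * s2**2 * (1 - rho**2)

# Inverse as given in (3.2.4)
Sigma_inv_formula = (1.0 / (1.0 - rho**2)) * np.array(
    [[1.0 / s1**2, -rho / (s1 * s2)],
     [-rho / (s1 * s2), 1.0 / s2**2]])

print(np.isclose(np.linalg.det(Sigma), det_formula))         # True
print(np.allclose(np.linalg.inv(Sigma), Sigma_inv_formula))  # True
```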

**Example 3.2.1.** Let

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \mu = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 3 & 1 \\ -1 & 1 & 2 \end{bmatrix}.$$

Show that *Σ>O* and that *Σ* can be a covariance matrix for *X*. Taking *E*[*X*] = *μ* and Cov*(X)* = *Σ,* construct the exponent of a trivariate real Gaussian density explicitly and write down the density.

**Solution 3.2.1.** Let us verify the definiteness of *Σ*. Note that *Σ* = *Σ*<sup>′</sup> (symmetric). The leading minors are

$$\det((3)) = 3 > 0, \ \ \det\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} = 9 > 0, \ \ \det(\Sigma) = 12 > 0,$$

and hence *Σ* *>* *O*. The matrix of cofactors of *Σ*, that is, Cof*(Σ)*, and the inverse of *Σ* are the following:

$$\text{Cof}(\Sigma) = \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix}, \ \Sigma^{-1} = \frac{1}{12} \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix}. \tag{i}$$

Thus, the exponent of the trivariate real Gaussian density is −*Q/*2 where

$$\begin{split} Q &= \frac{1}{12} [x\_1 - 1, x\_2 + 1, x\_3 + 2] \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix} \begin{bmatrix} x\_1 - 1 \\ x\_2 + 1 \\ x\_3 + 2 \end{bmatrix} \\ &= \frac{1}{12} [5(x\_1 - 1)^2 + 5(x\_2 + 1)^2 + 9(x\_3 + 2)^2 - 2(x\_1 - 1)(x\_2 + 1) \\ &\qquad + 6(x\_1 - 1)(x\_3 + 2) - 6(x\_2 + 1)(x\_3 + 2)]. \end{split}\tag{ii}$$

The normalizing constant of the density being

$$(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}} = (2\pi)^{\frac{3}{2}}[12]^{\frac{1}{2}} = 2^{\frac{5}{2}}\sqrt{3}\pi^{\frac{3}{2}},$$

the resulting trivariate Gaussian density is

$$f(X) = [2^{\frac{5}{2}} \sqrt{3} \pi^{\frac{3}{2}}]^{-1} \mathbf{e}^{-\frac{1}{2}Q}$$

for −∞ *< xj <* ∞*, j* = 1*,* 2*,* 3*,* where *Q* is specified in *(ii)*.
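The computations of this example can be confirmed numerically; a sketch in Python with numpy:

```python
import numpy as np

# A numerical check of Example 3.2.1
Sigma = np.array([[3.0, 0.0, -1.0],
                  [0.0, 3.0, 1.0],
                  [-1.0, 1.0, 2.0]])

# Leading minors 3, 9 and 12 are all positive, so Sigma > O
minors = [np.linalg.det(Sigma[:k, :k]) for k in (1, 2, 3)]

# The inverse is (1/12) times the (symmetric) cofactor matrix
Cof = np.array([[5.0, -1.0, 3.0],
                [-1.0, 5.0, -3.0],
                [3.0, -3.0, 9.0]])
Sigma_inv = np.linalg.inv(Sigma)

print(np.round(minors, 6), np.allclose(Sigma_inv, Cof / 12.0))
```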

#### **3.2.1. The moment generating function in the real case**

We have defined the multivariate Gaussian distribution via the following characterization, whose proof relies on its moment generating function: if all the possible linear combinations of the components of a random vector are real univariate normal, then this vector must follow a real multivariate Gaussian distribution. We are now looking into the derivation of the mgf given the density. For a parameter vector *T* with *T*<sup>′</sup> = *(t*<sub>1</sub>*,...,t*<sub>*p*</sub>*)*, we have

$$\begin{split} M\_X(T) &= E[\mathbf{e}^{T'X}] = \int\_X \mathbf{e}^{T'X} f(X) \mathbf{d}X = \mathbf{e}^{T'\mu} E[\mathbf{e}^{T'(X-\mu)}] \\ &= \frac{\mathbf{e}^{T'\mu}}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}} \int\_X \mathbf{e}^{T'(X-\mu) - \frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)} \mathbf{d}X. \end{split}$$

Observe that the moment generating function (mgf) in the real multivariate case is the expected value of e raised to a linear function of the real scalar variables. Make the transformation *Y* = *Σ*<sup>−1/2</sup>*(X* − *μ)* ⇒ d*Y* = |*Σ*|<sup>−1/2</sup>d*X*. The exponent can then be simplified as follows:

$$\begin{aligned} T'(X - \mu) - \frac{1}{2}(X - \mu)'\Sigma^{-1}(X - \mu) &= -\frac{1}{2} \{-2T'\Sigma^{\frac{1}{2}}Y + Y'Y\} \\ &= -\frac{1}{2} \{(Y - \Sigma^{\frac{1}{2}}T)'(Y - \Sigma^{\frac{1}{2}}T) - T'\Sigma T\}. \end{aligned}$$

Hence

$$M\_X(T) = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T} \frac{1}{(2\pi)^{\frac{p}{2}}} \int\_Y \mathbf{e}^{-\frac{1}{2}(Y - \Sigma^{\frac{1}{2}}T)'(Y - \Sigma^{\frac{1}{2}}T)} \mathbf{d}Y.$$

The integral over *Y* is 1 since this is the total integral of a multivariate normal density whose mean value vector is *Σ* 1 <sup>2</sup> *T* and covariance matrix is the identity matrix. Thus the mgf of a multivariate real Gaussian vector is

$$M\_X(T) = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T}.\tag{3.2.10}$$

In the singular normal case, we can still take (3.2.10) as the mgf for *Σ* ≥ *O* (non-negative definite), which encompasses the singular and nonsingular cases. Then, one can study properties of the normal distribution whether singular or nonsingular via (3.2.10).
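Equation (3.2.10) can be illustrated by simulation. The sketch below (Python with numpy; the parameter values, seed, and sample size are arbitrary choices) compares a Monte Carlo estimate of *E*[e<sup>*T*′*X*</sup>] with the closed form:

```python
import numpy as np

rng = np.random.default_rng(7)                  # fixed seed for reproducibility
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
T = np.array([0.2, -0.1])                       # an arbitrary parameter vector

# Monte Carlo estimate of E[e^{T'X}] versus the closed form (3.2.10)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
mgf_mc = np.mean(np.exp(X @ T))
mgf_closed = np.exp(T @ mu + 0.5 * T @ Sigma @ T)

print(np.isclose(mgf_mc, mgf_closed, rtol=0.01))   # True
```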

We will now apply the differential operator *∂/∂T* defined in Sect. 1.7 to the moment generating function of a *p* × 1 real normal random vector *X* and evaluate the result at *T* = *O* to obtain the mean value vector of this distribution, that is, *μ* = *E*[*X*]. As well, *E*[*XX*<sup>′</sup>] is available by applying the operator *∂/∂T ∂/∂T*<sup>′</sup> to the mgf, and so on. From the mgf in (3.2.10), we have

$$\frac{\partial}{\partial T} M\_X(T)\Big|\_{T=O} = \frac{\partial}{\partial T}\, \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T}\Big|\_{T=O} = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T}\,[\mu + \Sigma T]\Big|\_{T=O} = \mu \ \Rightarrow \ E[X] = \mu. \tag{i}$$

Then,

$$\frac{\partial}{\partial T'} M\_X(T) = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T} [\mu' + T'\Sigma]. \tag{ii}$$

Remember to write the scalar quantity, *MX(T )*, on the left for scalar multiplication of matrices. Now,

$$\begin{split} \frac{\partial}{\partial T} \frac{\partial}{\partial T'} M\_X(T) &= \frac{\partial}{\partial T} \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T} [\mu' + T'\Sigma] \\ &= M\_X(T)[\mu + \Sigma T][\mu' + T'\Sigma] + M\_X(T)[\Sigma]. \end{split}$$

Hence,

$$E[XX'] = \left[\frac{\partial}{\partial T}\frac{\partial}{\partial T'}M\_X(T)|\_{T=O}\right] = \Sigma + \mu\mu'.\tag{iii}$$

But

$$\text{Cov}(X) = E[XX'] - E[X]E[X'] = (\Sigma + \mu\mu') - \mu\mu' = \Sigma. \tag{\text{iv}}$$

In the multivariate real Gaussian case, we have only two parameters *μ* and *Σ* and both of these are available from the above equations. In the general case, we can evaluate higher moments as follows:

$$E[\cdots \cdots X^\prime X X^\prime] = \cdots \cdot \frac{\partial}{\partial T^\prime} \frac{\partial}{\partial T} M\_X(T)|\_{T=O} \,. \tag{\text{v}}$$

If the characteristic function *φX(T )*, which is available from the mgf by replacing *T* by *iT, i* <sup>=</sup> <sup>√</sup>*(*−1*),* is utilized, then multiply the left-hand side of *(v)* by *<sup>i</sup>* <sup>=</sup> <sup>√</sup>*(*−1*)* with each operator operating on *φX(T )* because *φX(T )* = *MX(iT )*. The corresponding differential operators can also be developed for the complex case.

Given a real *p*-vector *X* ∼ *N*<sub>*p*</sub>*(μ, Σ), Σ* *>* *O*, what will be the distribution of a linear function of *X*? Let *u* = *L*<sup>′</sup>*X*, *X* ∼ *N*<sub>*p*</sub>*(μ, Σ), Σ* *>* *O*, *L*<sup>′</sup> = *(a*<sub>1</sub>*,...,a*<sub>*p*</sub>*)* where *a*<sub>1</sub>*,...,a*<sub>*p*</sub> are real scalar constants. Let us examine its mgf whose argument is a real scalar parameter *t*. The mgf of *u* is available by integrating over the density of *X*. We have

$$M\_u(t) = E[\mathbf{e}^{tu}] = E[\mathbf{e}^{tL'X}] = E[\mathbf{e}^{(tL')X}].$$

This is of the same form as in (3.2.10) and hence, *Mu(t)* is available from (3.2.10) by replacing *T*′ by *tL*′, that is,

$$M\_u(t) = \mathbf{e}^{t(L'\mu) + \frac{t^2}{2}L'\Sigma L} \Rightarrow u \sim N\_1(L'\mu, L'\Sigma L). \tag{3.2.11}$$

This means that *u* is a univariate normal with mean value *L*′*μ* = *E*[*u*] and variance *L*′*ΣL* = Var*(u)*. Now, let us consider a set of linearly independent linear functions of *X*. Let *A* be a real *q* × *p, q* ≤ *p,* matrix of full rank *q* and let *U* = *AX* where *U* is *q* × 1. Then *E*[*U*] = *AE*[*X*] = *Aμ* and the covariance matrix of *U* is

$$\begin{split} \text{Cov}(U) &= E[U - E(U)][U - E(U)]' = E[A(X - \mu)(X - \mu)'A'] \\ &= AE[(X - \mu)(X - \mu)']A' = A\,\,\Sigma A'. \end{split}$$

Observe that since *Σ > O*, we can write *Σ* = *Σ*1*Σ*′1 so that *AΣA*′ = *(AΣ*1*)(AΣ*1*)*′, where *AΣ*1 is of full rank, which means that *AΣA*′ *> O*. Therefore, letting *T* be a *q* × 1 parameter vector, we have

$$M\_U(T) = E[\mathbf{e}^{T'U}] = E[\mathbf{e}^{T'AX}] = E[\mathbf{e}^{(T'A)X}],$$

which is available from (3.2.10). That is,

$$M\_U(T) = \mathbf{e}^{T'A\mu + \frac{1}{2}(T'A\Sigma A'T)} \Rightarrow U \sim N\_q(A\mu, A\Sigma A').$$

Thus *U* is a *q*-variate multivariate normal with parameters *Aμ* and *AΣA*′ and we have the following result:

**Theorem 3.2.1.** *Let the vector random variable X have a real p-variate nonsingular Np(μ, Σ) distribution and the q* × *p matrix A with q* ≤ *p, be a full rank constant matrix. Then*

$$U = AX \sim N\_q(A\mu, A\Sigma A'), \ A\Sigma A' > O. \tag{3.2.12}$$

**Corollary 3.2.1.** *Let the vector random variable X have a real p-variate nonsingular Np(μ, Σ) distribution and B be a* 1 × *p constant vector. Then U*1 = *BX has a univariate normal distribution with parameters Bμ and BΣB*′*.*
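Theorem 3.2.1 lends itself to a simple simulation check (a sketch, not from the text): for a full-rank 2 × 3 matrix *A*, the empirical mean and covariance of *U* = *AX* should match *Aμ* and *AΣA*′. The particular *μ*, *Σ* and *A* below are arbitrary.

```python
import numpy as np

# U = AX should have mean A mu and covariance A Sigma A'.
rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0]])      # full rank, q = 2, p = 3

X = rng.multivariate_normal(mu, Sigma, size=200_000)
U = X @ A.T                            # each row is U = A x

mean_err = np.abs(U.mean(axis=0) - A @ mu).max()
cov_err = np.abs(np.cov(U.T) - A @ Sigma @ A.T).max()
print(mean_err, cov_err)
```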

**Example 3.2.2.** Let *X, μ* = *E*[*X*]*, Σ* = Cov*(X)* and *Y* be as follows:

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \mu = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix}, \; Y = \begin{bmatrix} y\_1 \\ y\_2 \end{bmatrix}.$$

Let *y*<sup>1</sup> = *x*<sup>1</sup> + *x*<sup>2</sup> + *x*<sup>3</sup> and *y*<sup>2</sup> = *x*<sup>1</sup> − *x*<sup>2</sup> + *x*<sup>3</sup> and write *Y* = *AX*. If *Σ>O* and if *X* ∼ *N*3*(μ, Σ)*, derive the density of (1) *Y* ; (2) *y*<sup>1</sup> directly as well as from (1).

**Solution 3.2.2.** The leading minors of *Σ* are

$$|(4)| = 4 > 0, \ \begin{vmatrix} 4 & -2 \\ -2 & 3 \end{vmatrix} = 8 > 0, \ |\Sigma| = 12 > 0,$$

and *Σ* = *Σ*′. Being symmetric and positive definite, *Σ* is a bona fide covariance matrix. Now, *Y* = *AX* where

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix};$$

$$E[Y] = AE[X] = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix};\tag{i}$$

$$\operatorname{Cov}(Y) = A \operatorname{Cov}(X) A' = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 7 & 3 \\ 3 & 11 \end{bmatrix}. \quad \text{(ii)}$$

Since *A* is of full rank (rank 2) and *y*<sup>1</sup> and *y*<sup>2</sup> are linear functions of the real Gaussian vector *X*, *Y* has a bivariate nonsingular real Gaussian distribution with parameters *E(Y )* and Cov*(Y )*. Since

$$
\begin{bmatrix} 7 & 3 \\ 3 & 11 \end{bmatrix}^{-1} = \frac{1}{68} \begin{bmatrix} 11 & -3 \\ -3 & 7 \end{bmatrix},
$$

the density of *Y* has the exponent −*Q*/2 where

$$\begin{split} \mathcal{Q} &= \frac{1}{68} \left\{ \left[ \mathbf{y}\_1 - \mathbf{1}, \mathbf{y}\_2 - \mathbf{1} \right] \begin{bmatrix} 11 & -3 \\ -3 & 7 \end{bmatrix} \begin{bmatrix} \mathbf{y}\_1 - \mathbf{1} \\ \mathbf{y}\_2 - \mathbf{1} \end{bmatrix} \right\} \\ &= \frac{1}{68} \{ 11(\mathbf{y}\_1 - \mathbf{1})^2 + 7(\mathbf{y}\_2 - \mathbf{1})^2 - 6(\mathbf{y}\_1 - \mathbf{1})(\mathbf{y}\_2 - \mathbf{1}) \}. \end{split} \tag{iii}$$

The normalizing constant being $(2\pi)^{p/2}|\mathrm{Cov}(Y)|^{1/2} = 2\pi\sqrt{68} = 4\sqrt{17}\,\pi$, the density of *Y*, denoted by *f (Y )*, is given by

$$f(Y) = \frac{1}{4\sqrt{17}\pi} \mathbf{e}^{-\frac{1}{2}\mathcal{Q}} \tag{\text{iv}}$$

where *Q* is specified in *(iii)*. This establishes (1). For establishing (2), we first start with the formula. Let *y*1 = *A*1*X* ⇒ *A*1 = [1*,* 1*,* 1],

$$E[y\_1] = A\_1E[X] = [1, 1, 1] \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = 1$$

and

$$\text{Var}(y\_1) = A\_1 \text{Cov}(X) A\_1' = [1, 1, 1] \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 7.$$

Hence *y*<sup>1</sup> ∼ *N*1*(*1*,* 7*)*. For establishing this result directly, observe that *y*<sup>1</sup> is a linear function of real normal variables and hence, it is univariate real normal with the parameters *E*[*y*1] and Var*(y*1*)*. We may also obtain the marginal distribution of *y*<sup>1</sup> directly from the parameters of the joint density of *y*<sup>1</sup> and *y*2, which are given in *(i)* and *(ii)*. Thus, (2) is also established.
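The arithmetic of Solution 3.2.2 is easy to verify numerically; the sketch below (using the matrices of the example) checks the leading minors of *Σ*, *E*[*Y*], Cov(*Y*), its inverse, and Var(*y*1).

```python
import numpy as np

# Data of Example 3.2.2.
mu = np.array([2.0, 0.0, -1.0])
Sigma = np.array([[4.0, -2.0, 0.0],
                  [-2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0]])

minors = [np.linalg.det(Sigma[:k, :k]) for k in (1, 2, 3)]  # should be 4, 8, 12
EY = A @ mu                                                  # should be [1, 1]
covY = A @ Sigma @ A.T                                       # should be [[7, 3], [3, 11]]
covY_inv = np.linalg.inv(covY)                               # should be (1/68)[[11, -3], [-3, 7]]
var_y1 = np.ones(3) @ Sigma @ np.ones(3)                     # should be 7
print(minors, EY, covY, var_y1)
```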

The marginal distributions can also be determined from the mgf. Let us partition *T* , *μ* and *Σ* as follows:

$$T = \begin{bmatrix} T\_1 \\ T\_2 \end{bmatrix}, \ \mu = \begin{bmatrix} \mu\_{(1)} \\ \mu\_{(2)} \end{bmatrix}, \ \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ X = \begin{bmatrix} X\_1 \\ X\_2 \end{bmatrix} \tag{v}$$

where *T*1*, μ(*1*), X*<sup>1</sup> are *r* × 1 and *Σ*<sup>11</sup> is *r* × *r*. Letting *T*<sup>2</sup> = *O* (the null vector), we have

$$\begin{aligned} T'\mu + \frac{1}{2}T'\Sigma T &= [T'\_1, O'] \begin{bmatrix} \mu\_{(1)} \\ \mu\_{(2)} \end{bmatrix} + \frac{1}{2}[T'\_1, O'] \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix} \begin{bmatrix} T\_1 \\ O \end{bmatrix} \\ &= T'\_1\mu\_{(1)} + \frac{1}{2}T'\_1\Sigma\_{11}T\_1, \end{aligned}$$

which is the structure of the mgf of a real Gaussian distribution with mean value vector *E*[*X*1] = *μ(*1*)* and covariance matrix Cov*(X*1*)* = *Σ*11. Therefore *X*<sup>1</sup> is an *r*-variate real Gaussian vector and similarly, *X*<sup>2</sup> is *(p* − *r)*-variate real Gaussian vector. The standard notation used for a *p*-variate normal distribution is *X* ∼ *Np(μ, Σ), Σ* ≥ *O,* which includes the nonsingular and singular cases. In the nonsingular case, *Σ > O,* whereas |*Σ*| = 0 in the singular case.

From the mgf in (3.2.10) and *(i)* above, if we have *Σ*12 = *O* with *Σ*21 = *Σ*′12, then the mgf of *X* = *(X*′1*, X*′2*)*′ becomes

$$M\_X(T) = \mathbf{e}^{T\_1'\mu\_{(1)} + T\_2'\mu\_{(2)} + \frac{1}{2}T\_1'\Sigma\_{11}T\_1 + \frac{1}{2}T\_2'\Sigma\_{22}T\_2}.$$

That is,

$$M\_X(T) = M\_{X\_1}(T\_1) M\_{X\_2}(T\_2),$$

which implies that *X*<sup>1</sup> and *X*<sup>2</sup> are independently distributed. Hence the following result:

**Theorem 3.2.2.** *Let the real p* × 1 *vector X* ∼ *Np(μ, Σ), Σ > O, and let X be partitioned into subvectors X*<sup>1</sup> *and X*2*, with the corresponding partitioning of μ and Σ, that is,*

$$X = \begin{pmatrix} X\_1 \\ X\_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu\_{(1)} \\ \mu\_{(2)} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{pmatrix}.$$

*Then, X*1 *and X*2 *are independently distributed if and only if Σ*12 = *Σ*′21 = *O.*

Observe that a covariance matrix being null need not imply independence of the subvectors; however, in the case of subvectors having a joint normal distribution, it suffices to have a null covariance matrix to conclude that the subvectors are independently distributed.
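At the level of the mgf, Theorem 3.2.2 says that a block-diagonal *Σ* makes *MX(T)* factor into the mgfs of the subvectors. The following sketch verifies this factorization numerically at one arbitrary parameter point; all blocks are illustrative choices.

```python
import numpy as np

# mgf of N_p(mu, Sigma) evaluated at T: exp(T'mu + T' Sigma T / 2).
def mgf(T, mu, Sigma):
    return np.exp(T @ mu + 0.5 * T @ Sigma @ T)

mu1, mu2 = np.array([1.0, -2.0]), np.array([0.5])
S11 = np.array([[2.0, 0.4], [0.4, 1.0]])
S22 = np.array([[3.0]])
mu = np.concatenate([mu1, mu2])
Sigma = np.block([[S11, np.zeros((2, 1))],
                  [np.zeros((1, 2)), S22]])    # Sigma_12 = O

T1, T2 = np.array([0.3, -0.2]), np.array([0.1])
T = np.concatenate([T1, T2])
lhs = mgf(T, mu, Sigma)                        # joint mgf
rhs = mgf(T1, mu1, S11) * mgf(T2, mu2, S22)    # product of marginal mgfs
print(lhs, rhs)
```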

#### **3.2a. The Moment Generating Function in the Complex Case**

The determination of the mgf in the complex case is somewhat different. Consider a *p*-variate complex Gaussian *X̃* ∼ *Ñp(μ̃, Σ̃)*, *Σ̃* = *Σ̃*∗ *> O*. Let *T̃*′ = *(t̃*1*,..., t̃p)* be a parameter vector and write *T̃* = *T*1 + *iT*2, where *T*1 and *T*2 are *p* × 1 real vectors and *i* = √(−1). Let *X̃* = *X*1 + *iX*2 with *X*1 and *X*2 being real. Then consider *T̃*∗*X̃* = *(T*′1 − *iT*′2*)(X*1 + *iX*2*)* = *T*′1*X*1 + *T*′2*X*2 + *i(T*′1*X*2 − *T*′2*X*1*)*. But *T*′1*X*1 + *T*′2*X*2 already contains the necessary number of parameters and all the corresponding real variables; hence, to be consistent with the definition of the mgf in the real case, one must take only the real part of *T̃*∗*X̃*. Accordingly, the mgf in the complex case, denoted by *MX̃(T̃)*, is defined as *E*[e^{ℜ(*T̃*∗*X̃*)}]. For convenience, we may write *X̃* = *X̃* − *μ̃* + *μ̃*. Then *E*[e^{ℜ(*T̃*∗*X̃*)}] = e^{ℜ(*T̃*∗*μ̃*)}*E*[e^{ℜ(*T̃*∗*(X̃* − *μ̃))*}]. On making the transformation *Ỹ* = *Σ*^{−1/2}*(X̃* − *μ̃)*, the |det*(Σ)*| appearing in the denominator of the density of *X̃* is canceled by the Jacobian of the transformation, and *(X̃* − *μ̃)* = *Σ*^{1/2}*Ỹ*. Thus,

$$E[\mathbf{e}^{\Re(\tilde{T}^\*\tilde{Y})}] = \frac{1}{\pi^p} \int\_{\tilde{Y}} \mathbf{e}^{\Re(\tilde{T}^\*\Sigma^{\frac{1}{2}}\tilde{Y}) - \tilde{Y}^\*\tilde{Y}} \mathrm{d}\tilde{Y}.\tag{i}$$

For evaluating the integral in *(i)*, we can utilize the following result which will be stated here as a lemma.

**Lemma 3.2a.1.** *Let U*˜ *and V*˜ *be two p* × 1 *vectors in the complex domain. Then*

$$
2\Re(\tilde{U}^\*\tilde{V}) = \tilde{U}^\*\tilde{V} + \tilde{V}^\*\tilde{U} = 2\Re(\tilde{V}^\*\tilde{U}).
$$

*Proof***:** Let *Ũ* = *U*1 + *iU*2*, Ṽ* = *V*1 + *iV*2 where *U*1*, U*2*, V*1*, V*2 are real vectors and *i* = √(−1). Then *Ũ*∗*Ṽ* = [*U*′1 − *iU*′2][*V*1 + *iV*2] = *U*′1*V*1 + *U*′2*V*2 + *i*[*U*′1*V*2 − *U*′2*V*1]. Similarly, *Ṽ*∗*Ũ* = *V*′1*U*1 + *V*′2*U*2 + *i*[*V*′1*U*2 − *V*′2*U*1]. Observe that since *U*1*, U*2*, V*1*, V*2 are real, we have *U*′*iVj* = *V*′*jUi* for all *i* and *j*. Hence, the sum *Ũ*∗*Ṽ* + *Ṽ*∗*Ũ* = 2[*U*′1*V*1 + *U*′2*V*2] = 2ℜ*(Ṽ*∗*Ũ)*. This completes the proof.
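Lemma 3.2a.1 is also easy to confirm numerically on random complex vectors; the sketch below uses `np.vdot`, which conjugates its first argument and thus computes exactly *Ũ*∗*Ṽ*.

```python
import numpy as np

# Check 2 Re(U*V) = U*V + V*U for random complex p-vectors.
rng = np.random.default_rng(2)
U = rng.standard_normal(4) + 1j * rng.standard_normal(4)
V = rng.standard_normal(4) + 1j * rng.standard_normal(4)

lhs = 2 * np.real(np.vdot(U, V))       # 2 Re(U* V); np.vdot conjugates its first argument
rhs = np.vdot(U, V) + np.vdot(V, U)    # U*V + V*U; imaginary parts cancel
print(lhs, rhs)
```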

Now, the exponent in *(i)* can be written as

$$\Re(\tilde{T}^\*\Sigma^{\frac{1}{2}}\tilde{Y}) = \frac{1}{2}\tilde{T}^\*\Sigma^{\frac{1}{2}}\tilde{Y} + \frac{1}{2}\tilde{Y}^\*\Sigma^{\frac{1}{2}}\tilde{T}$$

by using Lemma 3.2a.1 and observing that *Σ* = *Σ*∗. Let us expand *(Ỹ* − *C)*∗*(Ỹ* − *C)* as *Ỹ*∗*Ỹ* − *Ỹ*∗*C* − *C*∗*Ỹ* + *C*∗*C* for some *C*. Comparing with the exponent in *(i)*, we may take $C^\* = \frac{1}{2}\tilde{T}^\*\Sigma^{\frac{1}{2}}$ so that $C^\*C = \frac{1}{4}\tilde{T}^\*\Sigma\tilde{T}$. Therefore, in the complex Gaussian case, the mgf is

$$M\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re(\tilde{T}^\*\tilde{\mu}) + \frac{1}{4}\tilde{T}^\*\Sigma\tilde{T}}.\tag{3.2a.1}$$

**Example 3.2a.1.** Let *X̃* ∼ *Ñ*2*(μ̃, Σ), Σ > O*, with *E*[*X̃*] = *μ̃* and Cov*(X̃)* = *Σ* given by

$$
\tilde{X} = \begin{bmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{bmatrix}, \quad \tilde{\mu} = \begin{bmatrix} 1 - i \\ 2 - 3i \end{bmatrix}, \ \Sigma = \begin{bmatrix} 3 & 1 + i \\ 1 - i & 2 \end{bmatrix}.
$$

Compute the mgf of *X*˜ explicitly.

**Solution 3.2a.1.** Let *T̃* = *(t̃*1*, t̃*2*)*′ where *t̃*1 = *t*11 + *it*12 and *t̃*2 = *t*21 + *it*22, with *t*11*, t*12*, t*21*, t*22 being real scalar parameters. The mgf of *X̃* is

$$M\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re(\tilde{T}^\*\tilde{\mu}) + \frac{1}{4}\tilde{T}^\*\Sigma\tilde{T}}.$$

Consider the first term in the exponent of the mgf:

$$\begin{split} \Re(\tilde{T}^\* \tilde{\mu}) &= \Re \Big\{ [t\_{11} - it\_{12}, t\_{21} - it\_{22}] \begin{bmatrix} 1 - i \\ 2 - 3i \end{bmatrix} \Big\} \\ &= \Re \{ (t\_{11} - it\_{12})(1 - i) + (t\_{21} - it\_{22})(2 - 3i) \} \\ &= t\_{11} - t\_{12} + 2t\_{21} - 3t\_{22} .\end{split}$$

The second term in the exponent is the following:

$$\begin{split} \frac{1}{4} \tilde{T}^\* \Sigma \tilde{T} &= \frac{1}{4} \left\{ [\tilde{t}\_1^\*, \tilde{t}\_2^\*] \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix} \begin{bmatrix} \tilde{t}\_1 \\ \tilde{t}\_2 \end{bmatrix} \right\} \\ &= \frac{1}{4} \{ 3\tilde{t}\_1^\* \tilde{t}\_1 + 2\tilde{t}\_2^\* \tilde{t}\_2 + (1+i)\tilde{t}\_1^\* \tilde{t}\_2 + (1-i)\tilde{t}\_2^\* \tilde{t}\_1 \}. \end{split}$$

Note that since the parameters are scalar quantities, the conjugate transpose is simply the conjugate, that is, *t̃*∗*j* is the conjugate of *t̃j*, *j* = 1*,* 2. Let us look at the non-diagonal terms: [*(*1 + *i)t̃*∗1*t̃*2] + [*(*1 − *i)t̃*∗2*t̃*1] gives 2*(t*11*t*21 + *t*12*t*22 + *t*12*t*21 − *t*11*t*22*)*. Moreover, *t̃*∗1*t̃*1 = *t*11² + *t*12² and *t̃*∗2*t̃*2 = *t*21² + *t*22². Hence, if the exponent of *MX̃(T̃)* is denoted by *φ*,

$$\phi = \left[t\_{11} - t\_{12} + 2t\_{21} - 3t\_{22}\right] + \frac{1}{4} \{3(t\_{11}^2 + t\_{12}^2) + 2(t\_{21}^2 + t\_{22}^2)$$

$$+ 2(t\_{11}t\_{21} + t\_{12}t\_{22} + t\_{12}t\_{21} - t\_{11}t\_{22})\}.\tag{i}$$

Thus the mgf is

$$M\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\phi}$$

where *φ* is given in *(i)*.
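The expanded exponent *(i)* can be checked against the matrix form ℜ(*T̃*∗*μ̃*) + ¼*T̃*∗*ΣT̃* at an arbitrary numeric parameter point; the values of *t*11*,...,t*22 below are our own illustrative choices.

```python
import numpy as np

# Data of Example 3.2a.1.
mu = np.array([1 - 1j, 2 - 3j])
Sigma = np.array([[3.0, 1 + 1j],
                  [1 - 1j, 2.0]])
t11, t12, t21, t22 = 0.7, -0.3, 0.2, 1.1          # arbitrary real parameters
T = np.array([t11 + 1j * t12, t21 + 1j * t22])

# Matrix form: Re(T* mu) + T* Sigma T / 4 (np.vdot conjugates its first argument).
matrix_form = np.real(np.vdot(T, mu)) + np.real(np.vdot(T, Sigma @ T)) / 4
# Expanded form (i) from the solution.
expanded = (t11 - t12 + 2 * t21 - 3 * t22) + 0.25 * (
    3 * (t11**2 + t12**2) + 2 * (t21**2 + t22**2)
    + 2 * (t11 * t21 + t12 * t22 + t12 * t21 - t11 * t22))
print(matrix_form, expanded)
```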

#### **3.2a.1. Moments from the moment generating function**

We can also derive the moments from the mgf in (3.2*a*.1) by operating with the differential operator of Sect. 1.7 of Chap. 1. For the complex case, the operator *∂/∂X*1 used in the real case has to be modified. Let *X̃* = *X*1 + *iX*2 be a *p* × 1 vector in the complex domain, where *X*1 and *X*2 are real and *p* × 1 and *i* = √(−1). Then, in the complex domain, the differential operator is

$$\frac{\partial}{\partial \tilde{X}} = \frac{\partial}{\partial X\_1} + i \frac{\partial}{\partial X\_2}. \tag{ii}$$

Let *T̃* = *T*1 + *iT*2*, μ̃* = *μ(*1*)* + *iμ(*2*), Σ* = *Σ*1 + *iΣ*2 where *T*1*, T*2*, μ(*1*), μ(*2*), Σ*1*, Σ*2 are all real and *i* = √(−1); *Σ*1 = *Σ*′1 and *Σ*′2 = −*Σ*2 because *Σ* is Hermitian. Note that *T̃*∗*ΣT̃* = *(T*′1 − *iT*′2*)Σ(T*1 + *iT*2*)* = *T*′1*ΣT*1 + *T*′2*ΣT*2 + *i(T*′1*ΣT*2 − *T*′2*ΣT*1*)*, and observe that

$$T\_j' \Sigma T\_j = T\_j' (\Sigma\_1 + i \,\Sigma\_2) T\_j = T\_j' \,\Sigma\_1 T\_j + 0 \tag{iii}$$

for *j* = 1*,* 2 since *Σ*<sup>2</sup> is skew symmetric. The exponent in the mgf in (3.2*a*.1) can be simplified as follows: Letting *u* denote the exponent in the mgf and observing that [*T*˜ <sup>∗</sup>*ΣT*˜] <sup>∗</sup> = *T*˜ <sup>∗</sup>*ΣT*˜ is real,

$$\begin{split} u &= \Re(\tilde{T}^\* \tilde{\mu}) + \frac{1}{4} \tilde{T}^\* \Sigma \tilde{T} = \Re[(T\_1' - iT\_2')(\mu\_{(1)} + i\mu\_{(2)})] + \frac{1}{4}(T\_1' - iT\_2')\Sigma (T\_1 + iT\_2) \\ &= T\_1' \mu\_{(1)} + T\_2' \mu\_{(2)} + \frac{1}{4} [T\_1' \Sigma T\_1 + T\_2' \Sigma T\_2] + \frac{1}{4} u\_1, \; u\_1 = i(T\_1' \Sigma T\_2 - T\_2' \Sigma T\_1) \\ &= T\_1' \mu\_{(1)} + T\_2' \mu\_{(2)} + \frac{1}{4} [T\_1' \Sigma\_1 T\_1 + T\_2' \Sigma\_1 T\_2] + \frac{1}{4} u\_1. \end{split} \tag{iv}$$

In this last line, we have made use of the result in *(iii)*. The following lemma will enable us to simplify *u*1.

**Lemma 3.2a.2.** *Let T*<sup>1</sup> *and T*<sup>2</sup> *be real p*×1 *vectors. Let the p*×*p matrix Σ be Hermitian, Σ* = *Σ*<sup>∗</sup> = *Σ*<sup>1</sup> + *iΣ*2*, with Σ*<sup>1</sup> = *Σ*- <sup>1</sup> *and Σ*<sup>2</sup> = −*Σ*- <sup>2</sup>*. Then*

$$\begin{aligned} u\_1 &= i(T\_1' \,\Sigma \, T\_2 - T\_2' \,\Sigma \, T\_1) = -2T\_1' \,\Sigma\_2 T\_2 = 2T\_2' \,\Sigma\_2 T\_1\\ &\Rightarrow \frac{\partial}{\partial T\_1} u\_1 = -2\,\Sigma\_2 T\_2 \,\, \text{and} \quad \frac{\partial}{\partial T\_2} u\_1 = 2\,\Sigma\_2 T\_1. \end{aligned} \tag{v}$$

*Proof***:** This result will be established by making use of the following general properties: For a 1 × 1 matrix, the transpose is itself whereas the conjugate transpose is the conjugate of the same quantity. That is, *(a* + *ib)*- = *a* + *ib, (a* + *ib)*<sup>∗</sup> = *a* − *ib* and if the conjugate transpose is equal to itself then the quantity is real or equivalently, if *(a*+*ib)* = *(a*+*ib)*<sup>∗</sup> = *a* − *ib* then *b* = 0 and the quantity is real. Thus,

$$\begin{split} u\_{1} &= i(T\_{1}'\Sigma T\_{2} - T\_{2}'\Sigma T\_{1}) = i[T\_{1}'(\Sigma\_{1} + i\Sigma\_{2})T\_{2} - T\_{2}'(\Sigma\_{1} + i\Sigma\_{2})T\_{1}] \\ &= iT\_{1}'\Sigma\_{1}T\_{2} - T\_{1}'\Sigma\_{2}T\_{2} - iT\_{2}'\Sigma\_{1}T\_{1} + T\_{2}'\Sigma\_{2}T\_{1} = -T\_{1}'\Sigma\_{2}T\_{2} + T\_{2}'\Sigma\_{2}T\_{1} \\ &= -2T\_{1}'\Sigma\_{2}T\_{2} = 2T\_{2}'\Sigma\_{2}T\_{1}. \end{split} \tag{vi}$$

The following properties were utilized: *T*′*iΣ*1*Tj* = *T*′*jΣ*1*Ti* for all *i* and *j* since *Σ*1 is a symmetric matrix and the quantity is 1 × 1 and real, so that its transpose is itself; *T*′*iΣ*2*Tj* = −*T*′*jΣ*2*Ti* for all *i* and *j* because these quantities are 1 × 1, so that the transpose of each is itself, while *Σ*′2 = −*Σ*2. This completes the proof.
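Lemma 3.2a.2 can be confirmed on a randomly generated Hermitian *Σ* = *Σ*1 + *iΣ*2; the construction of *Σ* below (a product *BB*∗) is an arbitrary way to obtain a Hermitian positive definite matrix.

```python
import numpy as np

# Check u1 = i(T1' Sigma T2 - T2' Sigma T1) = -2 T1' Sigma_2 T2 = 2 T2' Sigma_2 T1.
rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Sigma = B @ B.conj().T                   # Hermitian positive definite
Sigma2 = Sigma.imag                      # skew symmetric part
T1 = rng.standard_normal(3)
T2 = rng.standard_normal(3)

u1 = 1j * (T1 @ Sigma @ T2 - T2 @ Sigma @ T1)   # real up to rounding
print(u1, -2 * T1 @ Sigma2 @ T2, 2 * T2 @ Sigma2 @ T1)
```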

Now, let us apply the operator *(∂/∂T*1 + *i ∂/∂T*2*)* to the mgf in (3.2*a*.1) and determine the various quantities. Note that, in light of results stated in Chap. 1, we have

$$\begin{split} \frac{\partial}{\partial T\_{1}}(T\_{1}^{'}\Sigma\_{1}T\_{1}) &= 2\Sigma\_{1}T\_{1}, \; \frac{\partial}{\partial T\_{1}}(-2T\_{1}^{'}\Sigma\_{2}T\_{2}) = -2\Sigma\_{2}T\_{2}, \; \frac{\partial}{\partial T\_{1}}\mathfrak{R}(\tilde{T}^{\*}\tilde{\mu}) = \mu\_{(1)},\\ \frac{\partial}{\partial T\_{2}}(T\_{2}^{'}\Sigma\_{1}T\_{2}) &= 2\Sigma\_{1}T\_{2}, \; \frac{\partial}{\partial T\_{2}}(2T\_{2}^{'}\Sigma\_{2}T\_{1}) = 2\Sigma\_{2}T\_{1}, \; \frac{\partial}{\partial T\_{2}}\mathfrak{R}(\tilde{T}^{\*}\tilde{\mu}) = \mu\_{(2)}. \end{split}$$

Thus, given *(ii)*–*(vi)*, the operator applied to the exponent of the mgf gives the following result:

$$\begin{split} \left(\frac{\partial}{\partial T\_1} + i \frac{\partial}{\partial T\_2}\right) u &= \mu\_{(1)} + i \mu\_{(2)} + \frac{1}{4} [2\Sigma\_1 T\_1 - 2\Sigma\_2 T\_2 + 2\Sigma\_1 i T\_2 + 2\Sigma\_2 i T\_1] \\ &= \tilde{\mu} + \frac{1}{4} [2(\Sigma\_1 + i \Sigma\_2) T\_1 + 2(\Sigma\_1 + i \Sigma\_2) i T\_2] = \tilde{\mu} + \frac{1}{4} [2\Sigma \tilde{T}] = \tilde{\mu} + \frac{1}{2} \Sigma \tilde{T}, \end{split}$$

so that

$$\begin{split} \frac{\partial}{\partial \tilde{T}} M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T}=O} &= \left(\frac{\partial}{\partial T\_1} + i \frac{\partial}{\partial T\_2}\right) M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T}=O} \\ &= M\_{\tilde{X}}(\tilde{T})[\tilde{\mu} + \frac{1}{2} \Sigma \tilde{T}]|\_{T\_1=O, T\_2=O} = \tilde{\mu}, \end{split} \tag{vii}$$

noting that *T*˜ = *O* implies that *T*<sup>1</sup> = *O* and *T*<sup>2</sup> = *O*. For convenience, let us denote the operator by

$$\frac{\partial}{\partial \tilde{T}} = \left(\frac{\partial}{\partial T\_1} + i \frac{\partial}{\partial T\_2}\right).$$

From *(vii)*, we have

$$\begin{aligned} \frac{\partial}{\partial \tilde{T}} M\_{\tilde{X}}(\tilde{T}) &= M\_{\tilde{X}}(\tilde{T})[\tilde{\mu} + \frac{1}{2} \Sigma \tilde{T}], \\ \frac{\partial}{\partial \tilde{T}^\*} M\_{\tilde{X}}(\tilde{T}) &= [\tilde{\mu}^\* + \frac{1}{2} \tilde{T}^\* \Sigma] M\_{\tilde{X}}(\tilde{T}). \end{aligned}$$

Now, observe that

$$\begin{aligned} \tilde{T}^\*\Sigma &= (T\_1' - iT\_2')\Sigma = T\_1'\Sigma - iT\_2'\Sigma \Rightarrow \\ \frac{\partial}{\partial T\_1}(\tilde{T}^\*\Sigma) &= \Sigma, \ \frac{\partial}{\partial T\_2}(\tilde{T}^\*\Sigma) = -i\Sigma, \\ \left(\frac{\partial}{\partial T\_1} + i\frac{\partial}{\partial T\_2}\right)(\tilde{T}^\*\Sigma) &= \Sigma - i(i)\Sigma = 2\Sigma, \end{aligned}$$

and

$$\frac{\partial}{\partial \tilde{T}} \frac{\partial}{\partial \tilde{T}^\*} M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T}=O} = \tilde{\mu}\tilde{\mu}^\* + \Sigma.$$

Thus,

$$\frac{\partial}{\partial \tilde{T}} M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T}=O} = \tilde{\mu} \text{ and } \frac{\partial}{\partial \tilde{T}} \frac{\partial}{\partial \tilde{T}^\*} M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T}=O} = \Sigma + \tilde{\mu}\tilde{\mu}^\*,$$

and then Cov*(X)*˜ = *Σ*˜ . In general, for higher order moments, one would have

$$E[\cdots \cdot \tilde{X}^\* \tilde{X} \tilde{X}^\*] = \cdots \cdot \frac{\partial}{\partial \tilde{T}^\*} \frac{\partial}{\partial \tilde{T}} \frac{\partial}{\partial \tilde{T}^\*} M\_{\tilde{X}}(\tilde{T})|\_{\tilde{T} = O} \cdots$$
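The complex-case moments *E*[*X̃*] = *μ̃* and *E*[*X̃X̃*∗] = *Σ* + *μ̃μ̃*∗ can also be checked by simulation. The sampling route below (a Cholesky factor of *Σ* applied to a standard complex normal vector) is one common construction, not taken from the text; the values of *μ̃* and *Σ* are those of Example 3.2a.1.

```python
import numpy as np

# Sample X~ = mu + L Z with L L* = Sigma and Z standard complex normal,
# so that E[(X-mu)(X-mu)*] = Sigma, and check the first two moments.
rng = np.random.default_rng(4)
mu = np.array([1 - 1j, 2 - 3j])
Sigma = np.array([[3.0, 1 + 1j],
                  [1 - 1j, 2.0]])
L = np.linalg.cholesky(Sigma)            # lower triangular, L @ L.conj().T = Sigma

n = 200_000
Z = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
X = mu[:, None] + L @ Z                  # each column is one sample

mean_err = np.abs(X.mean(axis=1) - mu).max()
second = X @ X.conj().T / n              # estimates E[X X*]
second_err = np.abs(second - (Sigma + np.outer(mu, mu.conj()))).max()
print(mean_err, second_err)
```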

#### **3.2a.2. Linear functions**

Let *w*˜ = *L*∗*X*˜ where *L*<sup>∗</sup> = *(a*1*,...,ap)* and *a*1*,...,ap* are scalar constants, real or complex. Then the mgf of *w*˜ can be evaluated by integrating out over the *p*-variate complex Gaussian density of *X*˜ . That is,

$$M\_{\tilde{w}}(\tilde{t}) = E[\mathbf{e}^{\mathfrak{R}(\tilde{t}\tilde{w})}] = E[\mathbf{e}^{(\mathfrak{R}(\tilde{t}L^\*\tilde{X}))}].\tag{3.2a.2}$$

Note that this expected value is available from (3.2*a*.1) by replacing *T̃*∗ by *t̃L*∗. Hence

$$M\_{\tilde{w}}(\tilde{t}) = \mathbf{e}^{\Re(\tilde{t}(L^\*\tilde{\mu})) + \frac{1}{4}\tilde{t}\tilde{t}^\*(L^\*\Sigma L)}. \tag{3.2a.3}$$

Then from (2.1a.1), *w*˜ = *L*∗*X*˜ is univariate complex Gaussian with the parameters *L*∗*μ*˜ and *L*∗*ΣL*. We now consider several such linear functions: Let *Y*˜ = *AX*˜ where *A* is *q* ×*p, q* ≤ *p* and of full rank *q*. The distribution of *Y*˜ can be determined as follows. Since *Y*˜ is a function of *X*˜ , we can evaluate the mgf of *Y*˜ by integrating out over the density of *X*˜ . Since *Y*˜ is *q* × 1, let us take a *q* × 1 parameter vector *U*˜ . Then,

$$M\_{\tilde{Y}}(\tilde{U}) = E[\mathbf{e}^{\Re(\tilde{U}^\*\tilde{Y})}] = E[\mathbf{e}^{\Re(\tilde{U}^\*A\tilde{X})}] = E[\mathbf{e}^{\Re[(\tilde{U}^\*A)\tilde{X}]}].\tag{3.2a.4}$$

On comparing this expected value with (3.2*a*.1), we can write down the mgf of *Y*˜ as the following:

$$M\_{\tilde{Y}}(\tilde{U}) = \mathbf{e}^{\Re(\tilde{U}^\* A \tilde{\mu}) + \frac{1}{4}(\tilde{U}^\* A)\varSigma(A^\* \tilde{U})} = \mathbf{e}^{\Re(\tilde{U}^\* (A\tilde{\mu})) + \frac{1}{4}\tilde{U}^\* (A\varSigma A^\*)\tilde{U}},\tag{3.2a.5}$$

which means that *Y*˜ has a *q*-variate complex Gaussian distribution with the parameters *A μ*˜ and *AΣA*∗. Thus, we have the following result:

**Theorem 3.2a.1.** *Let X̃* ∼ *Ñp(μ̃, Σ), Σ > O, be a p-variate nonsingular complex normal vector. Let A be a q* × *p, q* ≤ *p, constant real or complex matrix of full rank q, and let Ỹ* = *AX̃. Then,*

$$
\tilde{Y} \sim \tilde{N}\_q(A \,\tilde{\mu}, A \,\Sigma A^\*), \,\, A \,\Sigma A^\* > O. \tag{3.2a.6}
$$

Let us consider the following partitioning of *T̃*, *X̃*, *Σ* where *T̃* is *p* × 1, *T̃*1 is *r* × 1, *r* ≤ *p*, *X̃*1 is *r* × 1, *Σ*11 is *r* × *r*, and *μ̃(*1*)* is *r* × 1:

$$
\tilde{T} = \begin{bmatrix} \tilde{T}\_1 \\ \tilde{T}\_2 \end{bmatrix}, \ \tilde{T}\_1 = \begin{bmatrix} \tilde{t}\_1 \\ \vdots \\ \tilde{t}\_r \end{bmatrix}, \tilde{X} = \begin{bmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{bmatrix}, \ \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ \tilde{\mu} = \begin{bmatrix} \tilde{\mu}\_{(1)} \\ \tilde{\mu}\_{(2)} \end{bmatrix}.
$$

Let *T*˜ <sup>2</sup> = *O*. Then the mgf of *X*˜ becomes that of *X*˜ <sup>1</sup> as

$$[\tilde{T}\_1^\*, O] \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix} \begin{bmatrix} \tilde{T}\_1 \\ O \end{bmatrix} = \tilde{T}\_1^\* \Sigma\_{11} \tilde{T}\_1.$$

Thus the mgf of *X*˜ <sup>1</sup> becomes

$$M\_{\tilde{X}\_1}(\tilde{T}\_1) = \mathbf{e}^{\Re(\tilde{T}\_1^\* \tilde{\mu}\_{(1)}) + \frac{1}{4} \tilde{T}\_1^\* \Sigma\_{11} \tilde{T}\_1}.\tag{3.2a.7}$$

This is the mgf of the *r* ×1 subvector *X*˜ <sup>1</sup> and hence *X*˜ <sup>1</sup> has an *r*-variate complex Gaussian density with the mean value vector *μ*˜ *(*1*)* and the covariance matrix *Σ*11. In a real or complex Gaussian vector, the individual variables can be permuted among themselves with the corresponding permutations in the mean value vector and the covariance matrix. Hence, all subsets of components of *X*˜ are Gaussian distributed. Thus, any set of *r* components of *X*˜ is again a complex Gaussian for *r* = 1*,* 2*,...,p* when *X*˜ is a *p*-variate complex Gaussian.

Suppose that, in the mgf of (3.2*a*.1), *Σ*12 = *O* where *X̃* ∼ *Ñp(μ̃, Σ), Σ > O,* and

$$
\tilde{X} = \begin{pmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{pmatrix}, \quad \tilde{\mu} = \begin{pmatrix} \tilde{\mu}\_{(1)} \\ \tilde{\mu}\_{(2)} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{pmatrix}, \quad \tilde{T} = \begin{pmatrix} \tilde{T}\_1 \\ \tilde{T}\_2 \end{pmatrix}.
$$

When *Σ*12 is null, so is *Σ*21 since *Σ*21 = *Σ*∗12. Then

$$\Sigma = \begin{pmatrix} \Sigma\_{11} & O \\ O & \Sigma\_{22} \end{pmatrix}$$

is block-diagonal. As well, ℜ*(T̃*∗*μ̃)* = ℜ*(T̃*∗1*μ̃(*1*))* + ℜ*(T̃*∗2*μ̃(*2*))* and

$$
\tilde{T}^\* \Sigma \tilde{T} = (\tilde{T}\_1^\*, \tilde{T}\_2^\*) \begin{pmatrix} \Sigma\_{11} & O \\ O & \Sigma\_{22} \end{pmatrix} \begin{pmatrix} \tilde{T}\_1 \\ \tilde{T}\_2 \end{pmatrix} = \tilde{T}\_1^\* \Sigma\_{11} \tilde{T}\_1 + \tilde{T}\_2^\* \Sigma\_{22} \tilde{T}\_2. \tag{i}
$$

In other words, *MX̃(T̃)* becomes the product of the mgf of *X̃*1 and the mgf of *X̃*2; that is, *X̃*1 and *X̃*2 are independently distributed whenever *Σ*12 = *O*.

**Theorem 3.2a.2.** *Let X̃* ∼ *Ñp(μ̃, Σ), Σ > O, be a nonsingular complex Gaussian vector. Consider the partitioning of X̃, μ̃, T̃, Σ as in (i) above. Then, the subvectors X̃*1 *and X̃*2 *are independently distributed as complex Gaussian vectors if and only if Σ*12 = *O or, equivalently, Σ*21 = *O.*

#### **Exercises 3.2**

**3.2.1.** Construct a 2 × 2 real positive definite matrix *A*. Then write down a bivariate real Gaussian density where the covariance matrix is this *A*.

**3.2.2.** Construct a 2×2 Hermitian positive definite matrix *B* and then construct a complex bivariate Gaussian density. Write the exponent and normalizing constant explicitly.

**3.2.3.** Construct a 3×3 real positive definite matrix *A*. Then create a real trivariate Gaussian density with this *A* being the covariance matrix. Write down the exponent and the normalizing constant explicitly.

**3.2.4.** Repeat Exercise 3.2.3 for the complex Gaussian case.

**3.2.5.** Let the *p* × 1 real vector random variable *X* have a *p*-variate real nonsingular Gaussian density, *X* ∼ *Np(μ, Σ), Σ > O*. Let *L* be a *p* × 1 constant vector and let *u* = *L*′*X* = *X*′*L*, a linear function of *X*. Show that *E*[*u*] = *L*′*μ,* Var*(u)* = *L*′*ΣL*, and that *u* is a univariate Gaussian with the parameters *L*′*μ* and *L*′*ΣL*.

**3.2.6.** Show that the mgf of *u* in Exercise 3.2.5 is

$$M\_u(t) = \mathbf{e}^{t(L'\mu) + \frac{t^2}{2}L'\Sigma L}.$$

**3.2.7.** What are the corresponding results in Exercises 3.2.5 and 3.2.6 for the nonsingular complex Gaussian case?

**3.2.8.** Let *X* ∼ *Np(O, Σ), Σ > O*, be a real *p*-variate nonsingular Gaussian vector. Let *u*1 = *X*′*Σ*−1*X* and *u*2 = *X*′*X*. Derive the densities of *u*1 and *u*2.

**3.2.9.** Establish Theorem 3.2.1 by using a transformation of variables. [Hint: Augment the matrix *A* with a *(p* − *q)* × *p* matrix *B* such that *C* = *(A*′*, B*′*)*′ is *p* × *p* and nonsingular. Derive the density of *Y* = *CX* and, therefrom, the marginal density of *AX*.]

**3.2.10.** By constructing counterexamples or otherwise, show the following: Let the real scalar random variables *x*1 and *x*2 be such that *x*1 ∼ *N*1*(μ*1*, σ*1²*), σ*1 *>* 0, *x*2 ∼ *N*1*(μ*2*, σ*2²*), σ*2 *>* 0, and Cov*(x*1*, x*2*)* = 0. Then, the joint density need not be bivariate normal.

**3.2.11.** Generalize Exercise 3.2.10 to *p*-vectors *X*<sup>1</sup> and *X*2.

**3.2.12.** Extend Exercises 3.2.10 and 3.2.11 to the complex domain.

#### **3.3. Marginal and Conditional Densities, Real Case**

Let the *p* × 1 vector *X* have a real *p*-variate Gaussian distribution, *X* ∼ *Np(μ, Σ), Σ > O*. Let *X*, *μ* and *Σ*−1 be partitioned as follows:

$$X = \begin{bmatrix} \boldsymbol{x}\_1 \\ \vdots \\ \boldsymbol{x}\_p \end{bmatrix} = \begin{bmatrix} X\_1 \\ X\_2 \end{bmatrix}, \ \boldsymbol{\mu} = \begin{bmatrix} \mu\_{(1)} \\ \mu\_{(2)} \end{bmatrix}, \ \boldsymbol{\Sigma}^{-1} = \begin{bmatrix} \boldsymbol{\Sigma}^{11} & \boldsymbol{\Sigma}^{12} \\ \boldsymbol{\Sigma}^{21} & \boldsymbol{\Sigma}^{22} \end{bmatrix}$$

where *<sup>X</sup>*<sup>1</sup> and *μ(*1*)* are *<sup>r</sup>* <sup>×</sup> 1, *<sup>X</sup>*<sup>2</sup> and *μ(*2*)* are *(p* <sup>−</sup> *r)* <sup>×</sup> 1, *<sup>Σ</sup>*<sup>11</sup> is *<sup>r</sup>* <sup>×</sup> *<sup>r</sup>*, and so on. Then

$$\begin{split} (X-\mu)'\Sigma^{-1}(X-\mu) &= [(X\_1-\mu\_{(1)})', (X\_2-\mu\_{(2)})'] \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix} \begin{bmatrix} X\_1 - \mu\_{(1)} \\ X\_2 - \mu\_{(2)} \end{bmatrix} \\ &= (X\_1 - \mu\_{(1)})'\Sigma^{11}(X\_1 - \mu\_{(1)}) + (X\_2 - \mu\_{(2)})'\Sigma^{22}(X\_2 - \mu\_{(2)}) \\ &\quad + (X\_1 - \mu\_{(1)})'\Sigma^{12}(X\_2 - \mu\_{(2)}) + (X\_2 - \mu\_{(2)})'\Sigma^{21}(X\_1 - \mu\_{(1)}). \end{split} \tag{i}$$

But

$$[(X_1 - \mu_{(1)})' \Sigma^{12} (X_2 - \mu_{(2)})]' = (X_2 - \mu_{(2)})' \Sigma^{21} (X_1 - \mu_{(1)})$$

and both are real $1 \times 1$ quantities. Thus, they are equal and their sum may be written as twice either one of them. Collecting the terms containing $X_2 - \mu_{(2)}$, we have

$$(X\_2 - \mu\_{(2)})' \Sigma^{22} (X\_2 - \mu\_{(2)}) + 2(X\_2 - \mu\_{(2)})' \Sigma^{21} (X\_1 - \mu\_{(1)}).\tag{ii}$$

If we expand a quadratic form of the type $(X_2 - \mu_{(2)} + C)'\Sigma^{22}(X_2 - \mu_{(2)} + C)$, we have

$$\begin{split} (X_2 - \mu_{(2)} + C)' \Sigma^{22} (X_2 - \mu_{(2)} + C) &= (X_2 - \mu_{(2)})' \Sigma^{22} (X_2 - \mu_{(2)}) \\ &\quad + (X_2 - \mu_{(2)})' \Sigma^{22} C + C' \Sigma^{22} (X_2 - \mu_{(2)}) + C' \Sigma^{22} C. \end{split} \tag{iii}$$

Comparing *(ii)* and *(iii)*, let

$$
\Sigma^{22}C = \Sigma^{21}(X_1 - \mu_{(1)}) \Rightarrow C = (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)}).
$$

Then,

$$C'\Sigma^{22}C = (X_1 - \mu_{(1)})'\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)}).$$

Hence,

$$\begin{split} (X-\mu)'\Sigma^{-1}(X-\mu) &= (X_1-\mu_{(1)})'[\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}](X_1-\mu_{(1)}) \\ &\quad + (X_2-\mu_{(2)}+C)'\Sigma^{22}(X_2-\mu_{(2)}+C), \end{split}$$

and after integrating out $X_2$, the balance of the exponent is $(X_1 - \mu_{(1)})'\Sigma_{11}^{-1}(X_1 - \mu_{(1)})$, where $\Sigma_{11}$ is the $r \times r$ leading submatrix of $\Sigma$; the reader may refer to Sect. 1.3 for results on the inversion of partitioned matrices. Observe that $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$. The integral over $X_2$ only gives a constant, and hence the marginal density of $X_1$ is

$$f_1(X_1) = c_1\, \mathrm{e}^{-\frac{1}{2}(X_1 - \mu_{(1)})' \Sigma_{11}^{-1} (X_1 - \mu_{(1)})}.$$

On noting that it has the same structure as the real multivariate Gaussian density, its normalizing constant can easily be determined and the resulting density is as follows:

$$f_1(X_1) = \frac{1}{|\Sigma_{11}|^{\frac{1}{2}} (2\pi)^{\frac{r}{2}}} \mathrm{e}^{-\frac{1}{2}(X_1 - \mu_{(1)})' \Sigma_{11}^{-1} (X_1 - \mu_{(1)})}, \ \Sigma_{11} > O,\tag{3.3.1}$$

for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = 1,\ldots,r$, where $\mu_{(1)} = E[X_1]$ and $\Sigma_{11} = \text{Cov}(X_1)$ is the covariance matrix of $X_1$. From symmetry, we obtain the following marginal density of $X_2$ in the real Gaussian case:

$$f\_2(X\_2) = \frac{1}{|\Sigma\_{22}|^{\frac{1}{2}} (2\pi)^{\frac{p-r}{2}}} \mathbf{e}^{-\frac{1}{2}(X\_2 - \mu\_{(2)})' \Sigma\_{22}^{-1} (X\_2 - \mu\_{(2)})}, \ \Sigma\_{22} > O,\tag{3.3.2}$$

for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = r+1,\ldots,p$.

Observe that we may permute the elements of $X$ as we please, with the corresponding permutations applied to $\mu$ and the covariance matrix $\Sigma$. Hence the real Gaussian density in the $p$-variate case is a multivariate density and not a vector/matrix-variate density. From this property, it follows that every subset of the elements of $X$ has a real multivariate Gaussian distribution and that the individual variables have univariate real normal (Gaussian) distributions. Hence our derivation of the marginal density of $X_1$ yields the general density of a subset of $r$ elements of $X$, because those $r$ elements can be brought to the first $r$ positions through permutations of the elements of $X$, with the corresponding permutations applied to $\mu$ and $\Sigma$.
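The key matrix identity behind this marginalization, $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$, can be checked numerically. The following sketch uses `numpy` (not part of the text) on an illustrative $3\times 3$ positive definite covariance matrix with $r = 1$:

```python
import numpy as np

# Illustrative positive definite covariance matrix, partitioned with r = 1
Sigma = np.array([[3.0, -2.0, 0.0],
                  [-2.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])
r = 1
S11 = Sigma[:r, :r]

# Blocks of the inverse: Sigma^{-1} = [[S^11, S^12], [S^21, S^22]]
Sinv = np.linalg.inv(Sigma)
I11, I12 = Sinv[:r, :r], Sinv[:r, r:]
I21, I22 = Sinv[r:, :r], Sinv[r:, r:]

# Identity used for the marginal density of X_1:
# Sigma_11^{-1} = S^11 - S^12 (S^22)^{-1} S^21
lhs = np.linalg.inv(S11)
rhs = I11 - I12 @ np.linalg.inv(I22) @ I21
assert np.allclose(lhs, rhs)
```

With $r = 1$ here, both sides reduce to the scalar $1/\sigma_{11} = 1/3$; any other split $r$ gives the corresponding $r \times r$ identity.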

#### **The bivariate case**

Let us look at the explicit form of the real Gaussian density for *p* = 2. In the bivariate case,

$$
\Sigma = \begin{bmatrix} \sigma\_{11} & \sigma\_{12} \\ \sigma\_{12} & \sigma\_{22} \end{bmatrix}, \begin{vmatrix} \sigma\_{11} & \sigma\_{12} \\ \sigma\_{12} & \sigma\_{22} \end{vmatrix} = \sigma\_{11}\sigma\_{22} - \left(\sigma\_{12}\right)^2.
$$

For convenience, let us denote $\sigma_{11}$ by $\sigma_1^2$ and $\sigma_{22}$ by $\sigma_2^2$. Then $\sigma_{12} = \sigma_1\sigma_2\rho$ where $\rho$ is the correlation between $x_1$ and $x_2$, and for $p = 2$,

$$|\Sigma| = \sigma\_1^2 \sigma\_2^2 - (\sigma\_1 \sigma\_2 \rho)^2 = \sigma\_1^2 \sigma\_2^2 (1 - \rho^2).$$

Thus, in that case,

$$\begin{split} \Sigma^{-1} = \frac{1}{|\Sigma|} [\text{cof}(\Sigma)]' &= \frac{1}{\sigma\_1^2 \sigma\_2^2 (1 - \rho^2)} \begin{bmatrix} \sigma\_2^2 & -\rho \sigma\_1 \sigma\_2 \\ -\rho \sigma\_1 \sigma\_2 & \sigma\_1^2 \end{bmatrix} \\ &= \frac{1}{1 - \rho^2} \begin{bmatrix} \frac{1}{\sigma\_1^2} & -\frac{\rho}{\sigma\_1 \sigma\_2} \\ -\frac{\rho}{\sigma\_1 \sigma\_2} & \frac{1}{\sigma\_2^2} \end{bmatrix}, \quad -1 < \rho < 1. \end{split}$$

Hence, substituting these into the general expression for the real Gaussian density and denoting the real bivariate density as *f (x*1*, x*2*)*, we have the following:

$$f(\mathbf{x}\_1, \mathbf{x}\_2) = \frac{1}{(2\pi)\sigma\_1\sigma\_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\mathcal{Q}\right\} \tag{3.3.3}$$

where *Q* is the real positive definite quadratic form

$$\mathcal{Q} = \left(\frac{\mathbf{x}\_1 - \mu\_1}{\sigma\_1}\right)^2 - 2\rho \left(\frac{\mathbf{x}\_1 - \mu\_1}{\sigma\_1}\right) \left(\frac{\mathbf{x}\_2 - \mu\_2}{\sigma\_2}\right) + \left(\frac{\mathbf{x}\_2 - \mu\_2}{\sigma\_2}\right)^2$$

for $\sigma_1 > 0,\ \sigma_2 > 0,\ -1 < \rho < 1,\ -\infty < x_j < \infty,\ -\infty < \mu_j < \infty,\ j = 1, 2$.
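As a quick numerical sanity check of the closed form above (a `numpy` sketch, not part of the text; the parameter values are arbitrary):

```python
import numpy as np

# Arbitrary bivariate parameters
s1, s2, rho = 2.0, 3.0, 0.5
Sigma = np.array([[s1**2,     s1*s2*rho],
                  [s1*s2*rho, s2**2]])

# Closed form for Sigma^{-1} in the (sigma_1, sigma_2, rho) parametrization
Sinv_formula = (1.0 / (1.0 - rho**2)) * np.array([[1/s1**2,      -rho/(s1*s2)],
                                                  [-rho/(s1*s2),  1/s2**2]])
assert np.allclose(np.linalg.inv(Sigma), Sinv_formula)

# |Sigma| = sigma_1^2 sigma_2^2 (1 - rho^2)
assert np.isclose(np.linalg.det(Sigma), s1**2 * s2**2 * (1.0 - rho**2))
```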

The conditional density of $X_1$ given $X_2$, denoted by $g_1(X_1|X_2)$, is the following:

$$\begin{split} g\_1(X\_1|X\_2) &= \frac{f(X)}{f\_2(X\_2)} = \frac{|\Sigma\_{22}|^{\frac{1}{2}}}{(2\pi)^{\frac{r}{2}}|\Sigma|^{\frac{1}{2}}} \\ &\quad \times \exp\left\{-\frac{1}{2}[(X-\mu)'\Sigma^{-1}(X-\mu)-(X\_2-\mu\_{(2)})'\Sigma^{-1}\_{22}(X\_2-\mu\_{(2)})]\right\}.\end{split}$$

We can simplify the exponent, excluding the factor $-\frac{1}{2}$, as follows:

$$\begin{split} &(X-\mu)'\Sigma^{-1}(X-\mu)-(X_2-\mu_{(2)})'\Sigma_{22}^{-1}(X_2-\mu_{(2)}) \\ &\quad = (X_1-\mu_{(1)})'\Sigma^{11}(X_1-\mu_{(1)})+2(X_1-\mu_{(1)})'\Sigma^{12}(X_2-\mu_{(2)}) \\ &\qquad + (X_2-\mu_{(2)})'\Sigma^{22}(X_2-\mu_{(2)})-(X_2-\mu_{(2)})'\Sigma_{22}^{-1}(X_2-\mu_{(2)}). \end{split}$$

But $\Sigma_{22}^{-1} = \Sigma^{22} - \Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}$. Hence the terms containing $\Sigma^{22}$ cancel. The remaining terms containing $X_2 - \mu_{(2)}$ are

$$2(\mathbf{X}\_1 - \mu\_{(1)})' \Sigma^{12} (\mathbf{X}\_2 - \mu\_{(2)}) + (\mathbf{X}\_2 - \mu\_{(2)})' \Sigma^{21} (\Sigma^{11})^{-1} \Sigma^{12} (\mathbf{X}\_2 - \mu\_{(2)}).$$

Combining these two terms with $(X_1 - \mu_{(1)})'\Sigma^{11}(X_1 - \mu_{(1)})$ results in the quadratic form $(X_1 - \mu_{(1)} + C)'\Sigma^{11}(X_1 - \mu_{(1)} + C)$ where $C = (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)})$. Now, noting that

$$\frac{|\Sigma\_{22}|^{\frac{1}{2}}}{|\Sigma|^{\frac{1}{2}}} = \left[\frac{|\Sigma\_{22}|}{|\Sigma\_{22}|\,|\Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}|}\right]^{\frac{1}{2}} = \frac{1}{|\Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}|^{\frac{1}{2}}},$$

the conditional density of $X_1$ given $X_2$, which is denoted by $g_1(X_1|X_2)$, can be expressed as follows:

$$\begin{split} g_1(X_1|X_2) &= \frac{1}{(2\pi)^{\frac{r}{2}} |\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{\frac{1}{2}}} \\ &\quad \times \exp\left\{-\frac{1}{2}(X_1 - \mu_{(1)} + C)'(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}(X_1 - \mu_{(1)} + C)\right\} \end{split} \tag{3.3.4}$$

where $C = (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)})$. Hence, the conditional expectation and covariance of $X_1$ given $X_2$ are

$$E[X_1|X_2] = \mu_{(1)} - C = \mu_{(1)} - (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)})$$

$$= \mu_{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_{(2)}), \text{ which is linear in } X_2.$$

$$\text{Cov}(X\_1|X\_2) = \Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}, \text{ which is free of } X\_2. \tag{3.3.5}$$
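The passage from $-(\Sigma^{11})^{-1}\Sigma^{12}$ to the matrix of regression coefficients $\Sigma_{12}\Sigma_{22}^{-1}$ in (3.3.5) rests on a partitioned-inverse identity from Sect. 1.3; a numerical sketch with `numpy` (not part of the text; the matrix values are illustrative):

```python
import numpy as np

# Illustrative positive definite covariance matrix, split with r = 1
Sigma = np.array([[3.0, -2.0, 0.0],
                  [-2.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])
r = 1
S12, S22 = Sigma[:r, r:], Sigma[r:, r:]

# Blocks S^11, S^12 of Sigma^{-1}
Sinv = np.linalg.inv(Sigma)
I11, I12 = Sinv[:r, :r], Sinv[:r, r:]

# -(S^11)^{-1} S^12 equals the matrix of regression coefficients S_12 S_22^{-1}
lhs = -np.linalg.inv(I11) @ I12
rhs = S12 @ np.linalg.inv(S22)
assert np.allclose(lhs, rhs)
```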

From the inverses of partitioned matrices obtained in Sect. 1.3, we have $-(\Sigma^{11})^{-1}\Sigma^{12} = \Sigma_{12}\Sigma_{22}^{-1}$, which yields the representation of the conditional expectation appearing in Eq. (3.3.5). The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is often called *the matrix of regression coefficients*. From symmetry, it follows that the conditional density of $X_2$ given $X_1$, denoted by $g_2(X_2|X_1)$, is given by

$$\begin{split} g\_2(X\_2|X\_1) &= \frac{1}{(2\pi)^{\frac{p-r}{2}} |\Sigma\_{22} - \Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}|^{\frac{1}{2}}} \\ &\times \exp\left\{ -\frac{1}{2} (X\_2 - \mu\_{(2)} + C\_1)' (\Sigma\_{22} - \Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12})^{-1} (X\_2 - \mu\_{(2)} + C\_1) \right\} \end{split} \tag{3.3.6}$$

where $C_1 = (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)})$, and the conditional expectation and conditional covariance of $X_2$ given $X_1$ are

$$\begin{aligned} E[X_2|X_1] &= \mu_{(2)} - C_1 = \mu_{(2)} - (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)})\\ &= \mu_{(2)} + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_{(1)}), \text{ which is linear in } X_1,\\ \text{Cov}(X_2|X_1) &= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}, \text{ which is free of } X_1,\end{aligned} \tag{3.3.7}$$

the matrix $\Sigma_{21}\Sigma_{11}^{-1}$ being often called *the matrix of regression coefficients*.

What then is the conditional expectation of $x_1$ given $x_2$ in the bivariate normal case? From formula (3.3.5) with $p = 2$, we have

$$E[X\_1|X\_2] = \mu\_{(1)} + \Sigma\_{12}\Sigma\_{22}^{-1}(X\_2 - \mu\_{(2)}) = \mu\_1 + \frac{\sigma\_{12}}{\sigma\_2^2}(\mathbf{x}\_2 - \mu\_2)$$

$$= \mu\_1 + \frac{\sigma\_1\sigma\_2\rho}{\sigma\_2^2}(\mathbf{x}\_2 - \mu\_2) = \mu\_1 + \frac{\sigma\_1}{\sigma\_2}\rho(\mathbf{x}\_2 - \mu\_2) = E[\mathbf{x}\_1|\mathbf{x}\_2],\qquad(3.3.8)$$

which is linear in $x_2$. The coefficient $\frac{\sigma_1}{\sigma_2}\rho$ is often referred to as *the regression coefficient*. Then, from (3.3.7), we have

$$E[\mathbf{x}\_2|\mathbf{x}\_1] = \mu\_2 + \frac{\sigma\_2}{\sigma\_1} \rho(\mathbf{x}\_1 - \mu\_1), \text{ which is linear in } \mathbf{x}\_1 \tag{3.3.9}$$

and $\frac{\sigma_2}{\sigma_1}\rho$ is *the regression coefficient*. Thus, (3.3.8) gives the best predictor of $x_1$ based on $x_2$, and (3.3.9), the best predictor of $x_2$ based on $x_1$, both being linear in the case of a multivariate real normal distribution; in this instance, a bivariate normal distribution.

**Example 3.3.1.** Let $X \sim N_3(\mu, \Sigma),\ \Sigma > O$, with the components $x_1, x_2, x_3$ of $X$, $E[X] = \mu$ and $\text{Cov}(X) = \Sigma$ specified as follows:

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \mu = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \mu\_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 2 & 1 \\ 0 & 1 & 3 \end{bmatrix}.$$

Compute (1) the marginal densities of $x_1$ and $X_2 = (x_2, x_3)'$; (2) the conditional density of $x_1$ given $X_2$ and the conditional density of $X_2$ given $x_1$; (3) the conditional expectations or regressions of $x_1$ on $X_2$ and of $X_2$ on $x_1$.

**Solution 3.3.1.** Let us partition *Σ* accordingly, that is,

$$
\Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ \Sigma\_{11} = \sigma\_{11} = (3), \ \Sigma\_{12} = [-2, 0], \ \Sigma\_{21} = \begin{bmatrix} -2 \\ 0 \end{bmatrix}, \ \Sigma\_{22} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}.
$$

Let us compute the following quantities:

$$\begin{aligned} \Sigma_{22}^{-1} &= \frac{1}{5} \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} \\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} &= 3 - [-2, 0]\left(\frac{1}{5}\right) \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \frac{3}{5} \\ \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} &= \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} - \left(\frac{1}{3}\right) \begin{bmatrix} -2 \\ 0 \end{bmatrix} [-2, 0] = \begin{bmatrix} \frac{2}{3} & 1 \\ 1 & 3 \end{bmatrix} \\ \left[ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right]^{-1} &= \frac{5}{3} \\ \left[ \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \right]^{-1} &= \begin{bmatrix} 3 & -1 \\ -1 & \frac{2}{3} \end{bmatrix}. \end{aligned}$$

As well,

$$\Sigma_{12}\Sigma_{22}^{-1} = [-2, 0]\left(\frac{1}{5}\right) \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} = \left[-\frac{6}{5},\ \frac{2}{5}\right] \ \text{ and } \ \Sigma_{21}\Sigma_{11}^{-1} = \left(\frac{1}{3}\right) \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac{2}{3} \\ 0 \end{bmatrix}.$$

Then we have the following:

$$\begin{aligned} E[X\_1|X\_2] &= \mu\_{(1)} + \Sigma\_{12} \Sigma\_{22}^{-1} (X\_2 - \mu\_{(2)}) \\ &= -1 + \left[ -\frac{6}{5}, \frac{2}{5} \right] \begin{bmatrix} x\_2 - 0 \\ x\_3 + 2 \end{bmatrix} = -1 - \frac{6}{5} x\_2 + \frac{2}{5} (x\_3 + 2) \end{aligned} \tag{i}$$

and

$$\text{Cov}(X\_1|X\_2) = \Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21} = \frac{3}{5};\tag{ii}$$

$$\begin{aligned} E[X_2|X_1] &= \mu_{(2)} + \Sigma_{21} \Sigma_{11}^{-1} (X_1 - \mu_{(1)}) \\ &= \begin{bmatrix} 0 \\ -2 \end{bmatrix} + \begin{bmatrix} -\frac{2}{3} \\ 0 \end{bmatrix} (x_1 + 1) = \begin{bmatrix} -\frac{2}{3}(x_1 + 1) \\ -2 \end{bmatrix} \end{aligned} \tag{iii}$$

and

$$\text{Cov}(X_2|X_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = \begin{bmatrix} \frac{2}{3} & 1\\ 1 & 3 \end{bmatrix}.\tag{iv}$$

The distributions of $x_1$ and $X_2$ are respectively $x_1 \sim N_1(-1, 3)$ and $X_2 \sim N_2(\mu_{(2)}, \Sigma_{22})$, the corresponding densities denoted by $f_1(x_1)$ and $f_2(X_2)$ being

$$\begin{aligned} f_1(x_1) &= \frac{1}{\sqrt{2\pi}\sqrt{3}} \mathrm{e}^{-\frac{1}{6}(x_1+1)^2}, \ -\infty < x_1 < \infty, \\ f_2(X_2) &= \frac{1}{(2\pi)\sqrt{5}} \mathrm{e}^{-\frac{1}{2}\mathcal{Q}_1}, \ \mathcal{Q}_1 = \frac{1}{5} [3x_2^2 - 2x_2(x_3+2) + 2(x_3+2)^2] \end{aligned}$$

for $-\infty < x_j < \infty$, $j = 2, 3$. The conditional distributions are $X_1|X_2 \sim N_1(E(X_1|X_2), \text{Var}(X_1|X_2))$ and $X_2|X_1 \sim N_2(E(X_2|X_1), \text{Cov}(X_2|X_1))$, the associated densities denoted by $g_1(X_1|X_2)$ and $g_2(X_2|X_1)$ being given by

$$\begin{aligned} g\_1(X\_1|X\_2) &= \frac{1}{\sqrt{(2\pi)}(3/5)^{\frac{1}{2}}} \mathrm{e}^{-\frac{5}{6}\left[\mathbf{x}\_1+1+\frac{6}{5}\mathbf{x}\_2-\frac{2}{5}(\mathbf{x}\_3+2)\right]^2}, \\ g\_2(X\_2|X\_1) &= \frac{1}{2\pi \times 1} \mathrm{e}^{-\frac{1}{2}\mathcal{Q}\_2}, \\ \mathcal{Q}\_2 &= 3\left[\mathbf{x}\_2+\frac{2}{3}(\mathbf{x}\_1+1)\right]^2 - 2\left[\mathbf{x}\_2+\frac{2}{3}(\mathbf{x}\_1+1)\right](\mathbf{x}\_3+2) + \frac{2}{3}(\mathbf{x}\_3+2)^2 \end{aligned}$$

for $-\infty < x_j < \infty$, $j = 1, 2, 3$. This completes the computations.
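The numbers obtained in Example 3.3.1 can be reproduced mechanically; the following `numpy` sketch (not part of the text) rechecks the Schur complements, the regression coefficients and the conditional mean:

```python
import numpy as np

# Data of Example 3.3.1
mu = np.array([-1.0, 0.0, -2.0])
Sigma = np.array([[3.0, -2.0, 0.0],
                  [-2.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])
s11 = Sigma[0, 0]                      # Sigma_11 (a scalar here)
S12, S21, S22 = Sigma[0:1, 1:], Sigma[1:, 0:1], Sigma[1:, 1:]

# Var(x1 | X2) = 3/5 and Cov(X2 | x1) = [[2/3, 1], [1, 3]]
var1_given_2 = s11 - (S12 @ np.linalg.inv(S22) @ S21)[0, 0]
assert np.isclose(var1_given_2, 3/5)
assert np.allclose(S22 - S21 @ S12 / s11, [[2/3, 1.0], [1.0, 3.0]])

# Regression coefficient matrices
assert np.allclose(S12 @ np.linalg.inv(S22), [[-6/5, 2/5]])
assert np.allclose(S21 / s11, [[-2/3], [0.0]])

# E[x1 | X2] evaluated at X2 = mu_(2) returns mu_1 = -1
X2 = mu[1:]
e1 = mu[0] + (S12 @ np.linalg.inv(S22) @ (X2 - mu[1:]))[0]
assert np.isclose(e1, -1.0)
```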

#### **3.3a. Conditional and Marginal Densities in the Complex Case**

Let the $p \times 1$ complex vector $\tilde{X}$ have the $p$-variate complex normal distribution $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma),\ \Sigma > O$. As can be seen from the corresponding mgf which was derived in Sect. 3.2a, all subsets of the variables $\tilde{x}_1, \ldots, \tilde{x}_p$ are again complex Gaussian distributed. This result can also be obtained by integrating out the remaining variables from the $p$-variate complex Gaussian density. For convenience, let $\tilde{U} = \tilde{X} - \tilde{\mu}$. Partition $\tilde{X}, \tilde{\mu}, \tilde{U}$ into subvectors and $\Sigma^{-1}$ into submatrices as follows:

$$\Sigma^{-1} = \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix}, \ \tilde{\mu} = \begin{bmatrix} \tilde{\mu}_{(1)} \\ \tilde{\mu}_{(2)} \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{X}_{1} \\ \tilde{X}_{2} \end{bmatrix}, \ \tilde{U} = \begin{bmatrix} \tilde{U}_{1} \\ \tilde{U}_{2} \end{bmatrix},$$

where $\tilde{X}_1, \tilde{\mu}_{(1)}, \tilde{U}_1$ are $r \times 1$ and $\Sigma^{11}$ is $r \times r$. Consider

$$\begin{split} \tilde{U}^\* \Sigma^{-1} \tilde{U} &= [\tilde{U}\_1^\*, \tilde{U}\_2^\*] \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix} \begin{bmatrix} \tilde{U}\_1 \\ \tilde{U}\_2 \end{bmatrix} \\ &= \tilde{U}\_1^\* \Sigma^{11} \tilde{U}\_1 + \tilde{U}\_2^\* \Sigma^{22} \tilde{U}\_2 + \tilde{U}\_1^\* \Sigma^{12} \tilde{U}\_2 + \tilde{U}\_2^\* \Sigma^{21} \tilde{U}\_1 \end{split} \tag{i}$$

and suppose that we wish to integrate out $\tilde{U}_2$ to obtain the marginal density of $\tilde{U}_1$. The terms containing $\tilde{U}_2$ are $\tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2 + \tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2 + \tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1$. On expanding the Hermitian form

$$\begin{split} (\tilde{U}\_2 + C)^\* \Sigma^{22} (\tilde{U}\_2 + C) &= \tilde{U}\_2^\* \Sigma^{22} \tilde{U}\_2 + \tilde{U}\_2^\* \Sigma^{22} C \\ &+ C^\* \Sigma^{22} \tilde{U}\_2 + C^\* \Sigma^{22} C, \end{split} \tag{ii}$$

for some $C$ and comparing $(i)$ and $(ii)$, we may let $\Sigma^{21}\tilde{U}_1 = \Sigma^{22}C \Rightarrow C = (\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1$. Then $C^{*}\Sigma^{22}C = \tilde{U}_1^{*}\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1$, and $(i)$ may thus be written as

$$\tilde{U}_1^{*}(\Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21})\tilde{U}_1 + (\tilde{U}_2 + C)^{*}\Sigma^{22}(\tilde{U}_2 + C),\ C = (\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1.$$

However, from Sect. 1.3 on partitioned matrices, we have

$$
\Sigma^{11} - \Sigma^{12} (\Sigma^{22})^{-1} \Sigma^{21} = \Sigma\_{11}^{-1}.
$$

As well,

$$\begin{split} \det(\boldsymbol{\Sigma}) &= [\det(\boldsymbol{\Sigma}\_{11})] [\det(\boldsymbol{\Sigma}\_{22} - \boldsymbol{\Sigma}\_{21} \boldsymbol{\Sigma}\_{11}^{-1} \boldsymbol{\Sigma}\_{12})] \\ &= [\det(\boldsymbol{\Sigma}\_{11})] [\det((\boldsymbol{\Sigma}^{22})^{-1})] .\end{split}$$
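This determinant factorization holds for complex Hermitian positive definite matrices as well; a small `numpy` check (not part of the text), using the matrix of Example 3.3a.1 below:

```python
import numpy as np

# Hermitian positive definite matrix of Example 3.3a.1, partitioned with r = 2
Sigma = np.array([[3,      1 + 1j, 0],
                  [1 - 1j, 2,      1j],
                  [0,     -1j,     3]], dtype=complex)
r = 2
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

# det(Sigma) = det(Sigma_11) det(Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12)
schur = S22 - S21 @ np.linalg.inv(S11) @ S12
lhs = np.linalg.det(Sigma)
rhs = np.linalg.det(S11) * np.linalg.det(schur)
assert np.isclose(lhs, rhs)
assert np.isclose(lhs, 9.0)   # the value det(Sigma) = 9 found in Example 3.3a.1
```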

Note that the integral of $\exp\{-(\tilde{U}_2 + C)^{*}\Sigma^{22}(\tilde{U}_2 + C)\}$ over $\tilde{U}_2$ gives $\pi^{p-r}|\det((\Sigma^{22})^{-1})| = \pi^{p-r}|\det(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})|$. Hence the marginal density of $\tilde{X}_1$ is

$$\tilde{f}_1(\tilde{X}_1) = \frac{1}{\pi^{r} |\det(\Sigma_{11})|} \mathrm{e}^{-(\tilde{X}_1 - \tilde{\mu}_{(1)})^{*} \Sigma_{11}^{-1} (\tilde{X}_1 - \tilde{\mu}_{(1)})}, \ \Sigma_{11} > O. \tag{3.3a.1}$$

It is an $r$-variate complex Gaussian density. Similarly, $\tilde{X}_2$ has the $(p-r)$-variate complex Gaussian density

$$\tilde{f}\_2(\tilde{X}\_2) = \frac{1}{\pi^{p-r} |\det(\Sigma\_{22})|} \mathbf{e}^{-(\tilde{X}\_2 - \tilde{\mu}\_{(2)})^\* \Sigma\_{22}^{-1} (\tilde{X}\_2 - \tilde{\mu}\_{(2)})}, \ \Sigma\_{22} > O. \tag{3.3a.2}$$

Hence, the conditional density of $\tilde{X}_1$ given $\tilde{X}_2$ is

$$\begin{split} \tilde{g}\_1(\tilde{X}\_1|\tilde{X}\_2) &= \frac{\tilde{f}(\tilde{X}\_1, \tilde{X}\_2)}{\tilde{f}\_2(\tilde{X}\_2)} = \frac{\pi^{p-r} |\det(\Sigma\_{22})|}{\pi^p |\det(\Sigma)|} \\ &\times e^{-(\tilde{X}-\tilde{\mu})^\* \Sigma^{-1} (\tilde{X}-\tilde{\mu}) + (\tilde{X}\_2-\tilde{\mu}\_{(2)})^\* \Sigma^{-1}\_{22} (\tilde{X}\_2-\tilde{\mu}\_{(2)})}. \end{split}$$

From Sect. 1.3, we have

$$|\det(\Sigma)| = |\det(\Sigma\_{22})| \left| \det(\Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}) \right|.$$

and then the normalizing constant is $[\pi^r|\det(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})|]^{-1}$. The exponential part reduces to the following on letting $\tilde{U} = \tilde{X} - \tilde{\mu},\ \tilde{U}_1 = \tilde{X}_1 - \tilde{\mu}_{(1)},\ \tilde{U}_2 = \tilde{X}_2 - \tilde{\mu}_{(2)}$:

$$\begin{split} (\tilde{X} - \tilde{\mu})^{*}\Sigma^{-1}(\tilde{X} - \tilde{\mu}) &- (\tilde{X}_2 - \tilde{\mu}_{(2)})^{*}\Sigma_{22}^{-1}(\tilde{X}_2 - \tilde{\mu}_{(2)}) \\ &= \tilde{U}_1^{*}\Sigma^{11}\tilde{U}_1 + \tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2 + \tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2 + \tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1 - \tilde{U}_2^{*}(\Sigma^{22} - \Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12})\tilde{U}_2 \\ &= \tilde{U}_1^{*}\Sigma^{11}\tilde{U}_1 + \tilde{U}_2^{*}\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2 + \tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2 + \tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1 \\ &= [\tilde{U}_1 + (\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2]^{*}\Sigma^{11}[\tilde{U}_1 + (\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2]. \end{split} \tag{3.3a.3}$$

This exponent has the same structure as that of a complex Gaussian density with $E[\tilde{U}_1|\tilde{U}_2] = -(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2$ and $\text{Cov}(\tilde{X}_1|\tilde{X}_2) = (\Sigma^{11})^{-1} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Therefore the conditional density of $\tilde{X}_1$ given $\tilde{X}_2$ is given by

$$\tilde{g}\_1(\tilde{X}\_1|\tilde{X}\_2) = \frac{1}{\pi^r |\det(\Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21})|} \mathbf{e}^{-(\tilde{X}\_1 - \tilde{\mu}\_{(1)} + C)^\*\Sigma^{11}(\tilde{X}\_1 - \tilde{\mu}\_{(1)} + C)},$$

$$C = (\Sigma^{11})^{-1}\Sigma^{12}(\tilde{X}_2 - \tilde{\mu}_{(2)}).\qquad(3.3a.4)$$

The conditional expectation of $\tilde{X}_1$ given $\tilde{X}_2$ is then

$$\begin{split} E[\tilde{X}_1|\tilde{X}_2] &= \tilde{\mu}_{(1)} - (\Sigma^{11})^{-1} \Sigma^{12} (\tilde{X}_2 - \tilde{\mu}_{(2)}) \\ &= \tilde{\mu}_{(1)} + \Sigma_{12} \Sigma_{22}^{-1} (\tilde{X}_2 - \tilde{\mu}_{(2)}) \text{ (linear in } \tilde{X}_2) \end{split} \tag{3.3a.5}$$

which follows from a result on partitioned matrices obtained in Sect. 1.3. The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is referred to as *the matrix of regression coefficients*. The conditional covariance matrix is

$$\text{Cov}(\tilde{X}\_1|\tilde{X}\_2) = \Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21} \text{ (free of } \tilde{X}\_2).$$

From symmetry, the conditional density of $\tilde{X}_2$ given $\tilde{X}_1$ is given by

$$\begin{split} \tilde{g}_2(\tilde{X}_2|\tilde{X}_1) &= \frac{1}{\pi^{p-r} |\det(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})|} \\ &\quad \times \mathrm{e}^{-(\tilde{X}_2 - \tilde{\mu}_{(2)} + C_1)^{*}\Sigma^{22}(\tilde{X}_2 - \tilde{\mu}_{(2)} + C_1)}, \\ C_1 &= (\Sigma^{22})^{-1}\Sigma^{21}(\tilde{X}_1 - \tilde{\mu}_{(1)}), \ \Sigma^{22} > O. \end{split} \tag{3.3a.6}$$

Then the conditional expectation and the conditional covariance of $\tilde{X}_2$ given $\tilde{X}_1$ are the following:

$$\begin{split} E[\tilde{X}\_2|\tilde{X}\_1] &= \tilde{\mu}\_{(2)} - (\Sigma^{22})^{-1} \Sigma^{21} (\tilde{X}\_1 - \tilde{\mu}\_{(1)}) \\ &= \tilde{\mu}\_{(2)} + \Sigma\_{21} \Sigma\_{11}^{-1} (\tilde{X}\_1 - \tilde{\mu}\_{(1)}) \text{ (linear in } \tilde{X}\_1) \\ \text{Cov}(\tilde{X}\_2|\tilde{X}\_1) &= (\Sigma^{22})^{-1} = \Sigma\_{22} - \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} \text{ (free of } \tilde{X}\_1), \end{split} \tag{3.3a.7}$$

where, in this case, the matrix $\Sigma_{21}\Sigma_{11}^{-1}$ is referred to as *the matrix of regression coefficients*.

**Example 3.3a.1.** Let $\tilde{X},\ \tilde{\mu} = E[\tilde{X}],\ \Sigma = \text{Cov}(\tilde{X})$ be as follows:

$$
\tilde{X} = \begin{bmatrix} \tilde{x}\_1 \\ \tilde{x}\_2 \\ \tilde{x}\_3 \end{bmatrix}, \tilde{\mu} = \begin{bmatrix} 1+i \\ 2-i \\ 3i \end{bmatrix}, \Sigma = \begin{bmatrix} 3 & 1+i & 0 \\ 1-i & 2 & i \\ 0 & -i & 3 \end{bmatrix}.
$$

Consider the partitioning

$$\begin{aligned} \Sigma &= \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \ \Sigma_{11} = \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix}, \ \Sigma_{12} = \begin{bmatrix} 0 \\ i \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{X}_1 \\ \tilde{X}_2 \end{bmatrix}, \ \tilde{X}_1 = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}, \\ \Sigma_{21} &= [0, -i], \ \Sigma_{22} = (3), \ \tilde{X}_2 = (\tilde{x}_3), \end{aligned}$$

where $\tilde{x}_j,\ j = 1, 2, 3$, are scalar complex variables and $\tilde{X} \sim \tilde{N}_3(\tilde{\mu}, \Sigma)$. Determine (1) the marginal densities of $\tilde{X}_1$ and $\tilde{X}_2$; (2) the conditional expectation of $\tilde{X}_1|\tilde{X}_2$ or $E[\tilde{X}_1|\tilde{X}_2]$ and the conditional expectation of $\tilde{X}_2|\tilde{X}_1$ or $E[\tilde{X}_2|\tilde{X}_1]$; (3) the conditional densities of $\tilde{X}_1|\tilde{X}_2$ and $\tilde{X}_2|\tilde{X}_1$.

**Solution 3.3a.1.** Note that $\Sigma = \Sigma^{*}$ and hence $\Sigma$ is Hermitian. Let us compute the leading minors of $\Sigma$:

$$\det((3)) = 3 > 0, \ \ \det\begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix} = 6 - (1+1) = 4 > 0,$$

$$\det(\Sigma) = 3\det\begin{bmatrix} 2 & i \\ -i & 3 \end{bmatrix} - (1+i)\det\begin{bmatrix} 1-i & i \\ 0 & 3 \end{bmatrix} + 0 = (3)(5) - 3(1+1) = 9 > 0.$$

Hence *Σ* is Hermitian positive definite. Note that the cofactor expansion for determinants holds whether the elements present in the determinant are real or complex. Let us compute the inverses of the submatrices by taking the transpose of the matrix of cofactors divided by the determinant. This formula applies whether the elements comprising the matrix are real or complex. Then

$$
\Sigma\_{22}^{-1} = \frac{1}{3}, \ \Sigma\_{11}^{-1} = \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix}^{-1} = \frac{1}{4} \begin{bmatrix} 2 & -(1+i) \\ -(1-i) & 3 \end{bmatrix}; \tag{i}
$$

$$\begin{aligned} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} &= \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix} - \begin{bmatrix} 0 \\ i \end{bmatrix}\left(\frac{1}{3}\right)[0, -i] \\ &= \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1+i \\ 1-i & \frac{5}{3} \end{bmatrix} \Rightarrow \quad (ii) \end{aligned}$$

$$\left[\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right]^{-1} = \begin{bmatrix} 3 & 1+i \\ 1-i & \frac{5}{3} \end{bmatrix}^{-1} = \frac{1}{3} \begin{bmatrix} \frac{5}{3} & -(1+i) \\ -(1-i) & 3 \end{bmatrix};\qquad(iii)$$

$$\begin{aligned} \Sigma\_{22} - \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} &= 3 - [0, -i] \left( \frac{1}{4} \begin{bmatrix} 2 & -(1+i) \\ -(1-i) & 3 \end{bmatrix} \right) \begin{bmatrix} 0 \\ i \end{bmatrix} \\ &= 3 - \frac{3}{4} = \frac{9}{4} \Rightarrow & \text{(iv)} \end{aligned}$$

$$\left[\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right]^{-1} = \frac{4}{9}.\tag{v}$$

As well,

$$
\Sigma\_{12}\Sigma\_{22}^{-1} = \frac{1}{3} \begin{bmatrix} 0 \\ i \end{bmatrix} = \begin{bmatrix} 0 \\ \frac{i}{3} \end{bmatrix}; \tag{\text{v}\,\text{i}}
$$

$$
\Sigma_{21} \Sigma_{11}^{-1} = [0, -i] \left(\frac{1}{4}\right) \begin{bmatrix} 2 & -(1+i) \\ -(1-i) & 3 \end{bmatrix} = \frac{1}{4}[1+i, \ -3i]. \tag{vii}$$

With these computations, all the questions can be answered. We have

$$\tilde{X}_1 \sim \tilde{N}_2(\tilde{\mu}_{(1)}, \Sigma_{11}), \ \tilde{\mu}_{(1)} = \begin{bmatrix} 1+i \\ 2-i \end{bmatrix}, \ \Sigma_{11} = \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix}$$

and $\tilde{X}_2 = \tilde{x}_3 \sim \tilde{N}_1(3i, 3)$. Let the densities of $\tilde{X}_1$ and $\tilde{X}_2 = \tilde{x}_3$ be denoted by $\tilde{f}_1(\tilde{X}_1)$ and $\tilde{f}_2(\tilde{x}_3)$, respectively. Then

$$\begin{split} \tilde{f}_2(\tilde{x}_3) &= \frac{1}{(\pi)(3)} \mathrm{e}^{-\frac{1}{3}(\tilde{x}_3 - 3i)^{*}(\tilde{x}_3 - 3i)}, \\ \tilde{f}_1(\tilde{X}_1) &= \frac{1}{(\pi^2)(4)} \mathrm{e}^{-\mathcal{Q}_1}, \\ \mathcal{Q}_1 &= \frac{1}{4}[2(\tilde{x}_1 - (1+i))^{*}(\tilde{x}_1 - (1+i)) - (1+i)(\tilde{x}_1 - (1+i))^{*}(\tilde{x}_2 - (2-i)) \\ &\quad - (1-i)(\tilde{x}_2 - (2-i))^{*}(\tilde{x}_1 - (1+i)) + 3(\tilde{x}_2 - (2-i))^{*}(\tilde{x}_2 - (2-i))]. \end{split}$$

The conditional densities, denoted by $\tilde{g}_1(\tilde{X}_1|\tilde{X}_2)$ and $\tilde{g}_2(\tilde{X}_2|\tilde{X}_1)$, are the following:

$$\begin{aligned} \tilde{g}\_1(\tilde{X}\_1|\tilde{X}\_2) &= \frac{1}{(\pi^2)(3)} \mathbf{e}^{-\mathcal{Q}\_2}, \\ \mathcal{Q}\_2 &= \frac{1}{3} \Big[ \frac{5}{3} (\tilde{\mathbf{x}}\_1 - (1+i))^\* (\tilde{\mathbf{x}}\_1 - (1+i)) \\ &- (1+i)(\tilde{\mathbf{x}}\_1 - (1+i))^\* (\tilde{\mathbf{x}}\_2 - (2-i) - \frac{i}{3} (\tilde{\mathbf{x}}\_3 - 3i)) \\ &- (1-i)(\tilde{\mathbf{x}}\_2 - (2-i) - \frac{i}{3} (\tilde{\mathbf{x}}\_3 - 3i))^\* (\tilde{\mathbf{x}}\_1 - (1+i)) \\ &+ 3(\tilde{\mathbf{x}}\_2 - (2-i) - \frac{i}{3} (\tilde{\mathbf{x}}\_3 - 3i))^\* (\tilde{\mathbf{x}}\_2 - (2-i) - \frac{i}{3} (\tilde{\mathbf{x}}\_3 - 3i)) \Big]; \end{aligned}$$


$$\begin{split} \tilde{g}_2(\tilde{X}_2|\tilde{X}_1) &= \frac{1}{(\pi)(9/4)} \mathrm{e}^{-\mathcal{Q}_3}, \\ \mathcal{Q}_3 &= \frac{4}{9}(\tilde{x}_3 - M_3)^{*}(\tilde{x}_3 - M_3), \text{ where} \\ M_3 &= 3i + \frac{1}{4}\{(1+i)[\tilde{x}_1 - (1+i)] - 3i[\tilde{x}_2 - (2-i)]\} \\ &= 3i + \frac{1}{4}\{(1+i)\tilde{x}_1 - 3i\tilde{x}_2 + 3 + 4i\}. \end{split}$$
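The complex-arithmetic steps of Solution 3.3a.1 can likewise be verified numerically; a `numpy` sketch (not part of the text):

```python
import numpy as np

# Data of Example 3.3a.1
mu = np.array([1 + 1j, 2 - 1j, 3j])
Sigma = np.array([[3,      1 + 1j, 0],
                  [1 - 1j, 2,      1j],
                  [0,     -1j,     3]], dtype=complex)
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Schur complements (ii) and (iv) of the solution
assert np.allclose(S11 - S12 @ np.linalg.inv(S22) @ S21,
                   [[3, 1 + 1j], [1 - 1j, 5/3]])
assert np.isclose((S22 - S21 @ np.linalg.inv(S11) @ S12)[0, 0], 9/4)

# Matrix of regression coefficients (vii): (1/4)[1+i, -3i]
assert np.allclose(S21 @ np.linalg.inv(S11), [[(1 + 1j)/4, -3j/4]])

# M3 reduces to mu_3 = 3i when conditioning at x1 = 1+i, x2 = 2-i (the mean)
x12 = mu[:2]
M3 = mu[2] + (S21 @ np.linalg.inv(S11) @ (x12 - mu[:2]))[0]
assert np.isclose(M3, 3j)
```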

#### **The bivariate complex Gaussian case**

Letting $\rho$ denote the correlation between $\tilde{x}_1$ and $\tilde{x}_2$, it is seen from (3.3a.5) that for $p = 2$,

$$\begin{split} E[\tilde{X}\_1|\tilde{X}\_2] &= \tilde{\mu}\_{(1)} + \Sigma\_{12} \Sigma\_{22}^{-1} (\tilde{X}\_2 - \tilde{\mu}\_{(2)}) = \tilde{\mu}\_1 + \frac{\sigma\_{12}}{\sigma\_2^2} (\tilde{x}\_2 - \tilde{\mu}\_2) \\ &= \tilde{\mu}\_1 + \frac{\sigma\_1 \sigma\_2 \rho}{\sigma\_2^2} (\tilde{x}\_2 - \tilde{\mu}\_2) = \tilde{\mu}\_1 + \frac{\sigma\_1}{\sigma\_2} \rho \left( \tilde{x}\_2 - \tilde{\mu}\_2 \right) = E[\tilde{x}\_1|\tilde{x}\_2] \quad (\text{linear in } \tilde{x}\_2). \end{split} \tag{3.3a.8}$$

Similarly,

$$E[\tilde{\mathbf{x}}\_2|\tilde{\mathbf{x}}\_1] = \tilde{\mu}\_2 + \frac{\sigma\_2}{\sigma\_1} \rho \left(\tilde{\mathbf{x}}\_1 - \tilde{\mu}\_1\right) \text{ (linear in } \tilde{\mathbf{x}}\_1\text{)}.\tag{3.3a.9}$$

Incidentally, $\sigma_{12}/\sigma_2^2$ and $\sigma_{12}/\sigma_1^2$ are referred to as *the regression coefficients*.
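The linearity of (3.3*a*.8) is easy to check numerically. The following Python sketch (the values of $\tilde{\mu}$ and the Hermitian $\Sigma$ are illustrative assumptions, not taken from the text) evaluates $E[\tilde{x}_1|\tilde{x}_2] = \tilde{\mu}_1 + (\sigma_{12}/\sigma_2^2)(\tilde{x}_2 - \tilde{\mu}_2)$ at one observed value of $\tilde{x}_2$:

```python
import numpy as np

# Conditional mean of x1 given x2 in a bivariate complex Gaussian, as in
# (3.3a.8). The parameter values below are illustrative assumptions.
mu = np.array([1 + 1j, 2 - 1j])        # (mu_1, mu_2)
Sigma = np.array([[2.0, 1 + 1j],       # Hermitian covariance matrix:
                  [1 - 1j, 3.0]])      # sigma_21 = conj(sigma_12)

x2 = 2.5 - 0.5j                        # an observed value of x2
# E[x1 | x2] = mu_1 + (sigma_12 / sigma_2^2)(x2 - mu_2): linear in x2
cond_mean = mu[0] + (Sigma[0, 1] / Sigma[1, 1].real) * (x2 - mu[1])
print(cond_mean)
```

Replacing $\Sigma_{12}\Sigma_{22}^{-1}$ by the scalar ratio $\sigma_{12}/\sigma_2^2$ is exactly the $p = 2$ specialization used above.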

#### **Exercises 3.3**

**3.3.1.** Let the real $p \times 1$ vector $X$ have a $p$-variate nonsingular normal density, $X \sim N_p(\mu, \Sigma),\ \Sigma > O$. Let $u = X'\Sigma^{-1}X$. Make use of the mgf to derive the density of $u$ for (1) $\mu = O$, (2) $\mu \ne O$.

**3.3.2.** Repeat Exercise 3.3.1 with $\mu = O$ for the complex nonsingular Gaussian case.

**3.3.3.** Observing that the density obtained in Exercise 3.3.1 is a noncentral chisquare density arising from the real $p$-variate Gaussian, derive the density of the noncentral F (the numerator chisquare being noncentral and the denominator chisquare central) with $m$ and $n$ degrees of freedom, the two chisquares being independently distributed.

**3.3.4.** Repeat Exercise 3.3.3 for the complex Gaussian case.

**3.3.5.** Taking the density of *u* in Exercise 3.3.1 as a real noncentral chisquare density, derive the density of a real doubly noncentral F.

**3.3.6.** Repeat Exercise 3.3.5 for the corresponding complex case.

**3.3.7.** Construct a $3 \times 3$ Hermitian positive definite matrix $V$. Let this be the covariance matrix of a $3 \times 1$ vector variable $\tilde{X}$. Compute $V^{-1}$. Then construct a Gaussian density for this $\tilde{X}$. Derive the marginal joint densities of (1) $\tilde{x}_1$ and $\tilde{x}_2$, (2) $\tilde{x}_1$ and $\tilde{x}_3$, (3) $\tilde{x}_2$ and $\tilde{x}_3$, where $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3$ are the components of $\tilde{X}$. Take $E[\tilde{X}] = O$.

**3.3.8.** In Exercise 3.3.7, compute (1) $E[\tilde{x}_1|\tilde{x}_2]$, (2) the conditional joint density of $\tilde{x}_1, \tilde{x}_2$, given $\tilde{x}_3$. Take $E[\tilde{X}] = \tilde{\mu} = O$.

**3.3.9.** In Exercise 3.3.8, compute the mgf in the conditional space of $\tilde{x}_1$ given $\tilde{x}_2, \tilde{x}_3$, that is, $E[\mathrm{e}^{(t_1\tilde{x}_1)}\,|\,\tilde{x}_2, \tilde{x}_3]$.

**3.3.10.** In Exercise 3.3.9, compute the mgf in the marginal space of $\tilde{x}_2, \tilde{x}_3$. What is the connection of the results obtained in Exercises 3.3.9 and 3.3.10 with the mgf of $\tilde{X}$?

#### **3.4. Chisquaredness and Independence of Quadratic Forms in the Real Case**

Let the $p \times 1$ vector $X$ have a $p$-variate real Gaussian density with a null vector as its mean value and the identity matrix as its covariance matrix, that is, $X \sim N_p(O, I)$; in other words, the components of $X$ are mutually independently distributed real scalar standard normal variables. Let $u = X'AX,\ A = A',$ be a real quadratic form in this $X$. The chisquaredness of a quadratic form such as $u$ has already been discussed in Chap. 2. In this section, we will start with such a $u$ and then consider its generalizations. When $A = A'$, there exists an orthonormal matrix $P$, that is, $PP' = I,\ P'P = I,$ such that $P'AP = \mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$. Letting $Y = P'X$, we have $E[Y] = P'O = O$ and $\mathrm{Cov}(Y) = P'IP = I$. But $Y$ is a linear function of $X$, and hence $Y$ is also real Gaussian distributed; thus, $Y \sim N_p(O, I)$. Then $y_j^2 \overset{iid}{\sim} \chi_1^2,\ j = 1,\ldots,p$, that is, the $y_j^2$'s are independently distributed chisquares, each having one degree of freedom. Note that

$$
u = X'AX = Y'P'APY = \lambda\_1 y\_1^2 + \cdots + \lambda\_p y\_p^2. \tag{3.4.1}
$$
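The representation (3.4.1) can be verified numerically. In the sketch below (numpy assumed available, and $A$ randomly generated purely for illustration), a symmetric $A$ is diagonalized and $X'AX$ is compared with $\sum_j \lambda_j y_j^2$ for $Y = P'X$:

```python
import numpy as np

# Numerical check of (3.4.1): for symmetric A and Y = P'X with P orthonormal,
# X'AX equals lambda_1 y_1^2 + ... + lambda_p y_p^2.
rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
A = (A + A.T) / 2                 # a symmetric matrix (illustrative)
lam, P = np.linalg.eigh(A)        # A = P diag(lam) P'

X = rng.standard_normal(p)        # a draw from N_p(O, I)
Y = P.T @ X
print(X @ A @ X, np.sum(lam * Y**2))   # the two values agree
```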

We have the following result on the chisquaredness of quadratic forms in the real *p*-variate Gaussian case, which corresponds to Theorem 2.2.1.

**Theorem 3.4.1.** *Let the $p \times 1$ vector $X$ be real Gaussian with the parameters $\mu = O$ and $\Sigma = I$, that is, $X \sim N_p(O, I)$. Let $u = X'AX,\ A = A',$ be a quadratic form in this $X$. Then $u = X'AX \sim \chi_r^2$, that is, a real chisquare with $r$ degrees of freedom, if and only if $A = A^2$ and the rank of $A$ is $r$.*

*Proof:* When $A = A'$, we have the representation of the quadratic form given in (3.4.1). When $A = A^2$, all the eigenvalues of $A$ are 1's and 0's. Then $r$ of the $\lambda_j$'s are unities and the remaining ones are zeros, so that (3.4.1) becomes the sum of $r$ independently distributed real chisquares having one degree of freedom each; hence the sum is a real chisquare with $r$ degrees of freedom. For proving the second part, we will assume that $u = X'AX \sim \chi_r^2$. Then the mgf of $u$ is $M_u(t) = (1-2t)^{-\frac{r}{2}}$ for $1-2t > 0$. The representation in (3.4.1) holds in general. The mgfs of $y_j^2$, of $\lambda_j y_j^2$ and of the sum $u = \sum_j \lambda_j y_j^2$ are the following:

$$M\_{\mathbf{y}\_j^2}(t) = (1 - 2t)^{-\frac{1}{2}},\ M\_{\lambda\_j \mathbf{y}\_j^2}(t) = (1 - 2\lambda\_j t)^{-\frac{1}{2}},\ M\_u(t) = \prod\_{j=1}^p (1 - 2\lambda\_j t)^{-\frac{1}{2}}$$

for $1 - 2\lambda_j t > 0,\ j = 1,\ldots,p$. Hence, we have the following identity:

$$(1 - 2t)^{-\frac{r}{2}} = \prod\_{j=1}^{p} (1 - 2\lambda\_j t)^{-\frac{1}{2}}, \\ 1 - 2t > 0, \\ 1 - 2\lambda\_j t > 0, \ j = 1, \ldots, p. \tag{3.4.2}$$

Taking the natural logarithm on both sides of (3.4.2), expanding, and then comparing the coefficients of $2t,\ \frac{(2t)^2}{2},\ldots,$ we have

$$r = \sum\_{j=1}^{p} \lambda\_j = \sum\_{j=1}^{p} \lambda\_j^2 = \sum\_{j=1}^{p} \lambda\_j^3 = \dotsb \tag{3.4.3}$$

The only solution (3.4.3) can have is that $r$ of the $\lambda_j$'s are unities and the remaining ones zeros. This property alone will not guarantee that $A$ is idempotent. However, having eigenvalues that are equal to zero or one, combined with the property that $A = A'$, will ensure that $A = A^2$. This completes the proof.
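The sufficiency part of the theorem can be illustrated by simulation. In the sketch below, $A$ is taken to be an orthogonal projection matrix of rank $r = 2$ built from an arbitrary (randomly generated, assumed) basis matrix; the sample mean and variance of $u = X'AX$ then match the $\chi_r^2$ values $r$ and $2r$:

```python
import numpy as np

# Simulation sketch of Theorem 3.4.1: for an idempotent A of rank r,
# u = X'AX with X ~ N_p(O, I) behaves as a chisquare with r degrees of freedom.
rng = np.random.default_rng(1)
p, r = 5, 2
H = rng.standard_normal((p, r))                # assumed basis, full column rank
A = H @ np.linalg.inv(H.T @ H) @ H.T           # projection: A = A', A = A^2, rank r

X = rng.standard_normal((200_000, p))          # samples from N_p(O, I)
u = np.einsum('ij,jk,ik->i', X, A, X)          # u = X'AX for each sample
print(u.mean(), u.var())                       # approximately r = 2 and 2r = 4
```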

Let us look into some generalizations of Theorem 3.4.1. Let the $p \times 1$ vector $X$ have a real Gaussian distribution $X \sim N_p(O, \Sigma),\ \Sigma > O$, that is, $X$ is a Gaussian vector with the null vector as its mean value and a real positive definite matrix as its covariance matrix. When $\Sigma$ is positive definite, we can define $\Sigma^{\frac{1}{2}}$. Letting $Z = \Sigma^{-\frac{1}{2}}X$, $Z$ will be distributed as a standard Gaussian vector, that is, $Z \sim N_p(O, I)$, since $Z$ is a linear function of $X$ with $E[Z] = O$ and $\mathrm{Cov}(Z) = I$. Now, Theorem 3.4.1 is applicable to $Z$. Then $u = X'AX,\ A = A',$ becomes

$$u = Z^\prime \Sigma^{\frac{1}{2}} A \Sigma^{\frac{1}{2}} Z, \ \Sigma^{\frac{1}{2}} A \Sigma^{\frac{1}{2}} = (\Sigma^{\frac{1}{2}} A \Sigma^{\frac{1}{2}})',$$

and it follows from Theorem 3.4.1 that the next result holds:

**Theorem 3.4.2.** *Let the $p \times 1$ vector $X$ have a real $p$-variate Gaussian density $X \sim N_p(O, \Sigma),\ \Sigma > O$. Then $q = X'AX,\ A = A',$ is a real chisquare with $r$ degrees of freedom if and only if $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ is idempotent and of rank $r$ or, equivalently, if and only if $A = A\Sigma A$ and the rank of $A$ is $r$.*

Now, let us consider the general case. Let $X \sim N_p(\mu, \Sigma),\ \Sigma > O$, and let $q = X'AX,\ A = A'$. Then, referring to representation (2.2.1), we can express $q$ as

$$
q = \lambda\_1 (u\_1 + b\_1)^2 + \dots + \lambda\_p (u\_p + b\_p)^2 \equiv \lambda\_1 w\_1^2 + \dots + \lambda\_p w\_p^2 \tag{3.4.4}
$$

where $U = (u_1,\ldots,u_p)' \sim N_p(O, I)$, the $\lambda_j$'s, $j = 1,\ldots,p$, are the eigenvalues of $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$, and $b_j$ is the $j$-th component of $P'\Sigma^{-\frac{1}{2}}\mu$, $P$ being a $p \times p$ orthonormal matrix whose $j$-th column is the normalized eigenvector corresponding to $\lambda_j,\ j = 1,\ldots,p$. When $\mu = O$, $w_j^2$ is a real central chisquare random variable having one degree of freedom; otherwise, it is a real noncentral chisquare random variable with one degree of freedom and noncentrality parameter $\frac{1}{2}b_j^2$. Thus, in general, (3.4.4) is a linear function of independently distributed real noncentral chisquare random variables having one degree of freedom each.
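The quantities entering (3.4.4) can be computed explicitly. In the following sketch, $A$, $\Sigma$ and $\mu$ are illustrative assumptions; the symmetric square root $\Sigma^{\frac{1}{2}}$ is obtained from the spectral decomposition of $\Sigma$, and the identity $q = \sum_j \lambda_j w_j^2$ is checked at one point:

```python
import numpy as np

# Sketch of (3.4.4): the weights lambda_j are the eigenvalues of
# Sigma^(1/2) A Sigma^(1/2), and b = P' Sigma^(-1/2) mu holds the offsets b_j.
# The matrices A, Sigma and the vector mu below are illustrative assumptions.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0], [1.0, 1.0]])
mu = np.array([1.0, -1.0])

w, V = np.linalg.eigh(Sigma)
S_half = V @ np.diag(np.sqrt(w)) @ V.T          # symmetric square root of Sigma
lam, P = np.linalg.eigh(S_half @ A @ S_half)    # the lambda_j's
b = P.T @ np.linalg.inv(S_half) @ mu            # offsets; noncentralities b_j^2 / 2

X = np.array([0.3, 1.7])                        # check the identity at one point
z = P.T @ np.linalg.inv(S_half) @ X
print(X @ A @ X, np.sum(lam * z**2))            # the two values agree
```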

**Example 3.4.1.** Let $X \sim N_3(O, \Sigma)$ and $q = X'AX$ where

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \Sigma = \frac{1}{3} \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 3 \end{bmatrix}, \ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

(1) Show that $q \sim \chi_1^2$ by applying Theorem 3.4.2 as well as independently; (2) if the mean value vector is $\mu' = [-1,\ 1,\ -2]$, what is then the distribution of $q$?

**Solution 3.4.1.** In (1) *μ* = *O* and

$$X^\prime AX = \mathbf{x}\_1^2 + \mathbf{x}\_2^2 + \mathbf{x}\_3^2 + 2(\mathbf{x}\_1\mathbf{x}\_2 + \mathbf{x}\_1\mathbf{x}\_3 + \mathbf{x}\_2\mathbf{x}\_3) = \left(\mathbf{x}\_1 + \mathbf{x}\_2 + \mathbf{x}\_3\right)^2.$$

Let *y*<sup>1</sup> = *x*<sup>1</sup> + *x*<sup>2</sup> + *x*3. Then *E*[*y*1] = 0 and

$$\begin{split} \text{Var}(\mathbf{y}\_1) &= \text{Var}(\mathbf{x}\_1) + \text{Var}(\mathbf{x}\_2) + \text{Var}(\mathbf{x}\_3) + 2[\text{Cov}(\mathbf{x}\_1, \mathbf{x}\_2) + \text{Cov}(\mathbf{x}\_1, \mathbf{x}\_3) + \text{Cov}(\mathbf{x}\_2, \mathbf{x}\_3)] \\ &= \frac{1}{3}[2 + 2 + 3 + 0 - 2 - 2] = \frac{3}{3} = 1. \end{split}$$

Hence, $y_1 = x_1 + x_2 + x_3$ has $E[y_1] = 0$ and $\mathrm{Var}(y_1) = 1$, and since it is a linear function of the real normal vector $X$, $y_1$ is a standard normal variable. Accordingly, $q = y_1^2 \sim \chi_1^2$. In order to apply Theorem 3.4.2, consider $A\Sigma A$:

$$A \Sigma A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \left( \frac{1}{3} \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 3 \end{bmatrix} \right) \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = A.$$

Then, by Theorem 3.4.2, $q = X'AX \sim \chi_r^2$ where $r$ is the rank of $A$. In this case, the rank of $A$ is 1 and hence $q \sim \chi_1^2$. This completes the calculations in connection with (1). When $\mu \ne O$, $q \sim \chi_1^2(\lambda)$, a noncentral chisquare with noncentrality parameter $\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu$. Let us compute $\Sigma^{-1}$ by making use of the formula $\Sigma^{-1} = \frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]'$, where $\mathrm{Cof}(\Sigma)$ is the matrix of cofactors of $\Sigma$, that is, the matrix wherein each element of $\Sigma$ is replaced by its cofactor. Now,

$$
\Sigma^{-1} = \left\{ \frac{1}{3} \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 3 \end{bmatrix} \right\}^{-1} = \frac{3}{8} \begin{bmatrix} 5 & 1 & 2 \\ 1 & 5 & 2 \\ 2 & 2 & 4 \end{bmatrix}.
$$

Then,

$$
\lambda = \frac{1}{2} \mu' \Sigma^{-1} \mu = \frac{3}{16} [-1, 1, -2] \begin{bmatrix} 5 & 1 & 2 \\ 1 & 5 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix} = \frac{3}{16} \times 24 = \frac{9}{2}.
$$

This completes the computations for the second part.
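Both parts of the solution are easily confirmed numerically; the following sketch (numpy assumed available) checks that $A\Sigma A = A$, that $A$ has rank 1, and that $\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu = \frac{9}{2}$:

```python
import numpy as np

# Numerical check of Example 3.4.1: A Sigma A = A with A of rank 1, and the
# noncentrality parameter lambda = (1/2) mu' Sigma^{-1} mu.
Sigma = np.array([[2, 0, -1], [0, 2, -1], [-1, -1, 3]]) / 3
A = np.ones((3, 3))
mu = np.array([-1.0, 1.0, -2.0])

print(np.allclose(A @ Sigma @ A, A))          # True: Theorem 3.4.2 applies
print(np.linalg.matrix_rank(A))               # 1, so q ~ chisquare_1 when mu = O
print(0.5 * mu @ np.linalg.inv(Sigma) @ mu)   # 4.5 = 9/2
```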

#### **3.4.1. Independence of quadratic forms**

Another relevant result in the real case pertains to the independence of quadratic forms. The concept of chisquaredness and the independence of quadratic forms are prominently encountered in the theoretical underpinnings of statistical techniques such as the Analysis of Variance, Regression and Model Building when it is assumed that the errors are normally distributed. First, we state a result on the independence of quadratic forms in Gaussian vectors whose components are independently distributed.

**Theorem 3.4.3.** *Let $u_1 = X'AX,\ A = A',$ and $u_2 = X'BX,\ B = B',$ be two quadratic forms in $X \sim N_p(\mu, I)$. Then $u_1$ and $u_2$ are independently distributed if and only if $AB = O$.*

Note that the independence property holds whether $\mu = O$ or $\mu \ne O$. The result will still be valid if the covariance matrix is $\sigma^2 I$ where $\sigma^2$ is a positive real scalar quantity. If the covariance matrix is $\Sigma > O$, the statement of Theorem 3.4.3 needs modification.

*Proof:* Since $AB = O$, we have $AB = O = O' = (AB)' = B'A' = BA$, which means that $A$ and $B$ commute. Then there exists an orthonormal matrix $P,\ PP' = I,\ P'P = I,$ that diagonalizes both $A$ and $B$, and

$$AB = O \Rightarrow P'ABP = O \Rightarrow P'APP'BP = D\_1D\_2 = O,$$

$$D\_1 = \text{diag}(\lambda\_1, \dots, \lambda\_p), \ D\_2 = \text{diag}(\nu\_1, \dots, \nu\_p),\tag{3.4.5}$$

where *λ*1*,...,λp* are the eigenvalues of *A* and *ν*1*,...,νp* are the eigenvalues of *B*. Let *Y* = *P*- *X*, then the canonical representations of *u*<sup>1</sup> and *u*<sup>2</sup> are the following:

$$u\_1 = \lambda\_1 y\_1^2 + \dots + \lambda\_p y\_p^2 \tag{3.4.6}$$

$$
u\_2 = \nu\_1 y\_1^2 + \dots + \nu\_p y\_p^2 \tag{3.4.7}
$$

where the $y_j$'s are real and independently distributed. But $D_1D_2 = O$ means that whenever a $\lambda_j \ne 0$, the corresponding $\nu_j = 0$, and vice versa. In other words, whenever a $y_j$ is present in (3.4.6), it is absent in (3.4.7) and vice versa; that is, the independent variables $y_j$'s are separated in (3.4.6) and (3.4.7), which implies that $u_1$ and $u_2$ are independently distributed.

The necessity part of the proof, which consists in showing that $AB = O$ given that $A = A',\ B = B'$ and $u_1$ and $u_2$ are independently distributed, cannot be established by retracing the steps utilized for proving the sufficiency, as it requires more involved matrix manipulations. We note that there are several incorrect or incomplete proofs of Theorem 3.4.3 in the statistical literature. A correct proof for the central case is given in Mathai and Provost (1992).

If $X \sim N_p(\mu, \Sigma),\ \Sigma > O$, consider the transformation $Y = \Sigma^{-\frac{1}{2}}X \sim N_p(\Sigma^{-\frac{1}{2}}\mu, I)$. Then, $u_1 = X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$ and $u_2 = X'BX = Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, and we can apply Theorem 3.4.3. In that case, the condition that the product of the two matrices be null means that

$$
\Sigma^{\frac{1}{2}} A \Sigma^{\frac{1}{2}} \Sigma^{\frac{1}{2}} B \Sigma^{\frac{1}{2}} = O \Rightarrow A \Sigma B = O.
$$

Thus we have the following result:

**Theorem 3.4.4.** *Let $u_1 = X'AX,\ A = A',$ and $u_2 = X'BX,\ B = B',$ where $X \sim N_p(\mu, \Sigma),\ \Sigma > O$. Then $u_1$ and $u_2$ are independently distributed if and only if $A\Sigma B = O$.*

What about the distribution of the quadratic form $y = (X - \mu)'\Sigma^{-1}(X - \mu)$ that is present in the exponent of the $p$-variate real Gaussian density? Let us first determine the mgf of $y$, that is,

$$\begin{split} M\_{\mathbf{y}}(t) &= E[\mathbf{e}^{t\mathbf{y}}] = \frac{1}{(2\pi)^{\frac{p}{2}} |\boldsymbol{\Sigma}|^{\frac{1}{2}}} \int\_{X} \mathbf{e}^{t(X-\mu)'\boldsymbol{\Sigma}^{-1}(X-\mu) - \frac{1}{2}(X-\mu)'\boldsymbol{\Sigma}^{-1}(X-\mu)} d\mathbf{x} \\ &= \frac{1}{(2\pi)^{\frac{p}{2}} |\boldsymbol{\Sigma}|^{\frac{1}{2}}} \int\_{X} \mathbf{e}^{-\frac{1}{2}(1-2t)(X-\mu)'\boldsymbol{\Sigma}^{-1}(X-\mu)} d\mathbf{x} \\ &= (1-2t)^{-\frac{p}{2}} \text{ for } (1-2t) > 0. \end{split} \tag{3.4.8}$$

This is the mgf of a real chisquare random variable having *p* degrees of freedom. Hence we have the following result:

**Theorem 3.4.5.** *When X* ∼ *Np(μ, Σ), Σ > O,*

$$\mathbf{y} = (\mathbf{X} - \boldsymbol{\mu})^{\prime} \boldsymbol{\Sigma}^{-1} (\mathbf{X} - \boldsymbol{\mu}) \sim \chi\_p^2,\tag{3.4.9}$$

*and if $y_1 = X'\Sigma^{-1}X$, then $y_1 \sim \chi_p^2(\lambda)$, that is, a real noncentral chisquare with $p$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu$.*
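Theorem 3.4.5 lends itself to a direct simulation check. In the sketch below, $\mu$ and $\Sigma$ are illustrative assumptions; the sample mean and variance of $y$ are compared with the $\chi_p^2$ values $p$ and $2p$:

```python
import numpy as np

# Simulation sketch of Theorem 3.4.5: y = (X - mu)' Sigma^{-1} (X - mu) is a
# chisquare with p degrees of freedom. mu and Sigma are illustrative assumptions.
rng = np.random.default_rng(2)
p = 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]])
Sinv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=200_000)
y = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)
print(y.mean(), y.var())        # approximately p = 3 and 2p = 6
```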

**Example 3.4.2.** Let $X \sim N_3(\mu, \Sigma)$ and consider the quadratic forms $u_1 = X'AX$ and $u_2 = X'BX$ where

$$\begin{aligned} X &= \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \Sigma = \frac{1}{3} \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 3 \end{bmatrix}, \; A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \\\ B &= \frac{1}{3} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}, \; \mu = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}. \end{aligned}$$

Show that *u*<sup>1</sup> and *u*<sup>2</sup> are independently distributed.

**Solution 3.4.2.** Let $J$ be a $3 \times 1$ column vector whose elements are unities or 1's. Then observe that $A = JJ'$ and $B = I - \frac{1}{3}JJ'$. Further, $J'J = 3$ and $J'\Sigma = \frac{1}{3}J'$, so that $A\Sigma = JJ'\Sigma = \frac{1}{3}JJ'$. Then $A\Sigma B = \frac{1}{3}JJ'[I - \frac{1}{3}JJ'] = \frac{1}{3}[JJ' - \frac{1}{3}J(J'J)J'] = \frac{1}{3}[JJ' - JJ'] = O$. It then follows from Theorem 3.4.4 that $u_1$ and $u_2$ are independently distributed. Now, let us prove the result without resorting to Theorem 3.4.4. Note that $u_3 = x_1 + x_2 + x_3 = J'X$ has a standard normal distribution, as shown in Example 3.4.1. Consider the matrix $B$ in $BX$, namely $I - \frac{1}{3}JJ'$. The first component of $BX$ is of the form $\frac{1}{3}[2, -1, -1]X = \frac{1}{3}[2x_1 - x_2 - x_3]$, which shall be denoted by $u_4$. Then $u_3$ and $u_4$ are linear functions of the same real normal vector $X$, and hence $u_3$ and $u_4$ are real normal variables. Let us compute the covariance between $u_3$ and $u_4$, observing that $J'\mathrm{Cov}(X) = J'\Sigma = \frac{1}{3}J'$:

$$\text{Cov}(u\_3, u\_4) = \frac{1}{3} [1, 1, 1] \,\text{Cov}(X) \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} = \frac{1}{9} [1, 1, 1] \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} = 0.$$

Thus, $u_3$ and $u_4$ are independently distributed. As a similar result can be established with respect to the second and third components of $BX$, $u_3$ and $BX$ are indeed independently distributed. This implies that $u_3^2 = (J'X)^2 = X'JJ'X = X'AX$ and $(BX)'(BX) = X'B'BX = X'BX$ are independently distributed. Observe that since $B$ is symmetric and idempotent, $B'B = B$. This solution makes use of the following property: if $Y_1$ and $Y_2$ are real vectors or matrices that are independently distributed, then $Y_1'Y_1$ and $Y_2'Y_2$ are also independently distributed. It should be noted that the converse does not necessarily hold.
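The condition of Theorem 3.4.4 for this example can be confirmed numerically; the following sketch checks that $A\Sigma B = O$ and, incidentally, that $B$ is idempotent:

```python
import numpy as np

# Check of Example 3.4.2: A Sigma B = O, so by Theorem 3.4.4 the quadratic
# forms X'AX and X'BX are independently distributed.
Sigma = np.array([[2, 0, -1], [0, 2, -1], [-1, -1, 3]]) / 3
A = np.ones((3, 3))
B = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]) / 3

print(np.allclose(A @ Sigma @ B, 0))    # True
print(np.allclose(B @ B, B))            # True: B is symmetric idempotent
```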

#### **3.4a. Chisquaredness and Independence in the Complex Gaussian Case**

Let the $p \times 1$ vector $\tilde{X}$ in the complex domain have a $p$-variate complex Gaussian density, $\tilde{X} \sim \tilde{N}_p(O, I)$. Let $\tilde{u} = \tilde{X}^{*}A\tilde{X},\ A = A^{*},$ be a Hermitian form, where $A^{*}$ denotes the conjugate transpose of $A$. Then there exists a unitary matrix $Q,\ QQ^{*} = I,\ Q^{*}Q = I,$ such that $Q^{*}AQ = \mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$. It can be shown that when $A$ is Hermitian (which in the real case means that $A = A'$, symmetric), all the eigenvalues of $A$ are real. Letting $\tilde{Y} = Q^{*}\tilde{X}$, we have

$$
\tilde{u} = \tilde{X}^\* A \tilde{X} = \tilde{Y}^\* Q^\* A Q \tilde{Y} = \lambda\_1 |\tilde{y}\_1|^2 + \dots + \lambda\_p |\tilde{y}\_p|^2 \tag{3.4a.1}
$$

where $|\tilde{y}_j|$ denotes the absolute value or modulus of $\tilde{y}_j$. If $\tilde{y}_j = y_{j1} + iy_{j2}$ where $y_{j1}$ and $y_{j2}$ are real and $i = \sqrt{-1}$, then $|\tilde{y}_j|^2 = y_{j1}^2 + y_{j2}^2$. We can obtain the following result, which is the counterpart of Theorem 3.4.1:

**Theorem 3.4a.1.** *Let $\tilde{X} \sim \tilde{N}_p(O, I)$ and $\tilde{u} = \tilde{X}^{*}A\tilde{X},\ A = A^{*}$. Then $\tilde{u} \sim \tilde{\chi}_r^2$, a chisquare random variable having $r$ degrees of freedom in the complex domain, if and only if $A = A^2$ (idempotent) and $A$ is of rank $r$.*

*Proof:* The definition of an idempotent matrix $A$, namely $A = A^2$, holds whether the elements of $A$ are real or complex. Let $A$ be idempotent and of rank $r$. Then $r$ of the eigenvalues of $A$ are unities and the remaining ones are zeros, and the representation given in (3.4*a*.1) becomes

$$
\tilde{u} = |\tilde{y}\_1|^2 + \dots + |\tilde{y}\_r|^2 \sim \tilde{\chi}\_r^2,
$$

a chisquare with $r$ degrees of freedom in the complex domain, that is, a real gamma with the parameters $(\alpha = r,\ \beta = 1)$ whose mgf is $(1-t)^{-r},\ 1-t > 0$. For proving the necessity, let us assume that $\tilde{u} \sim \tilde{\chi}_r^2$, its mgf being $M_{\tilde{u}}(t) = (1-t)^{-r}$ for $1-t > 0$. But from (3.4*a*.1), $|\tilde{y}_j|^2 \sim \tilde{\chi}_1^2$ and its mgf is $(1-t)^{-1}$ for $1-t > 0$. Hence the mgf of $\lambda_j|\tilde{y}_j|^2$ is $M_{\lambda_j|\tilde{y}_j|^2}(t) = (1-\lambda_j t)^{-1}$ for $1-\lambda_j t > 0$, and we have the following identity:

$$(1 - t)^{-r} = \prod\_{j=1}^{p} (1 - \lambda\_j t)^{-1}.\tag{3.4a.2}$$

Take the natural logarithm on both sides of (3.4*a*.2), expand, and compare the coefficients of $t,\ \frac{t^2}{2},\ldots$ to obtain

$$r = \sum\_{j=1}^{p} \lambda\_j = \sum\_{j=1}^{p} \lambda\_j^2 = \dotsb \tag{3.4a.3}$$

The only possibility for the $\lambda_j$'s in (3.4*a*.3) is that $r$ of them are unities and the remaining ones, zeros. This property, combined with $A = A^{*}$, guarantees that $A = A^2$ and $A$ is of rank $r$. This completes the proof.

An extension of Theorem 3.4a.1 which is the counterpart of Theorem 3.4.2 can also be obtained. We will simply state it as the proof is parallel to that provided in the real case.

**Theorem 3.4a.2.** *Let $\tilde{X} \sim \tilde{N}_p(O, \Sigma),\ \Sigma > O,$ and let $\tilde{u} = \tilde{X}^{*}A\tilde{X},\ A = A^{*},$ be a Hermitian form. Then $\tilde{u} \sim \tilde{\chi}_r^2$, a chisquare random variable having $r$ degrees of freedom in the complex domain, if and only if $A = A\Sigma A$ and $A$ is of rank $r$.*

**Example 3.4a.1.** Let $\tilde{X} \sim \tilde{N}_3(\tilde{\mu}, \Sigma)$ and $\tilde{u} = \tilde{X}^{*}A\tilde{X}$ where

$$\begin{aligned} \tilde{X} &= \begin{bmatrix} \tilde{\chi}\_1 \\ \tilde{\chi}\_2 \\ \tilde{\chi}\_3 \end{bmatrix}, \quad \Sigma = \frac{1}{3} \begin{bmatrix} 3 & -(1+i) & -(1-i) \\ -(1-i) & 3 & -(1+i) \\ -(1+i) & -(1-i) & 3 \end{bmatrix}, \\\ A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \; \tilde{\mu} = \begin{bmatrix} 2+i \\ -i \\ 2i \end{bmatrix}. \end{aligned}$$

First determine whether *Σ* can be a covariance matrix. Then determine the distribution of *u*˜ by making use of Theorem 3.4a.2 as well as independently, that is, without using Theorem 3.4a.2, for the cases (1) *μ*˜ = *O*; (2) *μ*˜ as given above.

**Solution 3.4a.1.** Note that $\Sigma = \Sigma^{*}$, that is, $\Sigma$ is Hermitian. Let us verify that $\Sigma$ is a Hermitian positive definite matrix; $\Sigma$ must be either positive definite or positive semi-definite to be a covariance matrix, and in the semi-definite case, the density of $\tilde{X}$ does not exist. Let us check the leading minors of $3\Sigma$: $\det((3)) = 3 > 0$, $\det\begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 3 \end{bmatrix} = 9 - 2 = 7 > 0$, and $\det(\Sigma) = \frac{13}{3^3} > 0$ [evaluated by using the cofactor expansion, which proceeds as in the real case]. Hence $\Sigma$ is Hermitian positive definite. In order to apply Theorem 3.4a.2, we must now verify that $A\Sigma A = A$. Observe the following: $A = JJ',\ J'A = 3J',\ J'J = 3$ where $J' = [1, 1, 1]$, and $J'\Sigma = \frac{1}{3}J'$. Hence $A\Sigma A = (JJ')\Sigma(JJ') = J(J'\Sigma)JJ' = \frac{1}{3}(JJ')(JJ') = \frac{1}{3}J(J'J)J' = \frac{1}{3}J(3)J' = JJ' = A$. Thus the condition holds and, by Theorem 3.4a.2, $\tilde{u} \sim \tilde{\chi}_1^2$ in the complex domain, that is, $\tilde{u}$ is a real gamma random variable with parameters $(\alpha = 1,\ \beta = 1)$ when $\tilde{\mu} = O$. Now, let us derive this result without using Theorem 3.4a.2. Let $\tilde{u}_1 = \tilde{x}_1 + \tilde{x}_2 + \tilde{x}_3$ and $A_1 = (1, 1, 1)'$. Note that $A_1'\tilde{X} = \tilde{u}_1$, the sum of the components of $\tilde{X}$. Hence $\tilde{u}_1^{*}\tilde{u}_1 = \tilde{X}^{*}A_1A_1'\tilde{X} = \tilde{X}^{*}A\tilde{X}$. For $\tilde{\mu} = O$, we have $E[\tilde{u}_1] = 0$ and

$$\begin{split} \text{Var}(\tilde{u}\_{1}) &= \text{Var}(\tilde{x}\_{1}) + \text{Var}(\tilde{x}\_{2}) + \text{Var}(\tilde{x}\_{3}) + [\text{Cov}(\tilde{x}\_{1}, \tilde{x}\_{2}) + \text{Cov}(\tilde{x}\_{2}, \tilde{x}\_{1})] \\ &\quad + [\text{Cov}(\tilde{x}\_{1}, \tilde{x}\_{3}) + \text{Cov}(\tilde{x}\_{3}, \tilde{x}\_{1})] + [\text{Cov}(\tilde{x}\_{2}, \tilde{x}\_{3}) + \text{Cov}(\tilde{x}\_{3}, \tilde{x}\_{2})] \\ &= \frac{1}{3} \big[3 + 3 + 3 + [-(1 + i) - (1 - i)] + [-(1 - i) - (1 + i)] \\ &\quad + [-(1 + i) - (1 - i)]\big] = \frac{1}{3} [9 - 6] = 1. \end{split}$$

Thus, $\tilde{u}_1$ is a standard normal random variable in the complex domain and $\tilde{u}_1^{*}\tilde{u}_1 \sim \tilde{\chi}_1^2$, a chisquare random variable with one degree of freedom in the complex domain, that is, a real gamma random variable with parameters $(\alpha = 1,\ \beta = 1)$.
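This distributional statement can be illustrated by simulation. Since a standard normal in the complex domain has density $\frac{1}{\pi}\mathrm{e}^{-|\tilde{z}|^2}$, its real and imaginary parts are each $N(0, \frac{1}{2})$, and the squared modulus then behaves as a real gamma$(\alpha = 1,\ \beta = 1)$, that is, a standard exponential with mean and variance both equal to 1:

```python
import numpy as np

# Simulation sketch: for a standard complex normal z (density (1/pi) e^{-|z|^2}),
# the real and imaginary parts are each N(0, 1/2), and |z|^2 ~ gamma(1, 1).
rng = np.random.default_rng(3)
n = 200_000
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
u = np.abs(z) ** 2
print(u.mean(), u.var())      # both approximately 1
```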

For $\tilde{\mu} = (2+i,\ -i,\ 2i)'$, this chisquare random variable is noncentral with noncentrality parameter $\lambda = \tilde{\mu}^{*}\Sigma^{-1}\tilde{\mu}$. Hence, the inverse of $\Sigma$ has to be evaluated. To do so, we will employ the formula $\Sigma^{-1} = \frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]'$, which also holds in the complex case. Earlier, the determinant was found to be equal to $\frac{13}{3^3}$, and

$$\frac{1}{|\Sigma|} [\text{Cof}(\Sigma)] = \frac{3}{13} \begin{bmatrix} 7 & 3-i & 3+i \\ 3+i & 7 & 3-i \\ 3-i & 3+i & 7 \end{bmatrix}; \text{ then}$$

$$\Sigma^{-1} = \frac{1}{|\Sigma|} [\text{Cof}(\Sigma)]' = \frac{3}{13} \begin{bmatrix} 7 & 3+i & 3-i \\ 3-i & 7 & 3+i \\ 3+i & 3-i & 7 \end{bmatrix}$$

and

$$\begin{split} \lambda &= \tilde{\mu}^{\*} \Sigma^{-1} \tilde{\mu} = \frac{3}{13} [2 - i, i, -2i] \begin{bmatrix} 7 & 3+i & 3-i \\ 3-i & 7 & 3+i \\ 3+i & 3-i & 7 \end{bmatrix} \begin{bmatrix} 2+i \\ -i \\ 2i \end{bmatrix} \\ &= \frac{(76)(3)}{13} = \frac{228}{13} \approx 17.54. \end{split}$$

This completes the computations.
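The whole solution can be confirmed numerically; the sketch below (numpy assumed available) checks that $\Sigma$ is Hermitian positive definite, that $A\Sigma A = A$, and evaluates the noncentrality parameter $\lambda = \tilde{\mu}^{*}\Sigma^{-1}\tilde{\mu}$:

```python
import numpy as np

# Numerical check of Example 3.4a.1: Sigma is Hermitian positive definite,
# A Sigma A = A, and lambda = mu* Sigma^{-1} mu.
Sigma = np.array([[3, -(1 + 1j), -(1 - 1j)],
                  [-(1 - 1j), 3, -(1 + 1j)],
                  [-(1 + 1j), -(1 - 1j), 3]]) / 3
A = np.ones((3, 3))
mu = np.array([2 + 1j, -1j, 2j])

print(np.allclose(Sigma, Sigma.conj().T))       # Hermitian
print(np.all(np.linalg.eigvalsh(Sigma) > 0))    # positive definite
print(np.allclose(A @ Sigma @ A, A))            # Theorem 3.4a.2 applies
lam = (mu.conj() @ np.linalg.inv(Sigma) @ mu).real
print(lam)                                      # the noncentrality parameter
```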

#### **3.4a.1. Independence of Hermitian forms**

We shall mainly state certain results in connection with Hermitian forms in this section since they parallel those pertaining to the real case.

**Theorem 3.4a.3.** *Let $\tilde{u}_1 = \tilde{X}^{*}A\tilde{X},\ A = A^{*},$ and $\tilde{u}_2 = \tilde{X}^{*}B\tilde{X},\ B = B^{*},$ where $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, I)$. Then, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if $AB = O$.*

*Proof***:** Let us assume that *AB* = *O*. Then

$$AB = O = O^\* = (AB)^\* = B^\* A^\* = BA.\tag{3.4a.4}$$

This means that there exists a unitary matrix $Q,\ QQ^{*} = I,\ Q^{*}Q = I,$ that will diagonalize both $A$ and $B$. That is, $Q^{*}AQ = \mathrm{diag}(\lambda_1,\ldots,\lambda_p) = D_1$ and $Q^{*}BQ = \mathrm{diag}(\nu_1,\ldots,\nu_p) = D_2$, where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$ and $\nu_1,\ldots,\nu_p$ are the eigenvalues of $B$. But $AB = O$ implies that $D_1D_2 = O$. As well, for $\tilde{Y} = Q^{*}\tilde{X}$,

$$
\tilde{u}\_1 = \tilde{X}^\* A \tilde{X} = \tilde{Y}^\* Q^\* A Q \tilde{Y} = \lambda\_1 |\tilde{y}\_1|^2 + \dots + \lambda\_p |\tilde{y}\_p|^2,\tag{3.4a.5}
$$

$$
\tilde{u}\_2 = \tilde{X}^\* B \tilde{X} = \tilde{Y}^\* Q^\* B Q \tilde{Y} = \nu\_1 |\tilde{y}\_1|^2 + \dots + \nu\_p |\tilde{y}\_p|^2. \tag{3.4a.6}
$$

Since $D_1D_2 = O$, whenever a $\lambda_j \ne 0$, the corresponding $\nu_j = 0$, and vice versa. Thus, the independent variables $\tilde{y}_j$'s are separated in (3.4*a*.5) and (3.4*a*.6) and, accordingly, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed. The proof of the necessity, which requires more matrix algebra, will not be provided herein. The general result can be stated as follows:

**Theorem 3.4a.4.** *Letting $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma),\ \Sigma > O,$ the Hermitian forms $\tilde{u}_1 = \tilde{X}^{*}A\tilde{X},\ A = A^{*},$ and $\tilde{u}_2 = \tilde{X}^{*}B\tilde{X},\ B = B^{*},$ are independently distributed if and only if $A\Sigma B = O$.*

Now, consider the Hermitian form appearing in the exponent of the $p$-variate complex Gaussian density. What will then be the density of $\tilde{y} = (\tilde{X} - \tilde{\mu})^{*}\Sigma^{-1}(\tilde{X} - \tilde{\mu})$? Let us evaluate the mgf of $\tilde{y}$. Observing that $\tilde{y}$ is real, we may take $E[\mathrm{e}^{t\tilde{y}}]$ where $t$ is a real parameter, so that

$$\begin{split} M\_{\tilde{\boldsymbol{\nu}}}(t) &= E[\mathbf{e}^{t\tilde{\boldsymbol{\nu}}}] = \frac{1}{\pi^{p}|\det(\boldsymbol{\Sigma})|} \int\_{\tilde{X}} \mathbf{e}^{(\tilde{X}-\tilde{\mu})^{\*}\boldsymbol{\Sigma}^{-1}(\tilde{X}-\tilde{\mu}) - (\tilde{X}-\tilde{\mu})^{\*}\boldsymbol{\Sigma}^{-1}(\tilde{X}-\tilde{\mu})} \mathbf{d}\tilde{X}} \\ &= \frac{1}{\pi^{p}|\det(\boldsymbol{\Sigma})|} \int\_{\tilde{X}} \mathbf{e}^{-(1-t)(\tilde{X}-\tilde{\mu})^{\*}\boldsymbol{\Sigma}^{-1}(\tilde{X}-\tilde{\mu})} \mathbf{d}\tilde{X} \\ &= (1-t)^{-p} \text{ for } 1-t > 0. \end{split} \tag{3.4a.7}$$

This is the mgf of a real gamma random variable with the parameters *(α* = *p, β* = 1*)* or a chisquare random variable in the complex domain with *p* degrees of freedom. Hence we have the following result:

**Theorem 3.4a.5.** *When $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma),\ \Sigma > O,$ then $\tilde{y} = (\tilde{X} - \tilde{\mu})^{*}\Sigma^{-1}(\tilde{X} - \tilde{\mu})$ is distributed as a real gamma random variable with the parameters $(\alpha = p,\ \beta = 1)$ or a chisquare random variable in the complex domain with $p$ degrees of freedom, that is,*

$$
\tilde{\mathbf{y}} \sim \text{gamma}(\alpha = p, \beta = 1) \, or \, \tilde{\mathbf{y}} \sim \tilde{\chi}\_p^2. \tag{3.4a.8}
$$
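Theorem 3.4a.5 lends itself to a quick simulation check. The sketch below (assuming Python with numpy; the particular Hermitian positive definite $\Sigma$ is a hypothetical example) draws complex Gaussian vectors and verifies that $\tilde{y}$ has the mean and variance of a gamma$(\alpha = p, \beta = 1)$ variable, both of which equal $p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200_000

# A Hermitian positive definite Sigma (hypothetical example matrix)
Sigma = np.array([[2.0, 0.5 + 0.5j, 0.0],
                  [0.5 - 0.5j, 1.5, 0.3j],
                  [0.0, -0.3j, 1.0]])
L = np.linalg.cholesky(Sigma)          # Sigma = L L*

# Standard complex Gaussian: Re and Im parts iid N(0, 1/2), so E[Z Z*] = I
Z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
X = L @ Z                              # columns ~ N_p(0, Sigma) in the complex domain

# y = X* Sigma^{-1} X, which Theorem 3.4a.5 says is gamma(alpha = p, beta = 1)
W = np.linalg.solve(Sigma, X)
y = np.real(np.sum(np.conj(X) * W, axis=0))

# gamma(p, 1) has mean p and variance p
print(y.mean(), y.var())   # both close to 3
```

Since $X = LZ$ gives $X^{\*}\Sigma^{-1}X = Z^{\*}Z$, each $\tilde{y}$ is a sum of $p$ standard exponential variables, which is exactly the gamma$(p, 1)$ claim.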

**Example 3.4a.2.** Let $\tilde{X} \sim \tilde{N}\_3(\tilde{\mu}, \Sigma)$, $\tilde{u}\_1 = \tilde{X}^{\*}A\tilde{X}$, $\tilde{u}\_2 = \tilde{X}^{\*}B\tilde{X}$ where

$$\begin{aligned} \tilde{X} &= \begin{bmatrix} \tilde{x}\_1 \\ \tilde{x}\_2 \\ \tilde{x}\_3 \end{bmatrix}, \ \tilde{\mu} = \begin{bmatrix} 2 - i \\ 3 + 2i \\ 1 - i \end{bmatrix}, \ \Sigma = \frac{1}{3} \begin{bmatrix} 3 & -(1+i) & -(1-i) \\ -(1-i) & 3 & -(1+i) \\ -(1+i) & -(1-i) & 3 \end{bmatrix}, \\ A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \ B = \frac{1}{3} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}. \end{aligned}$$

(1) By making use of Theorem 3.4a.4, show that $\tilde{u}\_1$ and $\tilde{u}\_2$ are independently distributed. (2) Show the independence of $\tilde{u}\_1$ and $\tilde{u}\_2$ without using Theorem 3.4a.4.

**Solution 3.4a.2.** In order to use Theorem 3.4a.4, we have to show that $A\Sigma B = O$, irrespective of $\tilde{\mu}$. Note that $A = JJ'$, $J' = [1, 1, 1]$, $J'J = 3$, $J'\Sigma = \frac{1}{3}J'$ and $J'B = O$. Hence $A\Sigma = JJ'\Sigma = J(J'\Sigma) = \frac{1}{3}JJ' \Rightarrow A\Sigma B = \frac{1}{3}JJ'B = \frac{1}{3}J(J'B) = O.$ This proves, through Theorem 3.4a.4, that $\tilde{u}\_1$ and $\tilde{u}\_2$ are independently distributed. The independence will now be established without resorting to Theorem 3.4a.4. Let $\tilde{u}\_3 = \tilde{x}\_1 + \tilde{x}\_2 + \tilde{x}\_3 = J'\tilde{X}$ and $\tilde{u}\_4 = \frac{1}{3}[2\tilde{x}\_1 - \tilde{x}\_2 - \tilde{x}\_3]$, that is, the first row of $B\tilde{X}$. Since independence is not affected by a relocation of the variables, we may assume, without any loss of generality, that $\tilde{\mu} = O$ when considering the independence of $\tilde{u}\_3$ and $\tilde{u}\_4$. Let us compute the covariance between $\tilde{u}\_3$ and $\tilde{u}\_4$:

The Multivariate Gaussian and Related Distributions

$$\text{Cov}(\tilde{u}\_3, \tilde{u}\_4) = \frac{1}{3} [1, 1, 1] \, \Sigma \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} = \frac{1}{3^2} [1, 1, 1] \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} = 0.$$

Thus, $\tilde{u}\_3$ and $\tilde{u}\_4$ are uncorrelated and hence independently distributed, since both are linear functions of the normal vector $\tilde{X}$. This property holds for each row of $B\tilde{X}$, and therefore $\tilde{u}\_3$ and $B\tilde{X}$ are independently distributed. However, $\tilde{u}\_1 = \tilde{X}^{\*}A\tilde{X} = \tilde{u}\_3^{\*}\tilde{u}\_3$, and hence $\tilde{u}\_1$ and $(B\tilde{X})^{\*}(B\tilde{X}) = \tilde{X}^{\*}B^{\*}B\tilde{X} = \tilde{X}^{\*}B\tilde{X} = \tilde{u}\_2$ are independently distributed. This completes the computations. The following property was utilized: let $\tilde{U}$ and $\tilde{V}$ be vectors or matrices that are independently distributed. Then, all the pairs $(\tilde{U}, \tilde{V}^{\*})$, $(\tilde{U}, \tilde{V}\tilde{V}^{\*})$, $(\tilde{U}, \tilde{V}^{\*}\tilde{V})$, $\ldots$, $(\tilde{U}\tilde{U}^{\*}, \tilde{V}\tilde{V}^{\*})$ are independently distributed whenever the quantities are defined. The converses need not hold when quadratic terms are involved; for instance, $(\tilde{U}\tilde{U}^{\*}, \tilde{V}\tilde{V}^{\*})$ being independently distributed need not imply that $(\tilde{U}, \tilde{V})$ are independently distributed.
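As a numerical cross-check of this solution (a sketch assuming Python with numpy), one may verify $A\Sigma B = O$ directly with the matrices of Example 3.4a.2:

```python
import numpy as np

# Matrices from Example 3.4a.2
Sigma = (1/3) * np.array([[3, -(1+1j), -(1-1j)],
                          [-(1-1j), 3, -(1+1j)],
                          [-(1+1j), -(1-1j), 3]])
A = np.ones((3, 3), dtype=complex)        # A = J J'
B = (1/3) * np.array([[2, -1, -1],
                      [-1, 2, -1],
                      [-1, -1, 2]], dtype=complex)

# Theorem 3.4a.4: u1 and u2 are independent iff A Sigma B = O
print(np.allclose(A @ Sigma @ B, 0))      # True
```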

#### **Exercises 3.4**

**3.4.1.** In the real case, referring to the right-hand side of (3.4.4), compute the densities of the following: (i) $z\_1^2$, (ii) $\lambda\_1 z\_1^2$, (iii) $\lambda\_1 z\_1^2 + \lambda\_2 z\_2^2$, (iv) $\lambda\_1 z\_1^2 + \cdots + \lambda\_4 z\_4^2$ if $\lambda\_1 = \lambda\_2$, $\lambda\_3 = \lambda\_4$, for $\mu = O$.

**3.4.2.** Compute the density of $u = X'AX$, $A = A'$, in the real case when (i) $X \sim N\_p(O, \Sigma)$, $\Sigma > O$, (ii) $X \sim N\_p(\mu, \Sigma)$, $\Sigma > O$.

**3.4.3.** Modify the statement in Theorem 3.4.1 if (i) $X \sim N\_p(O, \sigma^2 I)$, $\sigma^2 > 0$, (ii) $X \sim N\_p(\mu, \sigma^2 I)$, $\mu \neq O$.

**3.4.4.** Prove the only if part of Theorem 3.4.3.

**3.4.5.** Establish the cases (i), (ii), (iii) of Exercise 3.4.1 in the corresponding complex domain.

**3.4.6.** Supply the proof for the only if part in Theorem 3.4a.3.

**3.4.7.** Can a matrix *A* having at least one complex element be Hermitian and idempotent at the same time? Prove your statement.

**3.4.8.** Let the $p \times 1$ vector $X$ have a real Gaussian density $N\_p(O, \Sigma)$, $\Sigma > O$. Let $u = X'AX$, $A = A'$. Evaluate the density of $u$ for $p = 2$ and show that this density can be written in terms of a hypergeometric series of the ${}\_1F\_1$ type.

**3.4.9.** Repeat Exercise 3.4.8 if *X*˜ is in the complex domain, *X*˜ ∼ *N*˜*p(O, Σ), Σ > O*.

**3.4.10.** Supply the proofs for the only if part in Theorems 3.4.4 and 3.4a.4.

#### **3.5. Samples from a** *p***-variate Real Gaussian Population**

Let the $p \times 1$ real vectors $X\_1, \ldots, X\_n$ be iid as $N\_p(\mu, \Sigma)$, $\Sigma > O$. The collection $X\_1, \ldots, X\_n$ is then called a simple random sample of size $n$ from this $N\_p(\mu, \Sigma)$ population, and the joint density of $X\_1, \ldots, X\_n$ is the following:

$$\begin{split} L &= \prod\_{j=1}^{n} f(X\_j) = \prod\_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2}(X\_j - \mu)' \Sigma^{-1} (X\_j - \mu)}}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \\ &= [(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}]^{-1} \mathbf{e}^{-\frac{1}{2} \sum\_{j=1}^{n} (X\_j - \mu)' \Sigma^{-1} (X\_j - \mu)}. \end{split} \tag{3.5.1}$$

This *L* at an observed set of *X*1*,...,Xn* is called the *likelihood function*. Let the sample matrix, which is *p* ×*n*, be denoted by a bold-faced **X**. In order to avoid too many symbols, we will use **X** to denote the *p* × *n* matrix in this section. In earlier sections, we had used *X* to denote a *p* × 1 vector. Then

$$\mathbf{X} = [X\_1, \dots, X\_n] = \begin{bmatrix} x\_{11} & x\_{12} & \dots & x\_{1n} \\ x\_{21} & x\_{22} & \dots & x\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{p1} & x\_{p2} & \dots & x\_{pn} \end{bmatrix}, \ X\_k = \begin{bmatrix} x\_{1k} \\ x\_{2k} \\ \vdots \\ x\_{pk} \end{bmatrix}, \ k = 1, \dots, n. \qquad (i)$$

Let the sample average be denoted by $\bar{X} = \frac{1}{n}(X\_1 + \cdots + X\_n)$. Then $\bar{X}$ will be of the following form:

$$\bar{X} = \begin{bmatrix} \bar{x}\_1 \\ \vdots \\ \bar{x}\_p \end{bmatrix}, \ \bar{x}\_i = \frac{1}{n} \sum\_{k=1}^n x\_{ik} = \text{the average of the } i\text{-th components of the } X\_j\text{'s}. \tag{ii}$$

Let the bold-faced **X**¯ be defined as follows:

$$
\bar{\mathbf{X}} = [\bar{X}, \dots, \bar{X}] = \begin{bmatrix}
\bar{x}\_1 & \bar{x}\_1 & \dots & \bar{x}\_1 \\
\bar{x}\_2 & \bar{x}\_2 & \dots & \bar{x}\_2 \\
\vdots & \vdots & \ddots & \vdots \\
\bar{x}\_p & \bar{x}\_p & \dots & \bar{x}\_p
\end{bmatrix}.
$$

Then,

$$\mathbf{X} - \bar{\mathbf{X}} = \begin{bmatrix} x\_{11} - \bar{x}\_1 & x\_{12} - \bar{x}\_1 & \dots & x\_{1n} - \bar{x}\_1 \\ x\_{21} - \bar{x}\_2 & x\_{22} - \bar{x}\_2 & \dots & x\_{2n} - \bar{x}\_2 \\ \vdots & \vdots & \ddots & \vdots \\ x\_{p1} - \bar{x}\_p & x\_{p2} - \bar{x}\_p & \dots & x\_{pn} - \bar{x}\_p \end{bmatrix},$$

and

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = (s\_{ij}),$$

so that

$$s\_{ij} = \sum\_{k=1}^{n} (x\_{ik} - \bar{x}\_{i})(x\_{jk} - \bar{x}\_{j}).$$

$S$ is called the *sample sum of products matrix* or the *corrected sample sum of products matrix*, corrected in the sense that the averages are deducted from the observations. As well, $\frac{1}{n}s\_{ii}$ is called the sample variance of the component $x\_i$ of any vector $X\_k$, referring to $(i)$ above, and $\frac{1}{n}s\_{ij}$, $i \neq j$, is called the sample covariance between the components $x\_i$ and $x\_j$ of any $X\_k$, $\frac{1}{n}S$ being referred to as the sample covariance matrix. The exponent in $L$ can be simplified by making use of the following properties: (1) When $u$ is a $1 \times 1$ matrix or a scalar quantity, then $\text{tr}(u) = \text{tr}(u') = u = u'$. (2) For two matrices $A$ and $B$, whenever $AB$ and $BA$ are defined, $\text{tr}(AB) = \text{tr}(BA)$, where $AB$ need not be equal to $BA$. Observe that the following quantity is a real scalar and hence equal to its trace:

$$\begin{split} \sum\_{j=1}^{n} (X\_j - \mu)' \Sigma^{-1} (X\_j - \mu) &= \text{tr} \Big[ \sum\_{j=1}^{n} (X\_j - \mu)' \Sigma^{-1} (X\_j - \mu) \Big] \\ &= \text{tr} [\Sigma^{-1} \sum\_{j=1}^{n} (X\_j - \mu)(X\_j - \mu)'] \\ &= \text{tr} [\Sigma^{-1} \sum\_{j=1}^{n} (X\_j - \bar{X} + \bar{X} - \mu)(X\_j - \bar{X} + \bar{X} - \mu)'] \\ &= \text{tr} [\Sigma^{-1} (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'] + n \text{tr} [\Sigma^{-1} (\bar{X} - \mu)(\bar{X} - \mu)'] \\ &= \text{tr} (\Sigma^{-1} \mathbf{S}) + n(\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu) \end{split}$$

because

$$\text{tr}(\Sigma^{-1}(\bar{X}-\mu)(\bar{X}-\mu)') = \text{tr}((\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)) = (\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu).$$

The right-hand side expression being 1 × 1, it is equal to its trace, and *L* can be written as

$$L = \frac{1}{(2\pi)^{\frac{np}{2}} |\boldsymbol{\Sigma}|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{S}) - \frac{n}{2} (\bar{X} - \mu)' \boldsymbol{\Sigma}^{-1} (\bar{X} - \mu)}. \tag{3.5.2}$$
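Property (2) above, $\text{tr}(AB) = \text{tr}(BA)$, which drives the simplification of the exponent, is easy to check numerically even for rectangular factors, where $AB$ and $BA$ do not even have the same order. A sketch assuming Python with numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 5))
B = rng.standard_normal((5, 2))

# tr(AB) = tr(BA) even though AB is 2x2 while BA is 5x5
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```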

If we wish to estimate the parameters $\mu$ and $\Sigma$ from a set of observation vectors corresponding to $X\_1, \ldots, X\_n$, one method consists in maximizing $L$ with respect to $\mu$ and $\Sigma$ given those observations. By resorting to calculus, $L$ is differentiated partially with respect to $\mu$ and $\Sigma$, the resulting expressions are equated to null vectors and matrices, respectively, and these equations are then solved to obtain the estimates of $\mu$ and $\Sigma$. Those estimates are called *the maximum likelihood estimates or MLE's*. We will explore this aspect later.

**Example 3.5.1.** Let the $3 \times 1$ vector $X\_1$ be real Gaussian, $X\_1 \sim N\_3(\mu, \Sigma)$, $\Sigma > O$, and let $X\_j$, $j = 1, 2, 3, 4$, be iid as $X\_1$. Compute the $3 \times 4$ sample matrix $\mathbf{X}$, the sample average $\bar{X}$, the matrix of sample means $\bar{\mathbf{X}}$, the sample sum of products matrix $S$, and the maximum likelihood estimates of $\mu$ and $\Sigma$, based on the following set of observations on $X\_j$, $j = 1, 2, 3, 4$:

$$X\_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix}, \ X\_4 = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}.$$

**Solution 3.5.1.** The 3 × 4 sample matrix and the sample average are

$$\mathbf{X} = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ -1 & 2 & 4 & 3 \end{bmatrix}, \ \bar{X} = \frac{1}{4} \begin{bmatrix} 2+1+1+0 \\ 0-1+0+1 \\ -1+2+4+3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}.$$

Then **X**¯ and **X** − **X**¯ are the following:

$$
\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \bar{X}, \bar{X}] = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 2 & 2 & 2 & 2 \end{bmatrix}, \; \mathbf{X} - \bar{\mathbf{X}} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 1 \\ -3 & 0 & 2 & 1 \end{bmatrix},
$$

and the sample sum of products matrix *S* is the following:

$$\mathbf{S} = [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 1 \\ -3 & 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -3 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & -4 \\ -1 & 2 & 1 \\ -4 & 1 & 14 \end{bmatrix}.$$

Then, the maximum likelihood estimates of *μ* and *Σ*, denoted with a hat, are

$$
\hat{\mu} = \bar{X} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \ \hat{\Sigma} = \frac{1}{n} S = \frac{1}{4} \begin{bmatrix} 2 & -1 & -4 \\ -1 & 2 & 1 \\ -4 & 1 & 14 \end{bmatrix}.
$$

This completes the computations.
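The computations of Example 3.5.1 can be reproduced in a few lines (a sketch assuming Python with numpy; each column of `X` holds one observation vector):

```python
import numpy as np

# Observations from Example 3.5.1, one column per X_j
X = np.array([[2, 1, 1, 0],
              [0, -1, 0, 1],
              [-1, 2, 4, 3]], dtype=float)
n = X.shape[1]

Xbar = X.mean(axis=1, keepdims=True)        # sample average, a 3 x 1 vector
S = (X - Xbar) @ (X - Xbar).T               # corrected sum of products matrix

print(Xbar.ravel())   # [1. 0. 2.]
print(S)              # [[ 2. -1. -4.], [-1.  2.  1.], [-4.  1. 14.]]
print(S / n)          # MLE of Sigma
```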

#### **3.5a. Simple Random Sample from a** *p***-variate Complex Gaussian Population**

Our population density is given by the following:

$$\tilde{f}(\tilde{X}\_j) = \frac{\mathbf{e}^{-(\tilde{X}\_j - \tilde{\mu})^{\*} \tilde{\Sigma}^{-1} (\tilde{X}\_j - \tilde{\mu})}}{\pi^p |\det(\tilde{\Sigma})|}, \ \tilde{X}\_j \sim \tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma}), \ \tilde{\Sigma} = \tilde{\Sigma}^{\*} > O.$$

Let $\tilde{X}\_1, \ldots, \tilde{X}\_n$ be a collection of complex vector random variables, iid as $\tilde{X}\_j \sim \tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. This collection is called a simple random sample of size $n$ from this complex Gaussian population $\tilde{f}(\tilde{X}\_j)$. We will use notations parallel to those utilized in the real case. Let $\tilde{\mathbf{X}} = [\tilde{X}\_1, \ldots, \tilde{X}\_n]$, $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}\_1 + \cdots + \tilde{X}\_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$, and $\tilde{S} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^{\*} = (\tilde{s}\_{ij})$. Then

$$\tilde{s}\_{ij} = \sum\_{k=1}^{n} (\tilde{\mathbf{x}}\_{ik} - \bar{\tilde{\mathbf{x}}}\_{i}) (\tilde{\mathbf{x}}\_{jk} - \bar{\tilde{\mathbf{x}}}\_{j})^\*$$

with $\frac{1}{n}\tilde{s}\_{ij}$ being the sample covariance between the components $\tilde{x}\_i$ and $\tilde{x}\_j$, $i \neq j$, of any $\tilde{X}\_k$, $k = 1, \ldots, n$, and $\frac{1}{n}\tilde{s}\_{ii}$ being the sample variance of the component $\tilde{x}\_i$. The joint density of $\tilde{X}\_1, \ldots, \tilde{X}\_n$, denoted by $\tilde{L}$, is given by

$$\tilde{L} = \prod\_{j=1}^{n} \frac{\mathbf{e}^{-(\tilde{X}\_j - \mu)^\* \tilde{\Sigma}^{-1}(\tilde{X}\_j - \mu)}}{\pi^p |\det(\tilde{\Sigma})|} = \frac{\mathbf{e}^{-\sum\_{j=1}^{n} (\tilde{X}\_j - \mu)^\* \tilde{\Sigma}^{-1}(\tilde{X}\_j - \mu)}}{\pi^{np} |\det(\tilde{\Sigma})|^n},\tag{3.5a.1}$$

which can be simplified to the following expression by making use of steps parallel to those utilized in the real case:

$$\tilde{L} = \frac{\mathbf{e}^{-\text{tr}(\tilde{\Sigma}^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^{\*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}} - \tilde{\mu})}}{\pi^{np}|\det(\tilde{\Sigma})|^{n}}. \tag{3.5a.2}$$

**Example 3.5a.1.** Let the $3 \times 1$ vector $\tilde{X}\_1$ in the complex domain have a complex trivariate Gaussian distribution, $\tilde{X}\_1 \sim \tilde{N}\_3(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$, and let $\tilde{X}\_j$, $j = 1, 2, 3, 4$, be iid as $\tilde{X}\_1$. With our usual notations, compute the $3 \times 4$ sample matrix $\tilde{\mathbf{X}}$, the sample average $\bar{\tilde{X}}$, the $3 \times 4$ matrix of sample averages $\bar{\tilde{\mathbf{X}}}$, the sample sum of products matrix $\tilde{S}$, and the maximum likelihood estimates of $\tilde{\mu}$ and $\tilde{\Sigma}$, based on the following set of observations on $\tilde{X}\_j$, $j = 1, 2, 3, 4$:

$$
\tilde{X}\_1 = \begin{bmatrix} 1+i \\ 2-i \\ 1-i \end{bmatrix}, \tilde{X}\_2 = \begin{bmatrix} -1+2i \\ 3i \\ -1+i \end{bmatrix}, \tilde{X}\_3 = \begin{bmatrix} -2+2i \\ 3+i \\ 4+2i \end{bmatrix}, \tilde{X}\_4 = \begin{bmatrix} -2+3i \\ 3+i \\ -4+2i \end{bmatrix}.
$$

**Solution 3.5a.1.** The sample matrix $\tilde{\mathbf{X}}$ and the sample average $\bar{\tilde{X}}$ are

$$
\tilde{\mathbf{X}} = \begin{bmatrix} 1+i & -1+2i & -2+2i & -2+3i \\ 2-i & 3i & 3+i & 3+i \\ 1-i & -1+i & 4+2i & -4+2i \end{bmatrix}, \ \bar{\tilde{X}} = \begin{bmatrix} -1+2i \\ 2+i \\ i \end{bmatrix}.
$$

Then, with our usual notations, ¯ **<sup>X</sup>**˜ and **<sup>X</sup>**˜ <sup>−</sup> ¯ **X**˜ are the following:

$$
\begin{aligned}
\bar{\tilde{\mathbf{X}}} &= \begin{bmatrix} -1+2i & -1+2i & -1+2i & -1+2i \\ 2+i & 2+i & 2+i & 2+i \\ i & i & i & i \end{bmatrix}, \\
\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}} &= \begin{bmatrix} 2-i & 0 & -1 & -1+i \\ -2i & -2+2i & 1 & 1 \\ 1-2i & -1 & 4+i & -4+i \end{bmatrix}.
\end{aligned}
$$

Thus, the sample sum of products matrix *S*˜ is

$$\begin{split} \tilde{S} &= [\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}}][\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}}]^{\*} \\ &= \begin{bmatrix} 2-i & 0 & -1 & -1+i \\ -2i & -2+2i & 1 & 1 \\ 1-2i & -1 & 4+i & -4+i \end{bmatrix} \begin{bmatrix} 2+i & 2i & 1+2i \\ 0 & -2-2i & -1 \\ -1 & 1 & 4-i \\ -1-i & 1 & -4-i \end{bmatrix} \\ &= \begin{bmatrix} 8 & 5i & 5+i \\ -5i & 14 & 6-6i \\ 5-i & 6+6i & 40 \end{bmatrix}. \end{split}$$

The maximum likelihood estimates are as follows:

$$
\hat{\tilde{\mu}} = \bar{\tilde{X}} = \begin{bmatrix} -1 + 2i \\ 2 + i \\ i \end{bmatrix}, \quad \hat{\tilde{\Sigma}} = \frac{1}{4}\tilde{S}
$$

where *S*˜ is given above. This completes the computations.
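These complex-domain computations can likewise be reproduced numerically (a sketch assuming Python with numpy; note the conjugate transpose in forming $\tilde{S}$):

```python
import numpy as np

# Observations from Example 3.5a.1, one column per X_j
X = np.array([[1+1j, -1+2j, -2+2j, -2+3j],
              [2-1j,  3j,    3+1j,  3+1j],
              [1-1j, -1+1j,  4+2j, -4+2j]])
n = X.shape[1]

Xbar = X.mean(axis=1, keepdims=True)          # sample average: [-1+2i, 2+i, i]
S = (X - Xbar) @ (X - Xbar).conj().T          # conjugate transpose, not plain transpose

print(Xbar.ravel())
print(S)        # [[8, 5i, 5+i], [-5i, 14, 6-6i], [5-i, 6+6i, 40]]
print(S / n)    # MLE of Sigma
```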

#### **3.5.1. Some simplifications of the sample matrix in the real Gaussian case**

The *p* × *n* sample matrix is

$$\mathbf{X} = [X\_1, \dots, X\_n] = \begin{bmatrix} x\_{11} & x\_{12} & \dots & x\_{1n} \\ x\_{21} & x\_{22} & \dots & x\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{p1} & x\_{p2} & \dots & x\_{pn} \end{bmatrix}, \ X\_k = \begin{bmatrix} x\_{1k} \\ x\_{2k} \\ \vdots \\ x\_{pk} \end{bmatrix}$$

where the rows consist of iid variables corresponding to the components of the $p$-vector $X\_1$. For example, $(x\_{11}, x\_{12}, \ldots, x\_{1n})$ are iid variables distributed as the first component of $X\_1$. Let

$$\begin{aligned} \bar{X} &= \begin{bmatrix} \bar{x}\_1 \\ \bar{x}\_2 \\ \vdots \\ \bar{x}\_p \end{bmatrix} = \begin{bmatrix} \frac{1}{n} \sum\_{k=1}^n x\_{1k} \\ \vdots \\ \frac{1}{n} \sum\_{k=1}^n x\_{pk} \end{bmatrix} \\ &= \begin{bmatrix} \frac{1}{n} (x\_{11}, \ldots, x\_{1n}) J \\ \vdots \\ \frac{1}{n} (x\_{p1}, \ldots, x\_{pn}) J \end{bmatrix} = \frac{1}{n} \mathbf{X} J, \ J = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \ n \times 1. \end{aligned}$$

Consider the matrix

$$\bar{\mathbf{X}} = (\bar{X}, \dots, \bar{X}) = \frac{1}{n} \mathbf{X}JJ' = \mathbf{X}B, B = \frac{1}{n}JJ'.$$

Then,

$$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}A, \ A = I - B = I - \frac{1}{n}JJ'.$$
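The matrices $B = \frac{1}{n}JJ'$ and $A = I - B$ just defined are the averaging and centering matrices, and their algebraic properties used below can be checked numerically. A sketch assuming Python with numpy, for a hypothetical sample size $n = 5$:

```python
import numpy as np

n = 5
J = np.ones((n, 1))
B = J @ J.T / n              # projects onto the span of J (averaging matrix)
A = np.eye(n) - B            # centering matrix

# Idempotency, symmetry and mutual orthogonality used in the text
assert np.allclose(A @ A, A) and np.allclose(B @ B, B)
assert np.allclose(A, A.T) and np.allclose(B, B.T)
assert np.allclose(A @ B, 0)
print(np.linalg.matrix_rank(A))   # n - 1, i.e. 4
```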

Observe that *<sup>A</sup>* <sup>=</sup> *<sup>A</sup>*2*, B* <sup>=</sup> *<sup>B</sup>*2*, AB* <sup>=</sup> *O, A* <sup>=</sup> *<sup>A</sup>* and *B* = *B* where both *A* and *B* are *n* × *n* matrices. Then **X***A* and **X***B* are *p* × *n* and, in order to determine the mgf, we will take the *p* × *n* parameter matrices *T*<sup>1</sup> and *T*2. Accordingly, the mgf of **X***A* is *<sup>M</sup>***X***A(T*1*)* <sup>=</sup> *<sup>E</sup>*[etr*(T* - <sup>1</sup>**X***A)*]*,* that of **<sup>X</sup>***<sup>B</sup>* is *<sup>M</sup>***X***B(T*2*)* <sup>=</sup> *<sup>E</sup>*[etr*(T* - <sup>2</sup>**X***B)*] and the joint mgf is *<sup>E</sup>*[<sup>e</sup> *tr(T* - <sup>1</sup>**X***A)*+tr*(T* - <sup>2</sup>**X***B)*]. Let us evaluate the joint mgf for *Xj* <sup>∼</sup> *Np(O, I )*,

$$E[\mathbf{e}^{\text{tr}(T\_1'\mathbf{X}A) + \text{tr}(T\_2'\mathbf{X}B)}] = \int\_{\mathbf{X}} \frac{1}{(2\pi)^{\frac{np}{2}}} \mathbf{e}^{\text{tr}(T\_1'\mathbf{X}A) + \text{tr}(T\_2'\mathbf{X}B) - \frac{1}{2}\text{tr}(\mathbf{X}\mathbf{X}')} \text{d}\mathbf{X}.$$

Let us simplify the exponent,

$$-\frac{1}{2}\{\text{tr}(\mathbf{X}\mathbf{X}') - 2\text{tr}[\mathbf{X}(AT\_1' + BT\_2')]\}.\tag{i}$$

If we expand $\text{tr}[(\mathbf{X} - C)(\mathbf{X} - C)']$ for some $C$, we have

$$\begin{aligned} \text{tr}(\mathbf{X}\mathbf{X}') - \text{tr}(C\mathbf{X}') - \text{tr}(\mathbf{X}C') + \text{tr}(CC') \\ = \text{tr}(\mathbf{X}\mathbf{X}') - 2\text{tr}(\mathbf{X}C') + \text{tr}(CC') \end{aligned} \tag{ii}$$

as $\text{tr}(\mathbf{X}C') = \text{tr}(C\mathbf{X}')$ even though $C\mathbf{X}' \neq \mathbf{X}C'$ in general. On comparing $(i)$ and $(ii)$, we have $C' = AT\_1' + BT\_2'$, and then

$$\begin{split} \text{tr}(CC') &= \text{tr}[(T\_1A' + T\_2B')(AT\_1' + BT\_2')] \\ &= \text{tr}(T\_1A'AT\_1') + \text{tr}(T\_2B'BT\_2') + \text{tr}(T\_1A'BT\_2') + \text{tr}(T\_2B'AT\_1'). \end{split} \tag{iii}$$

Since the integral over $\mathbf{X} - C$ will absorb the normalizing constant and give 1, the joint mgf is $\mathbf{e}^{\frac{1}{2}\text{tr}(CC')}$. Proceeding exactly the same way, it is seen that the mgf's of $\mathbf{X}A$ and $\mathbf{X}B$ are respectively

$$M\_{\mathbf{X}A}(T\_{\mathbf{l}}) = \mathbf{e}^{\frac{1}{2}\text{tr}(T\_{\mathbf{l}}A'AT\_{\mathbf{l}}')} \text{ and } \; M\_{\mathbf{X}B}(T\_{\mathbf{2}}) = \mathbf{e}^{\frac{1}{2}\text{tr}(T\_{\mathbf{2}}B'BT\_{\mathbf{2}}')}.$$

For $\mathbf{X}A$ and $\mathbf{X}B$ to be independent, the joint mgf must equal the product of the individual mgf's. In this instance, this is the case since $A'B = O$ and $B'A = O$, which makes the last two terms in $(iii)$ vanish. Hence, the following result:

**Theorem 3.5.1.** *Assume that $X\_1, \ldots, X\_n$ are iid as $X\_j \sim N\_p(O, I)$, let the $p \times n$ matrix $\mathbf{X} = (X\_1, \ldots, X\_n)$ and $\bar{X} = \frac{1}{n}\mathbf{X}J$, $J' = (1, 1, \ldots, 1)$. Let $\bar{\mathbf{X}} = \mathbf{X}B$ and $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}A$, so that $A = A'$, $B = B'$, $A^2 = A$, $B^2 = B$, $AB = O$. Letting $U\_1 = \mathbf{X}B$ and $U\_2 = \mathbf{X}A$, it follows that $U\_1$ and $U\_2$ are independently distributed.*

Now, appealing to a general result to the effect that if $U$ and $V$ are independently distributed, then $U$ and $VV'$ as well as $U$ and $V'V$ are independently distributed whenever $VV'$ and $V'V$ are defined, the next result follows.

**Theorem 3.5.2.** *For the $p \times n$ matrix $\mathbf{X}$, let $\mathbf{X}A$ and $\mathbf{X}B$ be as defined in Theorem 3.5.1. Then $\mathbf{X}B$ and $\mathbf{X}AA'\mathbf{X}' = \mathbf{X}A\mathbf{X}' = S$ are independently distributed and, consequently, the sample mean $\bar{X}$ and the sample sum of products matrix $S$ are independently distributed.*

As $\mu$ is absent from the previous derivations, the results hold for a $N\_p(\mu, I)$ population. If the population is $N\_p(\mu, \Sigma)$, $\Sigma > O$, it suffices to make the transformation $Y\_j = \Sigma^{-\frac{1}{2}}X\_j$ or $\mathbf{Y} = \Sigma^{-\frac{1}{2}}\mathbf{X}$, in which case $\mathbf{X} = \Sigma^{\frac{1}{2}}\mathbf{Y}$. Then, $\text{tr}(T\_1'\mathbf{X}A) = \text{tr}(T\_1'\Sigma^{\frac{1}{2}}\mathbf{Y}A) = \text{tr}[(T\_1'\Sigma^{\frac{1}{2}})\mathbf{Y}A]$, so that $\Sigma^{\frac{1}{2}}$ is combined with $T\_1'$, which does not affect $\mathbf{Y}A$. Thus, we have the general result that is stated next.

**Theorem 3.5.3.** *Letting the population be Np(μ, Σ), Σ > O, and* **X***, A, B, S, and X*¯ *be as defined in Theorem 3.5.1, it then follows that U*<sup>1</sup> = **X***A and U*<sup>2</sup> = **X***B are independently distributed and thereby, that the sample mean X*¯ *and the sample sum of products matrix S are independently distributed.*

#### **3.5.2. Linear functions of the sample vectors**

Let the $X\_j$'s, $j = 1, \ldots, n$, be iid as $X\_j \sim N\_p(\mu, \Sigma)$, $\Sigma > O$. Let us consider a linear function $a\_1X\_1 + a\_2X\_2 + \cdots + a\_nX\_n$ where $a\_1, \ldots, a\_n$ are real scalar constants. Then the mgf's of $X\_j$, $a\_jX\_j$ and $U = \sum\_{j=1}^n a\_jX\_j$ are obtained as follows:


$$\begin{aligned} M\_{X\_j}(T) &= E[\mathbf{e}^{T'X\_j}] = \mathbf{e}^{T'\mu + \frac{1}{2}T'\Sigma T}, \quad M\_{a\_jX\_j}(T) = \mathbf{e}^{T'(a\_j\mu) + \frac{1}{2}a\_j^2T'\Sigma T}, \\ M\_{\sum\_{j=1}^n a\_jX\_j}(T) &= \prod\_{j=1}^n M\_{a\_jX\_j}(T) = \mathbf{e}^{T'\mu(\sum\_{j=1}^n a\_j) + \frac{1}{2}(\sum\_{j=1}^n a\_j^2)T'\Sigma T}, \end{aligned}$$

which implies that $U = \sum\_{j=1}^n a\_jX\_j$ is distributed as a real normal vector random variable with the parameters $(\sum\_{j=1}^n a\_j)\mu$ and $(\sum\_{j=1}^n a\_j^2)\Sigma$, that is, $U \sim N\_p((\sum\_{j=1}^n a\_j)\mu, (\sum\_{j=1}^n a\_j^2)\Sigma)$. Thus, the following result:

**Theorem 3.5.4.** *Let the $X\_j$'s be iid $N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, and $U = a\_1X\_1 + \cdots + a\_nX\_n$ be a linear function of the $X\_j$'s, where $a\_1, \ldots, a\_n$ are real scalar constants. Then $U$ is distributed as a $p$-variate real Gaussian vector random variable with the parameters $(\sum\_{j=1}^n a\_j)\mu$ and $(\sum\_{j=1}^n a\_j^2)\Sigma$, that is, $U \sim N\_p((\sum\_{j=1}^n a\_j)\mu, (\sum\_{j=1}^n a\_j^2)\Sigma)$, $\Sigma > O$.*

If, in Theorem 3.5.4, $a\_j = \frac{1}{n}$, $j = 1, \ldots, n$, then $\sum\_{j=1}^n a\_j = \sum\_{j=1}^n \frac{1}{n} = 1$ and $\sum\_{j=1}^n a\_j^2 = \sum\_{j=1}^n (\frac{1}{n})^2 = \frac{1}{n}$. Moreover, when $a\_j = \frac{1}{n}$, $j = 1, \ldots, n$, $U = \bar{X} = \frac{1}{n}(X\_1 + \cdots + X\_n)$. Hence, we have the following corollary.

**Corollary 3.5.1.** *Let the $X\_j$'s be iid $N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$. Then, the sample mean $\bar{X} = \frac{1}{n}(X\_1 + \cdots + X\_n)$ is distributed as a $p$-variate real Gaussian vector with the parameters $\mu$ and $\frac{1}{n}\Sigma$, that is, $\bar{X} \sim N\_p(\mu, \frac{1}{n}\Sigma)$, $\Sigma > O$.*
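Corollary 3.5.1 can be illustrated by simulation (a sketch assuming Python with numpy; the particular $\mu$ and $\Sigma$ are hypothetical values): repeated samples of size $n$ should yield sample means whose covariance is close to $\frac{1}{n}\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 2, 10, 100_000
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

# reps independent samples of size n, shape (reps, n, p)
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
Xbar = X.mean(axis=1)                  # one sample mean per replication

print(Xbar.mean(axis=0))               # close to mu
print(np.cov(Xbar.T) * n)              # close to Sigma, i.e. Cov(Xbar) = Sigma / n
```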

From the representation given in Sect. 3.5.1, let $\mathbf{X}$ be the sample matrix, $\bar{X} = \frac{1}{n}(X\_1 + \cdots + X\_n)$ the sample average, and $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$ the $p \times n$ matrix of sample means, so that $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}(I - \frac{1}{n}JJ') = \mathbf{X}A$, $J' = (1, \ldots, 1)$. Since $A$ is idempotent of rank $n - 1$, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that $P'AP = \text{diag}(1, \ldots, 1, 0) \equiv D$, $A = PDP'$ and $\mathbf{X}A = \mathbf{X}PDP' = \mathbf{Z}DP'$ with $\mathbf{Z} = \mathbf{X}P$. Note that $A = A'$, $A^2 = A$ and $D^2 = D$. Thus, the sample sum of products matrix has the following representations:

$$\mathbf{S} = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \mathbf{X}AA'\mathbf{X}' = \mathbf{X}A\mathbf{X}' = \mathbf{Z}DD'\mathbf{Z}' = \mathbf{Z}\_{n-1}\mathbf{Z}'\_{n-1} \tag{3.5.3}$$

where **Z***n*−<sup>1</sup> is a *p*×*(n*−1*)* matrix consisting of the first *n*−1 columns of **Z** = **X***P*. When

$$D = \begin{bmatrix} I\_{n-1} & O \\ O & 0 \end{bmatrix}, \ \mathbf{Z} = (\mathbf{Z}\_{n-1}, \ Z\_{(n)}), \ \mathbf{Z}D\mathbf{Z}' = \mathbf{Z}\_{n-1}\mathbf{Z}'\_{n-1},$$

where $Z\_{(n)}$ denotes the last column of $\mathbf{Z}$. For a $p$-variate real normal population wherein the $X\_j$'s are iid $N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, we have $X\_j - \bar{X} = (X\_j - \mu) - (\bar{X} - \mu)$, and hence the population can be taken to be distributed as $N\_p(O, \Sigma)$, $\Sigma > O$, without any loss of generality. Then the $n - 1$ columns of $\mathbf{Z}\_{n-1}$ will be iid $N\_p(O, \Sigma)$ vectors. After discussing the real matrix-variate gamma distribution in the next chapter, we will show that whenever $n - 1 \geq p$, $\mathbf{Z}\_{n-1}\mathbf{Z}'\_{n-1}$ has a real matrix-variate gamma distribution or, equivalently, that it is Wishart distributed with $n - 1$ degrees of freedom.
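The representation (3.5.3) can be verified numerically by diagonalizing the centering matrix $A$ (a sketch assuming Python with numpy; the data matrix is a random placeholder):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 6
X = rng.standard_normal((p, n))       # placeholder sample matrix

J = np.ones((n, 1))
A = np.eye(n) - J @ J.T / n           # centering matrix, rank n - 1

# Orthonormal P with P'AP = diag(1,...,1,0); eigh returns ascending eigenvalues,
# so reorder to put the n-1 unit eigenvalues first
eigval, P = np.linalg.eigh(A)
P = P[:, np.argsort(-eigval)]

Z = X @ P
Z_head = Z[:, :n-1]                   # first n - 1 columns of Z

Xbar = X.mean(axis=1, keepdims=True)
S = (X - Xbar) @ (X - Xbar).T
print(np.allclose(S, Z_head @ Z_head.T))   # True: S = Z_{n-1} Z'_{n-1}
```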

#### **3.5a.1. Some simplifications of the sample matrix in the complex Gaussian case**

Let the $p \times 1$ vector $\tilde{X}\_1$ in the complex domain have a complex Gaussian density $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$, and let $\tilde{X}\_1, \ldots, \tilde{X}\_n$ be iid as $\tilde{X}\_1$, so that $\tilde{\mathbf{X}} = [\tilde{X}\_1, \ldots, \tilde{X}\_n]$ is the sample matrix of a simple random sample of size $n$ from this population. Let the sample mean vector or sample average be $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}\_1 + \cdots + \tilde{X}\_n)$ and the matrix of sample means be the bold-faced $p \times n$ matrix $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$. Let $\tilde{S} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^{\*}$. Then $\bar{\tilde{\mathbf{X}}} = \frac{1}{n}\tilde{\mathbf{X}}JJ' = \tilde{\mathbf{X}}B$ and $\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}} = \tilde{\mathbf{X}}(I - \frac{1}{n}JJ') = \tilde{\mathbf{X}}A$, where $A = A^2$, $A = A' = A^{\*}$, $B = B' = B^{\*}$, $B = B^2$, $AB = O$, $BA = O$. Thus, results parallel to Theorems 3.5.1 and 3.5.2 hold in the complex domain, and we now state the general result.

**Theorem 3.5a.1.** *Let the population be complex $p$-variate Gaussian $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let the $p \times n$ sample matrix be $\tilde{\mathbf{X}} = (\tilde{X}\_1, \ldots, \tilde{X}\_n)$ where $\tilde{X}\_1, \ldots, \tilde{X}\_n$ are iid as $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$, and let $\bar{\tilde{X}}$, $\tilde{S}$, $\tilde{\mathbf{X}}A$ and $\tilde{\mathbf{X}}B$ be as defined above. Then, $\tilde{\mathbf{X}}A$ and $\tilde{\mathbf{X}}B$ are independently distributed, and thereby the sample mean $\bar{\tilde{X}}$ and the sample sum of products matrix $\tilde{S}$ are independently distributed.*

#### **3.5a.2. Linear functions of the sample vectors in the complex domain**

Let $\tilde{X}\_j \sim \tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^{\*} > O$, be a $p$-variate complex Gaussian vector random variable, and consider a simple random sample of size $n$ from this population, in which case the $\tilde{X}\_j$'s, $j = 1, \ldots, n$, are iid as $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let $\tilde{U} = a\_1\tilde{X}\_1 + \cdots + a\_n\tilde{X}\_n$ be a linear function where $a\_1, \ldots, a\_n$ are real or complex scalar constants. Then, following steps parallel to those provided in Sect. 3.5.2, we obtain the following mgf:

$$\tilde{M}\_{\tilde{U}}(\tilde{T}) = \mathbf{e}^{\Re(\tilde{T}^\*\tilde{\mu}(\sum\_{j=1}^n a\_j)) + \frac{1}{4}(\sum\_{j=1}^n a\_j a\_j^\*)\tilde{T}^\*\tilde{\Sigma}\tilde{T}}$$

where $\sum\_{j=1}^n a\_ja\_j^{\*} = |a\_1|^2 + \cdots + |a\_n|^2$. For example, if $a\_j = \frac{1}{n}$, $j = 1, \ldots, n$, then $\sum\_{j=1}^n a\_j = 1$ and $\sum\_{j=1}^n a\_ja\_j^{\*} = \frac{1}{n}$. Hence, we have the following result and the resulting corollary.

**Theorem 3.5a.2.** *Let the $p \times 1$ complex vector $\tilde{X}$ have a $p$-variate complex Gaussian distribution $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^{\*} > O$. Consider a simple random sample of size $n$ from this population, with the $\tilde{X}\_j$'s, $j = 1, \ldots, n$, being iid as this $p$-variate complex Gaussian. Let $a\_1, \ldots, a\_n$ be scalar constants, real or complex, and consider the linear function $\tilde{U} = a\_1\tilde{X}\_1 + \cdots + a\_n\tilde{X}\_n$. Then $\tilde{U} \sim \tilde{N}\_p((\sum\_{j=1}^n a\_j)\tilde{\mu}, (\sum\_{j=1}^n a\_ja\_j^{\*})\tilde{\Sigma})$, that is, $\tilde{U}$ has a $p$-variate complex Gaussian distribution with the parameters $(\sum\_{j=1}^n a\_j)\tilde{\mu}$ and $(\sum\_{j=1}^n a\_ja\_j^{\*})\tilde{\Sigma}$.*

**Corollary 3.5a.1.** *Let the population and sample be as defined in Theorem 3.5a.2. Then the sample mean $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}\_1 + \cdots + \tilde{X}\_n)$ is distributed as a p-variate complex Gaussian with the parameters $\tilde{\mu}$ and $\frac{1}{n}\tilde{\Sigma}$.*
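As an aside, Corollary 3.5a.1 can be illustrated by simulation. The sketch below is an added illustration, not part of the text; it assumes the covariance convention $E[(\tilde{X} - \tilde{\mu})(\tilde{X} - \tilde{\mu})^\*] = \tilde{\Sigma}$ and an arbitrary example matrix `A` with $\tilde{\Sigma} = AA^\*$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 2, 5, 20000

# an assumed example: Sigma = A A* is Hermitian positive definite
A = np.array([[1.0 + 0.5j, 0.2], [0.1j, 0.8]])
Sigma = A @ A.conj().T
mu = np.array([1.0 + 1.0j, -0.5j])

means = np.empty((reps, p), dtype=complex)
for r in range(reps):
    # standard complex normal columns: E[z z*] = I
    Z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    X = mu[:, None] + A @ Z        # columns iid complex Gaussian, mean mu, covariance Sigma
    means[r] = X.mean(axis=1)      # sample mean of one sample of size n

# empirical covariance of the sample mean (each outer product d d* is Hermitian)
D = means - means.mean(axis=0)
C = (D[:, :, None] * D[:, None, :].conj()).mean(axis=0)
print(np.linalg.norm(C - Sigma / n) / np.linalg.norm(Sigma / n))
```

With this many replications, the empirical covariance of the sample mean typically agrees with $\tilde{\Sigma}/n$ to within about a percent.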

Proceeding as in the real case, we can show that the sample sum of products matrix *S*˜ can have a representation of the form

$$
\tilde{S} = \tilde{\mathbf{Z}}\_{n-1} \tilde{\mathbf{Z}}\_{n-1}^\* \tag{3.5a.3}
$$

where the columns of $\tilde{\mathbf{Z}}\_{n-1}$ are iid standard normal vectors in the complex domain if the population is a *p*-variate Gaussian in the complex domain. In this case, it will be shown later that $\tilde{S}$ is distributed as a complex Wishart matrix with $(n - 1) \ge p$ degrees of freedom.

#### **3.5.3. Maximum likelihood estimators of the** *p***-variate real Gaussian distribution**

Letting $L$ denote the joint density of the sample values $X\_1, \dots, X\_n$, which are $p \times 1$ iid Gaussian vectors constituting a simple random sample of size $n$, we have

$$L = \prod\_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2}(X\_j - \mu)'\Sigma^{-1}(X\_j - \mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}\mathbf{S}) - \frac{1}{2}n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{3.5.4}$$

where, as previously denoted, **X** is the *p* × *n* matrix

$$\mathbf{X} = (X\_1, \dots, X\_n), \ \bar{X} = \frac{1}{n}(X\_1 + \dots + X\_n), \ \bar{\mathbf{X}} = (\bar{X}, \dots, \bar{X}),$$

$$\mathbf{S} = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = (s\_{ij}), \ s\_{ij} = \sum\_{k=1}^n (\mathbf{x}\_{ik} - \bar{\mathbf{x}}\_i)(\mathbf{x}\_{jk} - \bar{\mathbf{x}}\_j).$$

In this case, the parameters are the $p \times 1$ vector $\mu$ and the $p \times p$ real positive definite matrix $\Sigma$. If we resort to calculus to maximize $L$, we would like to differentiate $L$, or the one-to-one function $\ln L$, with respect to $\mu$ and $\Sigma$ directly, rather than differentiating with respect to each element comprising $\mu$ and $\Sigma$. To achieve this, we need to further develop the differential operators introduced in Chap. 1.

**Definition 3.5.1.** *Derivative with respect to a matrix*. Let $Y = (y\_{ij})$ be a $p \times q$ matrix whose elements $y\_{ij}$ are distinct real scalar variables. The operator $\frac{\partial}{\partial Y}$ is defined as $\frac{\partial}{\partial Y} = \left(\frac{\partial}{\partial y\_{ij}}\right)$, and this operator applied to a real scalar quantity $f$ is defined as

$$\frac{\partial}{\partial Y}f = \left(\frac{\partial f}{\partial y\_{ij}}\right).$$

For example, if $f = y\_{11}^2 + y\_{12}^2 + y\_{13}^2 - y\_{11}y\_{12} + y\_{21} + y\_{22}^2 + y\_{23}$ and the $2 \times 3$ matrix $Y$ is

$$\begin{aligned} Y &= \begin{bmatrix} \mathbf{y}\_{11} & \mathbf{y}\_{12} & \mathbf{y}\_{13} \\ \mathbf{y}\_{21} & \mathbf{y}\_{22} & \mathbf{y}\_{23} \end{bmatrix} \Rightarrow \frac{\partial f}{\partial Y} = \begin{bmatrix} \frac{\partial f}{\partial \mathbf{y}\_{11}} & \frac{\partial f}{\partial \mathbf{y}\_{12}} & \frac{\partial f}{\partial \mathbf{y}\_{13}} \\ \frac{\partial f}{\partial \mathbf{y}\_{21}} & \frac{\partial f}{\partial \mathbf{y}\_{22}} & \frac{\partial f}{\partial \mathbf{y}\_{23}} \end{bmatrix}, \\\ \frac{\partial f}{\partial Y} &= \begin{bmatrix} 2\mathbf{y}\_{11} - \mathbf{y}\_{12} & 2\mathbf{y}\_{12} - \mathbf{y}\_{11} & 2\mathbf{y}\_{13} \\ 1 & 2\mathbf{y}\_{22} & 1 \end{bmatrix}. \end{aligned}$$
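As a quick sanity check (added here for illustration), the closed-form gradient in this example can be compared with central finite differences at an arbitrary point:

```python
import numpy as np

def f(Y):
    # f = y11^2 + y12^2 + y13^2 - y11*y12 + y21 + y22^2 + y23
    return (Y[0, 0]**2 + Y[0, 1]**2 + Y[0, 2]**2
            - Y[0, 0] * Y[0, 1] + Y[1, 0] + Y[1, 1]**2 + Y[1, 2])

def grad_f(Y):
    # closed form from the worked example above
    return np.array([[2*Y[0, 0] - Y[0, 1], 2*Y[0, 1] - Y[0, 0], 2*Y[0, 2]],
                     [1.0, 2*Y[1, 1], 1.0]])

Y = np.array([[0.3, -1.2, 0.7], [2.0, 0.5, -0.4]])
h = 1e-6
num = np.zeros_like(Y)
for i in range(2):
    for j in range(3):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, j] += h
        Ym[i, j] -= h
        num[i, j] = (f(Yp) - f(Ym)) / (2 * h)

print(np.max(np.abs(num - grad_f(Y))))   # small: finite-difference error only
```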

There are numerous examples of real-valued scalar functions of matrix argument. The determinant and the trace are two such scalar functions of a square matrix $A$. The derivative with respect to a vector has already been defined in Chap. 1. The log-likelihood function $\ln L$, which is available from (3.5.4), has to be differentiated with respect to $\mu$ and with respect to $\Sigma$, and the resulting expressions have to be respectively equated to a null vector and a null matrix. These equations are then solved to obtain the critical points where $L$, as well as $\ln L$, may have a local maximum, a local minimum or a saddle point. However, $\ln L$ contains a determinant and a trace. Hence, we need to develop some results on differentiating a determinant and a trace with respect to a matrix, and the following results will be helpful in this regard.

**Theorem 3.5.5.** *Let the $p \times p$ matrix $Y = (y\_{ij})$ be nonsingular, the $y\_{ij}$'s being distinct real scalar variables. Let $f = |Y|$, the determinant of $Y$. Then,*

$$\frac{\partial}{\partial Y}|Y| = \begin{cases} |Y|(Y^{-1})' \text{ for a general } Y\\ |Y|[2Y^{-1} - \text{diag}(Y^{-1})] \text{ for } Y = Y' \end{cases}$$

*where* $\text{diag}(Y^{-1})$ *is a diagonal matrix whose diagonal elements coincide with those of* $Y^{-1}$*.*

*Proof***:** A determinant can be obtained by expansion along any row (or column), the resulting sum involving the corresponding elements and their associated cofactors. More specifically, $|Y| = y\_{i1}C\_{i1} + \cdots + y\_{ip}C\_{ip}$ for each $i = 1, \dots, p$, where $C\_{ij}$ is the cofactor of $y\_{ij}$. This expansion holds whether the elements of the matrix are real or complex. Then,

$$\frac{\partial}{\partial \mathbf{y}\_{ij}} |Y| = \begin{cases} C\_{ij} \text{ for a general } Y\\ 2C\_{ij} \text{ for } Y = Y', i \neq j\\ C\_{jj} \text{ for } Y = Y', i = j. \end{cases}$$

Thus, $\frac{\partial}{\partial Y}|Y|$ = the matrix of cofactors = $|Y|(Y^{-1})'$ for a general $Y$. When $Y = Y'$, then

$$\frac{\partial}{\partial Y}|Y| = \begin{bmatrix} C\_{11} & 2C\_{12} & \cdots & 2C\_{1p} \\ 2C\_{21} & C\_{22} & \cdots & 2C\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 2C\_{p1} & 2C\_{p2} & \cdots & C\_{pp} \end{bmatrix}$$

$$=|Y|\,[2Y^{-1} - \text{diag}(Y^{-1})].$$

Hence the result.
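Theorem 3.5.5 lends itself to a direct numerical check. The sketch below, an added illustration, verifies the unconstrained case $\frac{\partial}{\partial Y}|Y| = |Y|(Y^{-1})'$ entry by entry with central differences:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
Y = rng.standard_normal((p, p)) + p * np.eye(p)   # diagonally shifted, safely nonsingular

closed = np.linalg.det(Y) * np.linalg.inv(Y).T    # |Y| (Y^{-1})'

h = 1e-6
num = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        E = np.zeros((p, p)); E[i, j] = h
        # det is linear in each single entry, so the central difference is essentially exact
        num[i, j] = (np.linalg.det(Y + E) - np.linalg.det(Y - E)) / (2 * h)

print(np.max(np.abs(num - closed)))
```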

**Theorem 3.5.6.** *Let $A$ and $Y = (y\_{ij})$ be $p \times p$ matrices where $A$ is a constant matrix and the $y\_{ij}$'s are distinct real scalar variables. Then,*

$$\frac{\partial}{\partial Y}[\text{tr}(AY)] = \begin{cases} A' \text{ for a general } Y\\ A + A' - \text{diag}(A) \text{ for } Y = Y' \end{cases}$$

*Proof***:** $\text{tr}(AY) = \sum\_{i,j} a\_{ji}y\_{ij}$ for a general $Y$, so that $\frac{\partial}{\partial Y}[\text{tr}(AY)] = A'$ for a general $Y$. When $Y = Y'$, $\frac{\partial}{\partial y\_{jj}}[\text{tr}(AY)] = a\_{jj}$ and $\frac{\partial}{\partial y\_{ij}}[\text{tr}(AY)] = a\_{ij} + a\_{ji}$ for $i \neq j$. Hence, $\frac{\partial}{\partial Y}[\text{tr}(AY)] = A + A' - \text{diag}(A)$ for $Y = Y'$. Thus, the result is established.


With the help of Theorems 3.5.5 and 3.5.6, we can optimize *L* or ln*L* with *L* as specified in Eq. (3.5.4). For convenience, we take ln*L* which is given by

$$\ln L = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\text{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu). \tag{3.5.5}$$

Then,

$$\begin{aligned} \frac{\partial}{\partial \mu} \ln L &= O \Rightarrow 0 - \frac{n}{2} \frac{\partial}{\partial \mu} (\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu) = O, \\ &\Rightarrow n \, \Sigma^{-1} (\bar{X} - \mu) = O \Rightarrow \bar{X} - \mu = O, \\ &\Rightarrow \mu = \bar{X}, \end{aligned}$$

referring to the vector derivatives defined in Chap. 1. The extremal value, denoted with a hat, is $\hat{\mu} = \bar{X}$. When differentiating with respect to $\Sigma$, we may take $B = \Sigma^{-1}$ for convenience and differentiate with respect to $B$. We may also substitute $\bar{X}$ for $\mu$ because the critical point for $\Sigma$ must correspond to $\hat{\mu} = \bar{X}$. Accordingly, $\ln L$ at $\mu = \bar{X}$ is

$$
\ln L(\hat{\mu}, B) = -\frac{np}{2}\ln(2\pi) + \frac{n}{2}\ln|B| - \frac{1}{2}\text{tr}(BS).
$$

Noting that $B = B'$,

$$\begin{split} \frac{\partial}{\partial B} \ln L(\hat{\mu}, B) = O &\Rightarrow \frac{n}{2} [2B^{-1} - \text{diag}(B^{-1})] - \frac{1}{2} [2S - \text{diag}(S)] = O \\ &\Rightarrow n [2\Sigma - \text{diag}(\Sigma)] = 2S - \text{diag}(S) \\ &\Rightarrow \hat{\sigma}\_{jj} = \frac{1}{n} s\_{jj}, \ \hat{\sigma}\_{ij} = \frac{1}{n} s\_{ij}, \ i \neq j \\ &\Rightarrow (\hat{\mu} = \bar{X}, \ \hat{\Sigma} = \frac{1}{n} S). \end{split}$$

Hence, the only critical point is $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, \frac{1}{n}S)$. Does this critical point correspond to a local maximum, a local minimum or something else? For $\hat{\mu} = \bar{X}$, consider the behavior of $\ln L$. For convenience, we may restate the problem in terms of the eigenvalues of $B$. Letting $\lambda\_1, \dots, \lambda\_p$ be the eigenvalues of $B$, observe that $\lambda\_j > 0, \ j = 1, \dots, p$, that the determinant is the product of the eigenvalues, and that the trace is the sum of the eigenvalues. Examining the behavior of $\ln L$ for all possible values of $\lambda\_1$ when $\lambda\_2, \dots, \lambda\_p$ are fixed, we see that $\ln L$ at $\hat{\mu}$ goes from $-\infty$ back to $-\infty$ through finite values. For each $\lambda\_j$, the behavior of $\ln L$ is the same. Hence, the only critical point must correspond to a local maximum. Therefore $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = \frac{1}{n}S$ are the maximum likelihood estimators (MLE's) of $\mu$ and $\Sigma$, respectively. The observed values of $\hat{\mu}$ and $\hat{\Sigma}$ are the maximum likelihood estimates of $\mu$ and $\Sigma$, for which the same abbreviation MLE is utilized. While maximum likelihood estimators are random variables, maximum likelihood estimates are numerical values. Observe that, in order to have an estimate for $\Sigma$, we must have a sample size $n \ge p$.
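Computationally, the MLE's are one line each. The following sketch, with made-up illustrative data, forms $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = \frac{1}{n}S$ from a $p \times n$ data matrix; note the divisor $n$ rather than $n - 1$:

```python
import numpy as np

# columns of X are the n observed p-vectors (made-up illustrative data)
X = np.array([[1.0, 2.0, 0.5, 1.5, 3.0],
              [0.2, 1.1, -0.3, 0.8, 1.9],
              [4.0, 3.5, 5.0, 4.2, 3.8]])
p, n = X.shape

mu_hat = X.mean(axis=1)          # sample mean, MLE of mu
D = X - mu_hat[:, None]          # deviations from the sample mean
S = D @ D.T                      # sample sum of products matrix
Sigma_hat = S / n                # MLE of Sigma (divisor n, not n - 1)

print(mu_hat)
print(Sigma_hat)
```

Here `Sigma_hat` coincides with `np.cov(X, bias=True)`, numpy's divisor-$n$ covariance.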

In the derivation of the MLE of $\Sigma$, we have differentiated with respect to $B = \Sigma^{-1}$ instead of differentiating with respect to the parameter $\Sigma$. Could this affect the final result? Given any $\theta$ and any non-trivial differentiable function $\varphi(\theta)$ whose derivative is not identically zero, that is, $\frac{\mathrm{d}}{\mathrm{d}\theta}\varphi(\theta) \neq 0$ for any $\theta$, it follows from basic calculus that for any differentiable function $g(\theta)$, the equations $\frac{\mathrm{d}}{\mathrm{d}\theta}g(\theta) = 0$ and $\frac{\mathrm{d}}{\mathrm{d}\varphi}g(\theta) = 0$ will lead to the same solution for $\theta$. Hence, whether we differentiate with respect to $B = \Sigma^{-1}$ or $\Sigma$, the procedures will lead to the same estimator of $\Sigma$. As well, if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ will also be the MLE of $g(\theta)$ whenever $g(\theta)$ is a one-to-one function of $\theta$. The numerical evaluation of maximum likelihood estimates for $\mu$ and $\Sigma$ has been illustrated in Example 3.5.1.

#### **3.5a.3. MLE's in the complex** *p***-variate Gaussian case**

Let the $p \times 1$ vectors in the complex domain $\tilde{X}\_1, \dots, \tilde{X}\_n$ be iid as $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$, and let the joint density of the $\tilde{X}\_j$'s, $j = 1, \dots, n$, be denoted by $\tilde{L}$. Then

$$\begin{split} \tilde{L} &= \prod\_{j=1}^{n} \frac{\mathbf{e}^{-(\tilde{X}\_{j}-\tilde{\mu})^{\*}\tilde{\Sigma}^{-1}(\tilde{X}\_{j}-\tilde{\mu})}}{\pi^{p}|\det(\tilde{\Sigma})|} = \frac{\mathbf{e}^{-\sum\_{j=1}^{n}(\tilde{X}\_{j}-\tilde{\mu})^{\*}\tilde{\Sigma}^{-1}(\tilde{X}\_{j}-\tilde{\mu})}}{\pi^{np}|\det(\tilde{\Sigma})|^{n}} \\ &= \frac{\mathbf{e}^{-\text{tr}(\tilde{\Sigma}^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu})^{\*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu})}}{\pi^{np}|\det(\tilde{\Sigma})|^{n}} \end{split}$$

where $|\det(\tilde{\Sigma})|$ denotes the absolute value of the determinant of $\tilde{\Sigma}$,

$$\begin{aligned} \tilde{S} &= (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^\* = (\tilde{s}\_{ij}), \ \tilde{s}\_{ij} = \sum\_{k=1}^n (\tilde{x}\_{ik} - \bar{\tilde{x}}\_i)(\tilde{x}\_{jk} - \bar{\tilde{x}}\_j)^\*, \\ \tilde{\mathbf{X}} &= [\tilde{X}\_1, \dots, \tilde{X}\_n], \ \bar{\tilde{X}} = \frac{1}{n}(\tilde{X}\_1 + \dots + \tilde{X}\_n), \ \bar{\tilde{\mathbf{X}}} = [\bar{\tilde{X}}, \dots, \bar{\tilde{X}}], \end{aligned}$$

where $\tilde{\mathbf{X}}$ and $\bar{\tilde{\mathbf{X}}}$ are $p \times n$. Hence,

$$\ln \tilde{L} = -np \ln \pi - n \ln |\det(\tilde{\Sigma})| - \text{tr}(\tilde{\Sigma}^{-1} \tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^{\*} \tilde{\Sigma}^{-1} (\bar{\tilde{X}} - \tilde{\mu}).\tag{3.5a.4}$$

#### **3.5a.4. Matrix derivatives in the complex domain**

Consider $\text{tr}(\tilde{B}\tilde{S}^\*)$, $\tilde{B} = \tilde{B}^\* > O$, $\tilde{S} = \tilde{S}^\* > O$. Let $\tilde{B} = B\_1 + iB\_2$, $\tilde{S} = S\_1 + iS\_2$, $i = \sqrt{-1}$. Then $B\_1$ and $S\_1$ are real symmetric and $B\_2$ and $S\_2$ are real skew symmetric since $\tilde{B}$ and $\tilde{S}$ are Hermitian. What is then $\frac{\partial}{\partial \tilde{B}}[\text{tr}(\tilde{B}\tilde{S}^\*)]$? Consider

$$\begin{aligned} \tilde{B}\tilde{S}^\* &= (B\_1 + iB\_2)(S\_1' - iS\_2') = B\_1S\_1' + B\_2S\_2' + i(B\_2S\_1' - B\_1S\_2'),\\ \text{tr}(\tilde{B}\tilde{S}^\*) &= \text{tr}(B\_1S\_1' + B\_2S\_2') + i[\text{tr}(B\_2S\_1') - \text{tr}(B\_1S\_2')].\end{aligned}$$

It can be shown that when $B\_2$ and $S\_2$ are real skew symmetric and $B\_1$ and $S\_1$ are real symmetric, then $\text{tr}(B\_2S\_1') = 0$ and $\text{tr}(B\_1S\_2') = 0$. This will be stated as a lemma.

**Lemma 3.5a.1.** *Consider two $p \times p$ real matrices $A$ and $B$ where $A = A'$ (symmetric) and $B = -B'$ (skew symmetric). Then,* $\text{tr}(AB) = 0$*.*

*Proof***:** $\text{tr}(AB) = \text{tr}[(AB)'] = \text{tr}(B'A') = -\text{tr}(BA) = -\text{tr}(AB)$, which implies that $\text{tr}(AB) = 0$.
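Lemma 3.5a.1 is easily confirmed numerically; the short added sketch below builds a random symmetric $A$ and skew symmetric $B$:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
M1, M2 = rng.standard_normal((p, p)), rng.standard_normal((p, p))
A = (M1 + M1.T) / 2          # symmetric: A = A'
B = (M2 - M2.T) / 2          # skew symmetric: B = -B'

print(np.trace(A @ B))       # 0 up to rounding
```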

Thus, $\text{tr}(\tilde{B}\tilde{S}^\*) = \text{tr}(B\_1S\_1' + B\_2S\_2')$. In $\text{tr}(B\_1S\_1')$, the diagonal elements of $S\_1$ are each multiplied once by the corresponding diagonal elements of $B\_1$, and the non-diagonal elements of $S\_1$ are each multiplied twice by the corresponding elements of $B\_1$. Hence,

$$\frac{\partial}{\partial B\_{\rm l}} \text{tr}(B\_{\rm l} S\_{\rm l}') = 2S\_{\rm l} - \text{diag}(S\_{\rm l}).$$

In $B\_2$ and $S\_2$, the diagonal elements are zeros and hence

$$\frac{\partial}{\partial B\_2} \text{tr}(B\_2 S\_2') = 2S\_2.$$

Therefore

$$\left(\frac{\partial}{\partial B\_1} + i \frac{\partial}{\partial B\_2}\right) \text{tr}(B\_1 S\_1' + B\_2 S\_2') = 2(S\_1 + iS\_2) - \text{diag}(S\_1) = 2\tilde{S} - \text{diag}(\tilde{S}).$$

Thus, the following result:

**Theorem 3.5a.3.** *Let $\tilde{S} = \tilde{S}^\* > O$ and $\tilde{B} = \tilde{B}^\* > O$ be $p \times p$ Hermitian matrices. Let $\tilde{B} = B\_1 + iB\_2$ and $\tilde{S} = S\_1 + iS\_2$ where the $p \times p$ real matrices $B\_1$ and $S\_1$ are symmetric and $B\_2$ and $S\_2$ are skew symmetric. Letting $\frac{\partial}{\partial \tilde{B}} = \frac{\partial}{\partial B\_1} + i\frac{\partial}{\partial B\_2}$, we have*

$$\frac{\partial}{\partial \tilde{B}} \text{tr}(\tilde{B}\tilde{S}^\*) = 2\tilde{S} - \text{diag}(\tilde{S}).$$
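Theorem 3.5a.3 can also be checked numerically. The added sketch below applies the operator $\frac{\partial}{\partial B\_1} + i\frac{\partial}{\partial B\_2}$ by finite differences, perturbing the tied entries of the symmetric part together and those of the skew symmetric part in opposite directions:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3

def herm(rng, p):
    # random Hermitian matrix M = M*
    M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
    return (M + M.conj().T) / 2

B, S = herm(rng, p), herm(rng, p)
B1, B2 = B.real, B.imag      # B1 symmetric, B2 skew symmetric
S1, S2 = S.real, S.imag

def f(B1, B2):
    # tr(B~ S~*) = tr(B1 S1') + tr(B2 S2') for Hermitian B~, S~
    return np.trace(B1 @ S1.T) + np.trace(B2 @ S2.T)

h = 1e-6
grad = np.zeros((p, p), dtype=complex)
for i in range(p):
    for j in range(p):
        # symmetric part: b_ij and b_ji are one variable, perturb together
        P1 = np.zeros((p, p)); P1[i, j] = h
        if i != j:
            P1[j, i] = h
        d1 = (f(B1 + P1, B2) - f(B1 - P1, B2)) / (2 * h)
        d2 = 0.0
        if i != j:
            # skew part: b_ji = -b_ij, diagonal entries fixed at zero
            P2 = np.zeros((p, p)); P2[i, j] = h; P2[j, i] = -h
            d2 = (f(B1, B2 + P2) - f(B1, B2 - P2)) / (2 * h)
        grad[i, j] = d1 + 1j * d2

target = 2 * S - np.diag(np.diag(S))
print(np.max(np.abs(grad - target)))
```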

**Theorem 3.5a.4.** *Let $\tilde{\Sigma} = (\tilde{\sigma}\_{ij}) = \tilde{\Sigma}^\* > O$ be a Hermitian positive definite $p \times p$ matrix. Let $\det(\tilde{\Sigma})$ be its determinant and $|\det(\tilde{\Sigma})|$, the absolute value of that determinant. Let $\frac{\partial}{\partial \tilde{\Sigma}} = \frac{\partial}{\partial \Sigma\_1} + i\frac{\partial}{\partial \Sigma\_2}$ be the differential operator, where $\tilde{\Sigma} = \Sigma\_1 + i\Sigma\_2$, $i = \sqrt{-1}$, $\Sigma\_1$ being real symmetric and $\Sigma\_2$, real skew symmetric. Then,*

$$\frac{\partial}{\partial \tilde{\Sigma}} \ln |\det(\tilde{\Sigma})| = 2\tilde{\Sigma}^{-1} - \text{diag}(\tilde{\Sigma}^{-1}).$$

*Proof***:** Note that for two scalar complex quantities $\tilde{x} = x\_1 + ix\_2$ and $\tilde{y} = y\_1 + iy\_2$, where $i = \sqrt{-1}$ and $x\_1, x\_2, y\_1, y\_2$ are real, and for the operator $\frac{\partial}{\partial \tilde{x}} = \frac{\partial}{\partial x\_1} + i\frac{\partial}{\partial x\_2}$, the following results hold, which will be stated as a lemma.

**Lemma 3.5a.2.** *Given $\tilde{x}$, $\tilde{y}$ and the operator $\frac{\partial}{\partial \tilde{x}} = \frac{\partial}{\partial x\_1} + i\frac{\partial}{\partial x\_2}$ defined above,*


$$\begin{split} \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}} \tilde{\boldsymbol{y}}) &= 0, \ \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}} \tilde{\boldsymbol{y}}^{\*}) = 0, \ \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}}^{\*} \tilde{\boldsymbol{y}}) = 2 \tilde{\boldsymbol{y}}, \ \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}}^{\*} \tilde{\boldsymbol{y}}^{\*}) = 2 \tilde{\boldsymbol{y}}^{\*}, \\ \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}} \tilde{\boldsymbol{x}}^{\*}) &= \frac{\partial}{\partial \tilde{\boldsymbol{x}}} (\tilde{\boldsymbol{x}}^{\*} \tilde{\boldsymbol{x}}) = 2 \tilde{\boldsymbol{x}}, \ \frac{\partial}{\partial \tilde{\boldsymbol{x}}^{\*}} (\tilde{\boldsymbol{x}}^{\*} \tilde{\boldsymbol{x}}) = \frac{\partial}{\partial \tilde{\boldsymbol{x}}^{\*}} (\tilde{\boldsymbol{x}} \tilde{\boldsymbol{x}}^{\*}) = 2 \tilde{\boldsymbol{x}}^{\*} \end{split}$$

*where, for example, $\tilde{x}^\*$, which in general denotes the conjugate transpose of $\tilde{x}$, is simply the conjugate in this case since $\tilde{x}$ is a scalar quantity.*

Observe that for a $p \times p$ Hermitian positive definite matrix $\tilde{X}$, the absolute value of the determinant is $|\det(\tilde{X})| = \det(\tilde{X}) = \det(\tilde{X}^\*)$ since $\tilde{X} = \tilde{X}^\*$ and the determinant is then real and positive. Consider the following cofactor expansions of $\det(\tilde{X})$ (in general, a cofactor expansion is valid whether the elements of the matrix are real or complex). Letting $C\_{ij}$ denote the cofactor of $x\_{ij}$ in $\tilde{X} = (x\_{ij})$ when $x\_{ij}$ is real or complex,

$$\det(\tilde{X}) = x\_{11}C\_{11} + x\_{12}C\_{12} + \dots + x\_{1p}C\_{1p} \tag{1}$$

$$= x\_{21}C\_{21} + x\_{22}C\_{22} + \dots + x\_{2p}C\_{2p} \tag{2}$$

$$\vdots$$

$$= x\_{p1}C\_{p1} + x\_{p2}C\_{p2} + \dots + x\_{pp}C\_{pp}. \tag{p}$$

When $\tilde{X} = \tilde{X}^\*$, the diagonal elements $x\_{jj}$ are all real. From Lemma 3.5a.2 and equation $(1)$, we have

$$\frac{\partial}{\partial \mathbf{x}\_{11}}(\mathbf{x}\_{11}C\_{11}) = C\_{11}, \ \frac{\partial}{\partial \mathbf{x}\_{1j}}(\mathbf{x}\_{1j}C\_{1j}) = 0, \ j = 2, \dots, p.$$

From Eq. $(2)$, note that $x\_{21} = x\_{12}^\*$ and $C\_{21} = C\_{12}^\*$ since $\tilde{X} = \tilde{X}^\*$. Then from Lemma 3.5a.2 and $(2)$, we have

$$\frac{\partial}{\partial x\_{12}} (x\_{12}^\* C\_{12}^\*) = 2C\_{12}^\*, \ \frac{\partial}{\partial x\_{22}} (x\_{22}^\* C\_{22}^\*) = C\_{22}^\*, \ \frac{\partial}{\partial x\_{2j}} (x\_{2j} C\_{2j}) = 0, \ j = 3, \dots, p,$$

observing that $x\_{22}^\* = x\_{22}$ and $C\_{22}^\* = C\_{22}$. Now, continuing the process with Eqs. $(3), (4), \dots, (p)$, we have the following result:

$$\frac{\partial}{\partial x\_{ij}}[\det(\tilde{X})] = \begin{cases} C\_{jj}^\*, & j = 1, \dots, p \\ 2C\_{ij}^\* \text{ for all } i \neq j. \end{cases}$$

Observe that for $\tilde{\Sigma}^{-1} = \tilde{B} = \tilde{B}^\*$,

$$\frac{\partial}{\partial \tilde{B}} [\ln(\det(\tilde{B}))] = \frac{1}{\det(\tilde{B})} \begin{bmatrix} B\_{11}^{\*} & 2B\_{12}^{\*} & \dots & 2B\_{1p}^{\*} \\ 2B\_{21}^{\*} & B\_{22}^{\*} & \dots & 2B\_{2p}^{\*} \\ \vdots & \vdots & \ddots & \vdots \\ 2B\_{p1}^{\*} & 2B\_{p2}^{\*} & \dots & B\_{pp}^{\*} \end{bmatrix} $$

$$= 2\tilde{B}^{-1} - \text{diag}(\tilde{B}^{-1}) = 2\tilde{\Sigma} - \text{diag}(\tilde{\Sigma})$$

where $B\_{rs}$ is the cofactor of $\tilde{b}\_{rs}$ in $\tilde{B} = (\tilde{b}\_{rs})$. Therefore, at $\hat{\tilde{\mu}} = \bar{\tilde{X}}$, for $\tilde{\Sigma}^{-1} = \tilde{B}$, and from Theorems 3.5a.3 and 3.5a.4, we have

$$\frac{\partial}{\partial \tilde{B}}[\ln \tilde{L}] = O \Rightarrow n[2\tilde{\Sigma} - \text{diag}(\tilde{\Sigma})] - [2\tilde{S} - \text{diag}(\tilde{S})] = O,$$

$$\Rightarrow \tilde{\Sigma} = \frac{1}{n}\tilde{S} \Rightarrow \hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S} \text{ for } n \ge p,$$

where a hat denotes the estimate/estimator.

Again, from Lemma 3.5a.2 we have the following:

$$\begin{split} \frac{\partial}{\partial \tilde{\mu}} [ (\bar{\tilde{X}} - \tilde{\mu})^\* \tilde{\Sigma}^{-1} (\bar{\tilde{X}} - \tilde{\mu}) ] &= \frac{\partial}{\partial \tilde{\mu}} \{ \bar{\tilde{X}}^\* \tilde{\Sigma}^{-1} \bar{\tilde{X}} + \tilde{\mu}^\* \tilde{\Sigma}^{-1} \tilde{\mu} - \bar{\tilde{X}}^\* \tilde{\Sigma}^{-1} \tilde{\mu} - \tilde{\mu}^\* \tilde{\Sigma}^{-1} \bar{\tilde{X}} \} = O \\ &\Rightarrow O + 2 \tilde{\Sigma}^{-1} \tilde{\mu} - O - 2 \tilde{\Sigma}^{-1} \bar{\tilde{X}} = O \\ &\Rightarrow \hat{\tilde{\mu}} = \bar{\tilde{X}}. \end{split}$$

Thus, the MLE's of $\tilde{\mu}$ and $\tilde{\Sigma}$ are respectively $\hat{\tilde{\mu}} = \bar{\tilde{X}}$ and $\hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S}$ for $n \ge p$. It is not difficult to show that the only critical point $(\hat{\tilde{\mu}}, \hat{\tilde{\Sigma}}) = (\bar{\tilde{X}}, \frac{1}{n}\tilde{S})$ corresponds to a local maximum for $\tilde{L}$. Consider $\ln \tilde{L}$ at $\hat{\tilde{\mu}} = \bar{\tilde{X}}$. Let $\lambda\_1, \dots, \lambda\_p$ be the eigenvalues of $\tilde{B} = \tilde{\Sigma}^{-1}$, where the $\lambda\_j$'s are real as $\tilde{B}$ is Hermitian. Examine the behavior of $\ln \tilde{L}$ as a $\lambda\_j$ increases from $0$ to $\infty$. Then $\ln \tilde{L}$ goes from $-\infty$ back to $-\infty$ through finite values. Hence, the only critical point corresponds to a local maximum. Thus, $\bar{\tilde{X}}$ and $\frac{1}{n}\tilde{S}$ are the MLE's of $\tilde{\mu}$ and $\tilde{\Sigma}$, respectively.

**Theorems 3.5.7, 3.5a.5.** *For the p-variate real Gaussian distribution with the parameters $\mu$ and $\Sigma > O$ and the p-variate complex Gaussian distribution with the parameters $\tilde{\mu}$ and $\tilde{\Sigma} > O$, the maximum likelihood estimators (MLE's) are $\hat{\mu} = \bar{X}$, $\hat{\Sigma} = \frac{1}{n}S$, $\hat{\tilde{\mu}} = \bar{\tilde{X}}$, $\hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S}$, where n is the sample size, $\bar{X}$ and $S$ are the sample mean and sample sum of products matrix in the real case, and $\bar{\tilde{X}}$ and $\tilde{S}$ are the sample mean and the sample sum of products matrix in the complex case, respectively.*

A numerical illustration of the maximum likelihood estimates of *μ*˜ and *Σ*˜ in the complex domain has already been given in Example 3.5a.1.

It can be shown that the MLE's of $\mu$ and $\Sigma$ in the real and complex $p$-variate Gaussian cases are such that $E[\bar{X}] = \mu$, $E[\bar{\tilde{X}}] = \tilde{\mu}$, $E[\frac{1}{n}S] = \frac{n-1}{n}\Sigma$, $E[\frac{1}{n}\tilde{S}] = \frac{n-1}{n}\tilde{\Sigma}$. For these results to hold, the population need not be Gaussian. Any population for which the covariance matrix exists will have these properties. This will be stated as a result.

**Theorems 3.5.8, 3.5a.6.** *Let $X\_1, \dots, X\_n$ be a simple random sample from any p-variate population with mean value vector $\mu$ and covariance matrix $\Sigma = \Sigma' > O$ in the real case, and mean value vector $\tilde{\mu}$ and covariance matrix $\tilde{\Sigma} = \tilde{\Sigma}^\* > O$ in the complex case, respectively, and let $\Sigma$ and $\tilde{\Sigma}$ exist in the sense that all the elements therein exist. Let $\bar{X} = \frac{1}{n}(X\_1 + \cdots + X\_n)$, $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}\_1 + \cdots + \tilde{X}\_n)$, and let $S$ and $\tilde{S}$ be the sample sum of products matrices in the real and complex cases, respectively. Then $E[\bar{X}] = \mu$, $E[\bar{\tilde{X}}] = \tilde{\mu}$, $E[\hat{\Sigma}] = E[\frac{1}{n}S] = \frac{n-1}{n}\Sigma \to \Sigma$ as $n \to \infty$, and $E[\hat{\tilde{\Sigma}}] = E[\frac{1}{n}\tilde{S}] = \frac{n-1}{n}\tilde{\Sigma} \to \tilde{\Sigma}$ as $n \to \infty$.*

*Proof***:** $E[\bar{X}] = \frac{1}{n}\{E[X\_1] + \cdots + E[X\_n]\} = \frac{1}{n}\{\mu + \cdots + \mu\} = \mu$. Similarly, $E[\bar{\tilde{X}}] = \tilde{\mu}$. Let $\mathbf{M} = (\mu, \mu, \dots, \mu)$, that is, $\mathbf{M}$ is a $p \times n$ matrix wherein every column is the $p \times 1$ vector $\mu$. Let $\bar{\mathbf{X}} = (\bar{X}, \dots, \bar{X})$, that is, $\bar{\mathbf{X}}$ is a $p \times n$ matrix wherein every column is $\bar{X}$. Now, consider

$$E[(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})'] = \sum\_{j=1}^{n} E[(X\_j - \mu)(X\_j - \mu)'] = \Sigma + \dots + \Sigma = n\Sigma.$$

As well,

$$\begin{aligned} (\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})' &= (\mathbf{X} - \bar{\mathbf{X}} + \bar{\mathbf{X}} - \mathbf{M})(\mathbf{X} - \bar{\mathbf{X}} + \bar{\mathbf{X}} - \mathbf{M})' \\ &= (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' + (\mathbf{X} - \bar{\mathbf{X}})(\bar{\mathbf{X}} - \mathbf{M})' \\ &+ (\bar{\mathbf{X}} - \mathbf{M})(\mathbf{X} - \bar{\mathbf{X}})' + (\bar{\mathbf{X}} - \mathbf{M})(\bar{\mathbf{X}} - \mathbf{M})' \Rightarrow \\ (\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})' &= S + \sum\_{j=1}^{n} (X\_j - \bar{X})(\bar{X} - \mu)' + \sum\_{j=1}^{n} (\bar{X} - \mu)(X\_j - \bar{X})' \\ &+ \sum\_{j=1}^{n} (\bar{X} - \mu)(\bar{X} - \mu)' \\ &= S + O + O + n(\bar{X} - \mu)(\bar{X} - \mu)' \Rightarrow \end{aligned}$$

$$n\Sigma = E[S] + O + O + n\text{Cov}(\bar{X}) = E[S] + n\left[\frac{1}{n}\Sigma\right] = E[S] + \Sigma \Rightarrow$$

$$E[S] = (n - 1)\Sigma \Rightarrow E[\hat{\Sigma}] = E\left[\frac{1}{n}S\right] = \frac{n - 1}{n}\Sigma \rightarrow \Sigma \text{ as } n \to \infty.$$

Observe that $\sum\_{j=1}^{n}(X\_j - \bar{X}) = O$, this result having been utilized twice in the above derivations. The complex case can be established in a similar manner. This completes the proof.
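The central algebraic step of this proof, $(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})' = S + n(\bar{X} - \mu)(\bar{X} - \mu)'$, is an identity that holds for any data matrix and any fixed vector in the role of $\mu$; the added sketch below verifies it on arbitrary numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 10
X = rng.standard_normal((p, n))      # arbitrary data, columns are observations
mu = rng.standard_normal(p)          # any fixed vector plays the role of mu

xbar = X.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T

lhs = (X - mu[:, None]) @ (X - mu[:, None]).T
rhs = S + n * np.outer(xbar - mu, xbar - mu)
# the cross terms vanish because sum_j (X_j - xbar) = 0
print(np.allclose(lhs, rhs))   # True
```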

#### **3.5.4. Properties of maximum likelihood estimators**

**Definition 3.5.2 Unbiasedness.** Let $g(\theta)$ be a function of the parameter $\theta$, which stands for all the parameters associated with a population's distribution. Let the independently distributed random variables $x\_1, \dots, x\_n$ constitute a simple random sample of size $n$ from a univariate population. Let $T(x\_1, \dots, x\_n)$ be an observable function of the sample values $x\_1, \dots, x\_n$. Then $T$ is called *a statistic* (the plural form, statistics, is not to be confused with the subject of Statistics). This definition of a statistic holds when the iid variables are scalar, vector or matrix variables, whether in the real or complex domain. If $E[T] = g(\theta)$ for all $\theta$ in the parameter space, then $T$ is said to be unbiased for $g(\theta)$ or an unbiased estimator of $g(\theta)$.

We will look at some properties of the MLE of the parameter or parameters represented by *θ* in a given population specified by its density/probability function *f (x, θ )*. Consider a simple random sample of size *n* from this population. The sample will be of the form *x*1*,...,xn* if the population is univariate or of the form *X*1*,...,Xn* if the population is multivariate or matrix-variate. Some properties of estimators in the scalar variable case will be illustrated first. Then the properties will be extended to the vector/matrix-variate cases. The joint density of the sample values will be denoted by *L*. Thus, in the univariate case,

$$L = L(\mathbf{x}\_1, \dots, \mathbf{x}\_n, \boldsymbol{\theta}) = \prod\_{j=1}^n f(\mathbf{x}\_j, \boldsymbol{\theta}) \Rightarrow \ln L = \sum\_{j=1}^n \ln f(\mathbf{x}\_j, \boldsymbol{\theta}).$$

Since the total probability is 1, we have the following, taking for example the variable to be continuous and a scalar parameter *θ*:

$$\int\_{X} L \,\mathrm{d}X = 1 \Rightarrow \frac{\partial}{\partial \theta} \int\_{X} L \,\mathrm{d}X = 0, \,\, X' = (x\_1, \dots, x\_n).$$

We are going to assume that the support of $x$ is free of $\theta$ and that the differentiation can be carried out inside the integral sign. Then,

$$0 = \int\_{X} \frac{\partial}{\partial \theta} L \,\mathrm{d}X = \int\_{X} \frac{1}{L} \left(\frac{\partial}{\partial \theta} L\right) L \,\mathrm{d}X = \int\_{X} \left[\frac{\partial}{\partial \theta} \ln L\right] L \,\mathrm{d}X.$$

Noting that $\int\_{X} (\cdot)\, L \,\mathrm{d}X = E[(\cdot)]$, we have

$$E\left[\frac{\partial}{\partial\theta}\ln L\right] = 0 \Rightarrow E\left[\sum\_{j=1}^{n} \frac{\partial}{\partial\theta}\ln f(\mathbf{x}\_j, \theta)\right] = 0.\tag{3.5.6}$$
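The identity $E[\frac{\partial}{\partial\theta}\ln L] = 0$ can be verified exactly in a simple discrete case. The added sketch below sums the score of a single Bernoulli($\theta$) observation against its probabilities:

```python
# Bernoulli(theta): f(x, theta) = theta^x (1 - theta)^(1 - x), x in {0, 1}
# score: d/dtheta ln f = x/theta - (1 - x)/(1 - theta)
theta = 0.3
score = lambda x: x / theta - (1 - x) / (1 - theta)

# E[score] = sum over the support of f(x) * score(x)
expected = (1 - theta) * score(0) + theta * score(1)
print(expected)   # 0.0 up to rounding
```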

Let *θ*ˆ be the MLE of *θ*. Then

$$\begin{split} \frac{\partial}{\partial \theta} L|\_{\theta = \hat{\theta}} = 0 &\Rightarrow \frac{\partial}{\partial \theta} \ln L|\_{\theta = \hat{\theta}} = 0 \\ &\Rightarrow E \left[ \sum\_{j=1}^{n} \frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_{j}, \theta)|\_{\theta = \hat{\theta}} \right] = 0. \end{split} \tag{3.5.7}$$

If $\theta$ is scalar, then the above are single equations; otherwise, they represent a system of equations, as the derivatives are then vector or matrix derivatives. Here (3.5.7) is the likelihood equation giving rise to the maximum likelihood estimator (MLE) of $\theta$. However, by the weak law of large numbers (see Sect. 2.6),

$$\frac{1}{n}\sum\_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(\mathbf{x}\_{j},\theta)|\_{\theta=\hat{\theta}}\to E[\frac{\partial}{\partial\theta}\ln f(\mathbf{x}\_{j},\theta\_{o})] \text{ as } n\to\infty \tag{3.5.8}$$

where $\theta\_o$ is the true value of $\theta$. Noting that $E[\frac{\partial}{\partial \theta} \ln f(x\_j, \theta\_o)] = 0$ owing to the fact that $\int\_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$, we have the following results:

$$\sum\_{j=1}^{n} \frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)|\_{\theta = \hat{\theta}} = 0, \; E\left[\frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)|\_{\theta = \theta\_o}\right] = 0.$$

This means that $E[\hat{\theta}] = \theta\_o$ or $E[\hat{\theta}] \to \theta\_o$ as $n \to \infty$, that is, $\hat{\theta}$ is asymptotically unbiased for the true value $\theta\_o$ of $\theta$. As well, $\hat{\theta} \to \theta\_o$ as $n \to \infty$ almost surely, that is, with probability 1 except on a set having probability measure zero. Thus, the MLE of $\theta$ is asymptotically unbiased and consistent for the true value $\theta\_o$, which is stated next as a theorem:

**Theorem 3.5.9.** *In a given population's distribution whose parameter or set of parameters is denoted by θ, the MLE of θ, denoted by θ*ˆ*, is asymptotically unbiased and consistent for the true value θo.*

**Definition 3.5.3. Consistency of an estimator** If $Pr\{\hat{\theta} \to \theta\_o\} \to 1$ as $n \to \infty$, then we say that $\hat{\theta}$ is consistent for $\theta\_o$, where $\hat{\theta}$ is an estimator of $\theta$.

**Example 3.5.2.** Consider a real *p*-variate Gaussian population *Np(μ, Σ), Σ > O*. Show that the MLE of *μ* is unbiased and consistent for *μ* and that the MLE of *Σ* is asymptotically unbiased for *Σ*.

**Solution 3.5.2.** We have $\hat{\mu} = \bar{X}$ = the sample mean or sample average and $\hat{\Sigma} = \frac{1}{n}S$, where $S$ is the sample sum of products matrix. From Theorem 3.5.4, $E[\bar{X}] = \mu$ and $\text{Cov}(\bar{X}) = \frac{1}{n}\Sigma \to O$ as $n \to \infty$. Therefore, $\hat{\mu} = \bar{X}$ is unbiased and consistent for $\mu$. From Theorem 3.5.8, $E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma \to \Sigma$ as $n \to \infty$, and hence $\hat{\Sigma}$ is asymptotically unbiased for $\Sigma$.
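Both properties in this solution can be seen in simulation. The added sketch below uses assumed illustrative parameters; averaging many replications, $\hat{\mu}$ centers on $\mu$ while $E[\hat{\Sigma}]$ exhibits the $\frac{n-1}{n}$ factor of Theorem 3.5.8:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 2, 10, 5000
mu = np.array([1.0, -2.0])
A = np.array([[1.0, 0.0], [0.5, 0.8]])
Sigma = A @ A.T                           # true covariance (illustrative choice)

mu_hats = np.zeros((reps, p))
Sig_hats = np.zeros((reps, p, p))
for r in range(reps):
    X = mu[:, None] + A @ rng.standard_normal((p, n))
    xbar = X.mean(axis=1)
    D = X - xbar[:, None]
    mu_hats[r] = xbar
    Sig_hats[r] = (D @ D.T) / n           # MLE of Sigma for this sample

# E[mu_hat] ~ mu (unbiased); E[Sigma_hat] ~ ((n-1)/n) Sigma (biased for finite n)
print(mu_hats.mean(axis=0))
print(Sig_hats.mean(axis=0) / Sigma)      # entries near (n-1)/n = 0.9
```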

Another desirable property for point estimators is referred to as sufficiency. If *T* is a statistic used to estimate a real scalar, vector or matrix parameter *θ* and if the conditional distribution of the sample values, given this statistic *T* , is free of *θ*, then no more information about *θ* can be secured from that sample once the statistic *T* is known. Accordingly, all the information that can be obtained from the sample is contained in *T* or, in this sense, *T* is *sufficient* or a *sufficient estimator* for *θ*.

**Definition 3.5.4. Sufficiency of estimators** Let *θ* be a scalar, vector or matrix parameter associated with a given population's distribution. Let *T* = *T (X*1*,...,Xn)* be an estimator of *θ*, where *X*1*,...,Xn* are iid as the given population. If the conditional distribution of the sample values *X*1*,...,Xn*, given *T* , is free of *θ*, then we say that this *T* is a *sufficient estimator* for *θ*. If there are several scalar, vector or matrix parameters *θ*1*,...,θk* associated with a given population and if *T*1*(X*1*,...,Xn), . . . , Tr(X*1*,...,Xn)* are *r* statistics, where *r* may be greater, smaller or equal to *k*, then if the conditional distribution of *X*1*,...,Xn*, given *T*1*,...,Tr*, is free of *θ*1*,...,θk,* then we say that *T*1*,...,Tr* are *jointly sufficient* for *θ*1*,...,θk*. If there are several sets of statistics, where each set is sufficient for *θ*1*,...,θk*, then that set of statistics which allows for the maximal reduction of the data is called the *minimal sufficient set of statistics* for *θ*1*,...,θk*.

**Example 3.5.3.** Show that the MLE of *μ* in a *Np(μ, Σ), Σ > O*, is sufficient for *μ*.

**Solution 3.5.3.** Let *X*1*,...,Xn* be a simple random sample from a *Np(μ, Σ)*. Then the joint density of *X*1*,...,Xn* can be written as

$$L = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}\mathbf{S}) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)},\tag{i}$$

referring to (3.5.2). Since $\bar X$ is a function of $X_1,\dots,X_n$, the joint density of $X_1,\dots,X_n$ and $\bar X$ is $L$ itself. Hence, the conditional density of $X_1,\dots,X_n$, given $\bar X$, is $L/f_1(\bar X)$ where $f_1(\bar X)$ is the marginal density of $\bar X$. However, appealing to Corollary 3.5.1, $f_1(\bar X)$ is a $N_p(\mu, \frac{1}{n}\Sigma)$ density. Hence

$$\frac{L}{f\_1(\bar{X})} = \frac{1}{(2\pi)^{\frac{(n-1)p}{2}}\, n^{\frac{p}{2}}\, |\Sigma|^{\frac{n-1}{2}}}\, \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}\mathbf{S})},\tag{ii}$$

which is free of $\mu$, so that $\hat\mu$ is sufficient for $\mu$.

**Note 3.5.1.** We can also show that $\hat\mu = \bar X$ and $\hat\Sigma = \frac{1}{n}S$ are jointly sufficient for $\mu$ and $\Sigma$ in a $N_p(\mu,\Sigma)$, $\Sigma > O$, population. This result requires the density of $S$, which will be discussed in Chap. 5.

An additional property of interest for a point estimator is that of *relative efficiency*. If $g(\theta)$ is a function of $\theta$ and if $T = T(x_1,\dots,x_n)$ is an estimator of $g(\theta)$, then $E|T - g(\theta)|^2$ is a squared mathematical distance between $T$ and $g(\theta)$. We can consider the following criterion: *the smaller the distance, the more efficient the estimator is*, as we would like this distance to be as small as possible when we are estimating $g(\theta)$ by making use of $T$. If $E[T] = g(\theta)$, then $T$ is unbiased for $g(\theta)$ and, in this case, $E|T - g(\theta)|^2 = \text{Var}(T)$, the variance of $T$. In the class of unbiased estimators, we seek that particular estimator which has the smallest variance.

**Definition 3.5.5. Relative efficiency of estimators** If $T_1$ and $T_2$ are two estimators of the same function $g(\theta)$ of $\theta$ and if $E[|T_1 - g(\theta)|^2] < E[|T_2 - g(\theta)|^2]$, then $T_1$ is said to be relatively more efficient for estimating $g(\theta)$. If $T_1$ and $T_2$ are unbiased for $g(\theta)$, the criterion becomes $\text{Var}(T_1) < \text{Var}(T_2)$.
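Relative efficiency can be seen numerically by comparing two unbiased estimators of the mean of a univariate normal population: the sample mean and the sample median. This pairing is an illustrative choice, not an example from the text; asymptotically $\text{Var}(\text{median}) \approx \frac{\pi}{2}\text{Var}(\text{mean})$, so the mean is relatively more efficient. A minimal simulation sketch with assumed parameter values:

```python
# Relative efficiency: sample mean versus sample median for N(mu, sigma^2) data.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 2.0, 51, 20000        # illustrative choices
samples = rng.normal(mu, sigma, size=(reps, n))

var_mean = samples.mean(axis=1).var()           # variance of the sample mean
var_median = np.median(samples, axis=1).var()   # variance of the sample median

print(var_mean, var_median)                     # the mean has the smaller variance
print(var_median / var_mean)                    # near pi/2 for large n
```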

Let $u$ be an unbiased estimator of $g(\theta)$, a function of the parameter $\theta$ associated with any population, and let $T$ be a sufficient statistic for $\theta$. Let the conditional expectation of $u$, given $T$, be denoted by $h(T)$, that is, $E[u|T] \equiv h(T)$. We have the following two general properties of conditional expectations; refer to Mathai and Haubold (2017), for example. For any two real scalar random variables $x$ and $y$ having a joint density/probability function,

$$E[\mathbf{y}] = E[E(\mathbf{y}|\mathbf{x})] \tag{3.5.9}$$

and

$$\text{Var}(\mathbf{y}) = \text{Var}(E[\mathbf{y}|\mathbf{x}]) + E[\text{Var}(\mathbf{y}|\mathbf{x})] \tag{3.5.10}$$

whenever the expected values exist. From (3.5.9),

$$\mathbf{g}(\theta) = E[u] = E[E(u|T)] = E[h(T)] \Rightarrow E[h(T)] = \mathbf{g}(\theta). \tag{3.5.11}$$

Then,

$$\text{Var}(u) = E[u - g(\theta)]^2 = \text{Var}(E[u|T]) + E[\text{Var}(u|T)] = \text{Var}(h(T)) + \delta, \ \delta \ge 0$$

$$\Rightarrow \text{Var}(u) \ge \text{Var}(h(T)), \tag{3.5.12}$$

which means that if we have a sufficient statistic *T* for *θ*, then the variance of *h(T )*, with *h(T )* = *E*[*u*|*T* ] where *u* is any unbiased estimator of *g(θ )*, is smaller than or equal to the variance of any unbiased estimator of *g(θ )*. Accordingly, we should restrict ourselves to the class of *h(T )* when seeking minimum variance estimators. Observe that since *δ* in (3.5.12) is the expected value of the variance of a real variable, it is nonnegative. The inequality in (3.5.12) is known in the literature as the *Rao-Blackwell Theorem*.
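The Rao-Blackwell improvement can be observed numerically by conditioning a crude unbiased estimator on a sufficient statistic. The setting below is an illustrative assumption, not the text's example: for a Poisson($\lambda$) sample, $g(\lambda) = e^{-\lambda} = Pr\{x = 0\}$, the crude estimator $u = 1\{x_1 = 0\}$ is unbiased, $T = x_1 + \cdots + x_n$ is sufficient, and conditioning gives $h(T) = E[u|T] = (\frac{n-1}{n})^T$:

```python
# Rao-Blackwellization demo: both estimators of e^{-lambda} are unbiased,
# but the conditioned estimator h(T) has a much smaller variance.
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 1.5, 10, 20000                 # illustrative choices
X = rng.poisson(lam, size=(reps, n))

u = (X[:, 0] == 0).astype(float)              # crude unbiased estimator, uses x_1 only
T = X.sum(axis=1)                             # sufficient statistic for lambda
h = ((n - 1) / n) ** T                        # h(T) = E[u | T], since X_1|T ~ Binomial(T, 1/n)

target = np.exp(-lam)
print(u.mean(), h.mean(), target)             # both approximately unbiased
print(u.var(), h.var())                       # variance drops sharply after conditioning
```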

It follows from (3.5.6) that $E[\frac{\partial}{\partial\theta}\ln L] = \int_X (\frac{\partial}{\partial\theta}\ln L)\, L\,\mathrm{d}X = 0$. Differentiating once again with respect to $\theta$, we have

$$\begin{split} 0 &= \int\_{X} \frac{\partial}{\partial \theta} \Big[ \left( \frac{\partial}{\partial \theta} \ln L \right) L \Big] \mathrm{d}X \\ &\Rightarrow \int\_{X} \Big[ \left( \frac{\partial^{2}}{\partial \theta^{2}} \ln L \right) L + \left( \frac{\partial}{\partial \theta} \ln L \right)^{2} L \Big] \mathrm{d}X = 0 \\ &\Rightarrow \int\_{X} \left( \frac{\partial}{\partial \theta} \ln L \right)^{2} L \, \mathrm{d}X = - \int\_{X} \left( \frac{\partial^{2}}{\partial \theta^{2}} \ln L \right) L \, \mathrm{d}X, \end{split}$$

so that

$$\begin{split} \text{Var}\left(\frac{\partial}{\partial\theta}\ln L\right) &= E\left[\frac{\partial}{\partial\theta}\ln L\right]^2 = -E\left[\frac{\partial^2}{\partial\theta^2}\ln L\right] \\ &= nE\left[\frac{\partial}{\partial\theta}\ln f(\mathbf{x}\_j,\theta)\right]^2 = -nE\left[\frac{\partial^2}{\partial\theta^2}\ln f(\mathbf{x}\_j,\theta)\right]. \end{split} \tag{3.5.13}$$

Let *T* be any estimator for *θ*, where *θ* is a real scalar parameter. If *T* is unbiased for *θ*, then *E*[*T* ] = *θ*; otherwise, let *E*[*T* ] = *θ* + *b(θ )* where *b(θ )* is some function of *θ*, which is called the bias. Then, differentiating both sides with respect to *θ*,

$$\begin{aligned} \int\_X TL \, \mathrm{d}X &= \theta + b(\theta) \\ \Rightarrow \int\_X T \frac{\partial}{\partial \theta} L \, \mathrm{d}X &= 1 + b'(\theta), \quad b'(\theta) = \frac{\mathrm{d}}{\mathrm{d}\theta} b(\theta) \\ \Rightarrow E\Big[T\Big(\frac{\partial}{\partial \theta} \ln L\Big)\Big] &= 1 + b'(\theta) = \mathrm{Cov}\Big(T, \frac{\partial}{\partial \theta} \ln L\Big) \end{aligned}$$

because $E[\frac{\partial}{\partial\theta}\ln L] = 0$. Hence,

$$\begin{split} [\text{Cov}(T, \frac{\partial}{\partial \theta} \ln L)]^2 &= [1 + b'(\theta)]^2 \le \text{Var}(T) \text{Var}(\frac{\partial}{\partial \theta} \ln L) \Rightarrow \\ \text{Var}(T) &\ge \frac{[1 + b'(\theta)]^2}{\text{Var}(\frac{\partial}{\partial \theta} \ln L)} = \frac{[1 + b'(\theta)]^2}{n \text{Var}(\frac{\partial}{\partial \theta} \ln f(x\_j, \theta))} \\ &= \frac{[1 + b'(\theta)]^2}{E[\frac{\partial}{\partial \theta} \ln L]^2} = \frac{[1 + b'(\theta)]^2}{nE[\frac{\partial}{\partial \theta} \ln f(x\_j, \theta)]^2}, \end{split} \tag{3.5.14}$$

which is a lower bound for the variance of any estimator of $\theta$. This inequality is known in the literature as the *Cramér-Rao inequality*. When $T$ is unbiased for $\theta$, then $b'(\theta) = 0$, and then

$$\text{Var}(T) \ge \frac{1}{I\_n(\theta)} = \frac{1}{nI\_1(\theta)}\tag{3.5.15}$$

where

$$I\_n(\theta) = \text{Var}(\frac{\partial}{\partial \theta} \ln L) = E\left[\frac{\partial}{\partial \theta} \ln L\right]^2 = nE\left[\frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)\right]^2$$

$$= -E\left[\frac{\partial^2}{\partial \theta^2} \ln L\right] = -nE\left[\frac{\partial^2}{\partial \theta^2} \ln f(\mathbf{x}\_j, \theta)\right] = nI\_1(\theta) \tag{3.5.16}$$

is known as *Fisher's information* about $\theta$ which can be obtained from a sample of size $n$, $I_1(\theta)$ being *Fisher's information* in one observation or a sample of size 1. Observe that Fisher's information is different from the *information* in *Information Theory*. For instance, some aspects of Information Theory are discussed in Mathai and Rathie (1975).
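The Cramér-Rao bound can be checked by simulation for a case where the Fisher information is available in closed form. For the exponential density $f(x,\theta) = \frac{1}{\theta}e^{-x/\theta}$, which is worked out analytically in Example 3.5.4 below, $I_1(\theta) = 1/\theta^2$, so the bound for an unbiased estimator is $\theta^2/n$, and it is attained by $\hat\theta = \bar x$. A sketch with illustrative parameter values:

```python
# Cramer-Rao bound for the exponential population: Var(theta_hat) = theta^2/n = 1/(n I_1).
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 25, 20000            # illustrative choices

crb = theta**2 / n                         # 1 / (n I_1(theta)) with I_1 = 1/theta^2

X = rng.exponential(theta, size=(reps, n))
theta_hat = X.mean(axis=1)                 # MLE of theta is the sample mean
print(theta_hat.var(), crb)                # simulated variance sits at the bound
```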

#### **Asymptotic efficiency and normality of MLE's**

We have already established that

$$0 = \frac{\partial}{\partial \theta} \ln L(X, \theta)|\_{\theta = \hat{\theta}}\,, \tag{i}$$

which is the likelihood equation giving rise to the MLE. Let us expand *(i)* in a neighborhood of the true parameter value *θo* :

$$0 = \frac{\partial}{\partial \theta} \ln L(X, \theta)|\_{\theta = \theta\_o} + (\hat{\theta} - \theta\_o) \frac{\partial^2}{\partial \theta^2} \ln L(X, \theta)|\_{\theta = \theta\_o}$$

$$+ \frac{(\hat{\theta} - \theta\_o)^2}{2} \frac{\partial^3}{\partial \theta^3} \ln L(X, \theta)|\_{\theta = \theta\_1} \tag{ii}$$

where <sup>|</sup>*θ*<sup>ˆ</sup> <sup>−</sup> *<sup>θ</sup>*1<sup>|</sup> *<sup>&</sup>lt;* <sup>|</sup>*θ*<sup>ˆ</sup> <sup>−</sup> *θo*|. Multiplying both sides by <sup>√</sup>*<sup>n</sup>* and rearranging terms, we have the following:

$$\sqrt{n}(\hat{\theta}-\theta\_{o}) = \frac{-\frac{1}{\sqrt{n}}\frac{\partial}{\partial\theta}\ln L(X,\theta)|\_{\theta=\theta\_{o}}}{\frac{1}{n}\frac{\partial^{2}}{\partial\theta^{2}}\ln L(X,\theta)|\_{\theta=\theta\_{o}} + \frac{1}{n}\frac{(\hat{\theta}-\theta\_{o})}{2}\frac{\partial^{3}}{\partial\theta^{3}}\ln L(X,\theta)|\_{\theta=\theta\_{1}}}.\tag{iii}$$

The second term in the denominator of *(iii)* goes to zero because *θ*ˆ → *θo* as *n* → ∞, and the third derivative is assumed to be bounded. Then the first term in the denominator is such that

$$\begin{split} \frac{1}{n} \frac{\partial^2}{\partial \theta^2} \ln L(X,\theta)|\_{\theta = \theta\_o} &= \frac{1}{n} \sum\_{j=1}^n \frac{\partial^2}{\partial \theta^2} \ln f(\mathbf{x}\_j, \theta)|\_{\theta = \theta\_o} \\ &\to E\left[\frac{\partial^2}{\partial \theta^2} \ln f(\mathbf{x}\_j, \theta)\right] = -I\_1(\theta)|\_{\theta\_o}, \\ &= -\text{Var}\left[\frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)\right]\Big|\_{\theta = \theta\_o}, \\ I\_1(\theta)|\_{\theta = \theta\_o} &= \text{Var}\left[\frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)\right]\Big|\_{\theta = \theta\_o}, \end{split}$$

which is the information bound *I*1*(θo)*. Thus,

$$\frac{1}{n}\frac{\partial^2}{\partial\theta^2}\ln L(X,\theta)|\_{\theta=\theta\_o}\to -I\_1(\theta\_o),\tag{i\nu}$$

and we may write *(iii)* as follows:

$$
\sqrt{I\_1(\theta\_o)}\sqrt{n}(\hat{\theta}-\theta\_o) \approx \frac{\sqrt{n}}{\sqrt{I\_1(\theta\_o)}}\frac{1}{n}\sum\_{j=1}^n \frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)|\_{\theta = \theta\_o},\tag{\text{v}}
$$

where $\frac{\partial}{\partial\theta}\ln f(x_j,\theta)|_{\theta=\theta_o}$ has zero as its expected value and $I_1(\theta_o)$ as its variance. Further, the quantities $\frac{\partial}{\partial\theta}\ln f(x_j,\theta)$, $j = 1,\dots,n$, are iid variables. Hence, by the central limit theorem which is stated in Sect. 2.6,

$$\frac{\sqrt{n}}{\sqrt{I\_1(\theta\_o)}} \frac{1}{n} \sum\_{j=1}^{n} \frac{\partial}{\partial \theta} \ln f(\mathbf{x}\_j, \theta)|\_{\theta=\theta\_o} \to N\_1(0, 1) \text{ as } n \to \infty. \tag{3.5.17}$$

where *N*1*(*0*,* 1*)* is a univariate standard normal random variable. This may also be reexpressed as follows since *I*1*(θo)* is free of *n*:

$$\frac{1}{\sqrt{n}}\sum\_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(\mathbf{x}\_{j},\theta)|\_{\theta=\theta\_{o}} \to N\_{1}(0, I\_{1}(\theta\_{o})) $$


or

$$
\sqrt{I\_1(\theta\_o)}\sqrt{n}(\hat{\theta} - \theta\_o) \to N\_1(0, 1) \text{ as } n \to \infty. \tag{3.5.18}
$$

Since *I*1*(θo)* is free of *n*, this result can also be written as

$$
\sqrt{n}(\hat{\theta} - \theta\_o) \to N\_1(0, \frac{1}{I\_1(\theta\_o)}).\tag{3.5.19}
$$

Thus, the MLE *θ*ˆ is asymptotically unbiased, consistent and asymptotically normal, referring to (3.5.18) or (3.5.19).

**Example 3.5.4.** Show that the MLE of the parameter *θ* in a real scalar exponential population is unbiased, consistent, efficient and that asymptotic normality holds as in (3.5.18).

**Solution 3.5.4.** As per the notations introduced in this section,

$$\begin{aligned} f(\mathbf{x}\_j, \boldsymbol{\theta}) &= \frac{1}{\boldsymbol{\theta}} \mathbf{e}^{-\frac{\boldsymbol{x}\_j}{\boldsymbol{\theta}}}, \ 0 \le \mathbf{x}\_j < \infty, \ \boldsymbol{\theta} > \mathbf{0}, \\\ L &= \frac{1}{\boldsymbol{\theta}^n} \mathbf{e}^{-\frac{1}{\boldsymbol{\theta}} \sum\_{j=1}^n \boldsymbol{x}\_j}. \end{aligned}$$

In the exponential population, $E[x_j] = \theta$, $\text{Var}(x_j) = \theta^2$, $j = 1,\dots,n$; the MLE of $\theta$ is $\hat\theta = \bar x$, $\bar x = \frac{1}{n}(x_1 + \cdots + x_n)$, and $\text{Var}(\hat\theta) = \frac{\theta^2}{n} \to 0$ as $n \to \infty$. Thus, $E[\hat\theta] = \theta$ and $\text{Var}(\hat\theta) \to 0$ as $n \to \infty$. Hence, $\hat\theta$ is unbiased and consistent for $\theta$. Note that

$$\ln f(\mathbf{x}\_j, \boldsymbol{\theta}) = -\ln \boldsymbol{\theta} - \frac{1}{\boldsymbol{\theta}} \mathbf{x}\_j \Rightarrow -E\left[\frac{\partial^2}{\partial \boldsymbol{\theta}^2} \ln f(\mathbf{x}\_j, \boldsymbol{\theta})\right] = -\frac{1}{\theta^2} + 2\frac{E[\mathbf{x}\_j]}{\theta^3} = \frac{1}{\theta^2} = \frac{1}{\text{Var}(\mathbf{x}\_j)}.$$

Accordingly, the information bound is attained, that is, $\hat\theta$ is minimum variance unbiased or most efficient. Letting the true value of $\theta$ be $\theta_o$, by the central limit theorem, we have

$$\frac{\bar{\boldsymbol{x}} - \theta\_o}{\sqrt{\text{Var}(\bar{\boldsymbol{x}})}} = \frac{\sqrt{n}(\hat{\theta} - \theta\_o)}{\theta\_o} \to N\_1(0, 1) \text{ as } n \to \infty,$$

and hence the asymptotic normality is also verified. Is $\hat\theta$ sufficient for $\theta$? Let us consider the statistic $u = x_1 + \cdots + x_n$, the sample sum. If $u$ is sufficient, then $\bar x = \hat\theta$ is also sufficient. The mgf of $u$ is given by

$$M\_u(t) = \prod\_{j=1}^{n} (1 - \theta t)^{-1} = (1 - \theta t)^{-n}, \ 1 - \theta t > 0 \Rightarrow u \sim \text{gamma}(\alpha = n, \beta = \theta),$$

whose density is $f_1(u) = \frac{u^{n-1}}{\theta^n\Gamma(n)}\mathrm{e}^{-\frac{u}{\theta}}$, $u = x_1 + \cdots + x_n$. However, the joint density of $x_1,\dots,x_n$ is $L = \frac{1}{\theta^n}\mathrm{e}^{-\frac{1}{\theta}(x_1+\cdots+x_n)}$. Accordingly, the conditional density of $x_1,\dots,x_n$ given $\hat\theta = \bar x$ is

$$\frac{L}{f\_1(u)} = \frac{\Gamma(n)}{u^{n-1}}\ ,$$

which is free of $\theta$, and hence $\hat\theta$ is also sufficient.
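The asymptotic normality established in this example is easy to watch numerically: with $I_1(\theta_o) = 1/\theta_o^2$, the standardized quantity $\sqrt{I_1(\theta_o)}\sqrt{n}(\hat\theta - \theta_o) = \sqrt{n}(\bar x - \theta_o)/\theta_o$ should have mean near 0 and variance near 1, as in (3.5.18). A simulation sketch with illustrative parameter values:

```python
# Asymptotic normality of the exponential MLE, as in (3.5.18).
import numpy as np

rng = np.random.default_rng(4)
theta_o, n, reps = 2.0, 200, 20000                    # illustrative choices
X = rng.exponential(theta_o, size=(reps, n))

# sqrt(I_1) * sqrt(n) * (theta_hat - theta_o), with I_1(theta_o) = 1/theta_o^2
z = np.sqrt(n) * (X.mean(axis=1) - theta_o) / theta_o

print(z.mean(), z.var())                              # approximately 0 and 1
```

A histogram of `z` would show the bell shape directly; here only the first two moments are checked.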

#### **3.5.5. Some limiting properties in the** *p***-variate case**

The $p$-variate extension of the central limit theorem is now being considered. Let the $p \times 1$ real vectors $X_1,\dots,X_n$ be iid with common mean value vector $\mu$ and common covariance matrix $\Sigma > O$, that is, $E(X_j) = \mu$ and $\text{Cov}(X_j) = \Sigma > O$, $j = 1,\dots,n$. Assume that $\|\Sigma\| < \infty$ where $\|(\cdot)\|$ denotes a norm of $(\cdot)$. Letting $Y_j = \Sigma^{-\frac{1}{2}}X_j$, we have $E[Y_j] = \Sigma^{-\frac{1}{2}}\mu$ and $\text{Cov}(Y_j) = I$, $j = 1,\dots,n$; letting $\bar X = \frac{1}{n}(X_1 + \cdots + X_n)$ and $\bar Y = \Sigma^{-\frac{1}{2}}\bar X$, we have $E(\bar X) = \mu$ and $E(\bar Y) = \Sigma^{-\frac{1}{2}}\mu$. If we let

$$U = \sqrt{n} \,\,\Sigma^{-\frac{1}{2}} (\bar{X} - \mu),\tag{3.5.20}$$

the following result holds:

**Theorem 3.5.10.** *Let the p* × 1 *vector U be as defined in (3.5.20). Then, as n* → ∞*, U* → *Np(O, I ).*

*Proof***:** Let $L' = (a_1,\dots,a_p)$ be an arbitrary constant vector such that $L'L = 1$. Then $L'X_j$, $j = 1,\dots,n$, are iid with common mean $L'\mu$ and common variance $\text{Var}(L'X_j) = L'\Sigma L$. Let $Y_j = \Sigma^{-\frac{1}{2}}X_j$ and $u_j = L'Y_j = L'\Sigma^{-\frac{1}{2}}X_j$. Then the common mean of the $u_j$'s is $L'\Sigma^{-\frac{1}{2}}\mu$ and their common variance is $\text{Var}(u_j) = L'\Sigma^{-\frac{1}{2}}\Sigma\Sigma^{-\frac{1}{2}}L = L'L = 1$, $j = 1,\dots,n$. Note that $\bar u = \frac{1}{n}(u_1 + \cdots + u_n) = L'\bar Y = L'\Sigma^{-\frac{1}{2}}\bar X$ and that $\text{Var}(\bar u) = \frac{1}{n}L'L = \frac{1}{n}$. Then, in light of the univariate central limit theorem as stated in Sect. 2.6, we have $\sqrt{n}\,L'\Sigma^{-\frac{1}{2}}(\bar X - \mu) \to N_1(0, 1)$ as $n \to \infty$. If, for some $p$-variate vector $W$, $L'W$ is univariate normal for arbitrary $L$, it follows from a characterization of the multivariate normal distribution that $W$ is a $p$-variate normal vector. Thus,

$$U = \sqrt{n}\Sigma^{-\frac{1}{2}}(\bar{X} - \mu) \to N\_p(O, I) \text{ as } n \to \infty,\tag{3.5.21}$$

which completes the proof.
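Theorem 3.5.10 can be illustrated by sampling from a decidedly non-Gaussian population and checking that $U = \sqrt{n}\,\Sigma^{-\frac{1}{2}}(\bar X - \mu)$ has mean near $O$ and covariance near $I$. The mixing matrix and parameters below are illustrative assumptions; the components of $X_j$ are built from centered unit-variance exponentials, so $\Sigma = BB'$:

```python
# Multivariate CLT demo: standardized sample mean of a non-Gaussian population.
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 2, 300, 5000                          # illustrative choices
B = np.array([[1.0, 0.3], [0.0, 0.8]])             # mixing matrix (an assumption)
mu = np.array([2.0, -1.0])
Sigma = B @ B.T                                    # Cov(X_j), since Var of each z is 1

w, V = np.linalg.eigh(Sigma)                       # Sigma^{-1/2} via spectral decomposition
Sig_inv_half = V @ np.diag(w**-0.5) @ V.T

U = np.empty((reps, p))
for i in range(reps):
    Z = rng.exponential(1.0, size=(n, p)) - 1.0    # centered, unit variance, non-Gaussian
    X = Z @ B.T + mu
    U[i] = np.sqrt(n) * Sig_inv_half @ (X.mean(axis=0) - mu)

print(np.round(U.mean(axis=0), 2))                 # near the zero vector
print(np.round(np.cov(U.T), 2))                    # near the identity matrix
```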

A parallel result also holds in the complex domain. Let $\tilde X_j$, $j = 1,\dots,n$, be iid from some complex population with mean $\tilde\mu$ and Hermitian positive definite covariance matrix $\tilde\Sigma = \tilde\Sigma^* > O$ where $\|\tilde\Sigma\| < \infty$. Letting $\bar{\tilde X} = \frac{1}{n}(\tilde X_1 + \cdots + \tilde X_n)$, we have

$$\sqrt{n}\,\,\tilde{\Sigma}^{-\frac{1}{2}}(\tilde{\bar{X}}-\tilde{\mu}) \to N\_p(O,I) \text{ as } n \to \infty. \tag{3.5a.5}$$

#### **Exercises 3.5**

**3.5.1.** By making use of the mgf or otherwise, show that the sample mean $\bar X = \frac{1}{n}(X_1 + \cdots + X_n)$ in the real $p$-variate Gaussian case, $X_j \sim N_p(\mu,\Sigma)$, $\Sigma > O$, is again Gaussian distributed with the parameters $\mu$ and $\frac{1}{n}\Sigma$.

**3.5.2.** Let $X_j \sim N_p(\mu,\Sigma)$, $\Sigma > O$, $j = 1,\dots,n$, iid. Let $\mathbf{X} = (X_1,\dots,X_n)$ be the $p \times n$ sample matrix. Derive the density of (1) $\text{tr}(\Sigma^{-1}(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})')$ where $\mathbf{M} = (\mu,\dots,\mu)$ is the $p \times n$ matrix all of whose columns are $\mu$; (2) $\text{tr}(\mathbf{X}\mathbf{X}')$. Derive the densities in both cases, including the noncentrality parameter.

**3.5.3.** Let the $p \times 1$ real vectors $X_j \sim N_p(\mu,\Sigma)$, $\Sigma > O$, $j = 1,\dots,n$, be iid. Let $\mathbf{X} = (X_1,\dots,X_n)$ be the $p \times n$ sample matrix. Derive the density of $\text{tr}((\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})')$ where $\bar{\mathbf{X}} = (\bar X,\dots,\bar X)$ is the $p \times n$ matrix in which every column is $\bar X$.

**3.5.4.** Repeat Exercise 3.5.1 for the *p*-variate complex Gaussian case.

**3.5.5.** Repeat Exercise 3.5.2 for the complex Gaussian case and write down the density explicitly.

**3.5.6.** Consider a real bivariate normal density with the parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$. Write down the density explicitly. Consider a simple random sample of size $n$, $X_1,\dots,X_n$, from this population where $X_j$ is $2 \times 1$, $j = 1,\dots,n$. Then evaluate the MLE of these five parameters (1) by direct evaluation, (2) by using the general formula.

**3.5.7.** In Exercise 3.5.6 evaluate the maximum likelihood estimates of the five parameters if the following is an observed sample from this bivariate normal population:

$$
\begin{bmatrix} 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \end{bmatrix}, \begin{bmatrix} 0 \\ 7 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \end{bmatrix}
$$

**3.5.8.** Repeat Exercise 3.5.6 if the population is a bivariate normal in the complex domain.

**3.5.9.** Repeat Exercise 3.5.7 if the following is an observed sample from the complex bivariate normal population referred to in Exercise 3.5.8:

$$
\begin{bmatrix} 1+2i \\ i \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3i \\ 1-i \end{bmatrix}, \begin{bmatrix} 2-i \\ i \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}.
$$

**3.5.10.** Let the $p \times 1$ real vector $X_1$ be Gaussian distributed, $X_1 \sim N_p(O, I)$. Consider the quadratic forms $u_1 = X_1'A_1X_1$, $u_2 = X_1'A_2X_1$. Let $A_j = A_j^2$, $j = 1, 2$, and $A_1 + A_2 = I$. What can you say about the chisquaredness and independence of $u_1$ and $u_2$? Prove your assertions.

**3.5.11.** Let $X_1 \sim N_p(O, I)$. Let $u_j = X_1'A_jX_1$, $A_j = A_j^2$, $j = 1,\dots,k$, $A_1 + \cdots + A_k = I$. What can you say about the chisquaredness and independence of the $u_j$'s? Prove your assertions.

**3.5.12.** Repeat Exercise 3.5.11 for the complex case.

**3.5.13.** Let $X_j \sim N_p(\mu,\Sigma)$, $\Sigma > O$, $j = 1,\dots,n$, iid. Let $\bar X = \frac{1}{n}(X_1 + \cdots + X_n)$. Show that the exponent in the density of $\bar X$, excluding $-\frac{1}{2}$, namely $n(\bar X - \mu)'\Sigma^{-1}(\bar X - \mu)$, is distributed as $\chi^2_p$. Derive the density of $\text{tr}(\bar X'\Sigma^{-1}\bar X)$.

**3.5.14.** Let $Q = n(\bar X - \mu)'\Sigma^{-1}(\bar X - \mu)$ as in Exercise 3.5.13. For a given $\alpha$, consider the probability statement $Pr\{Q \ge b\} = \alpha$. Show that $b = \chi^2_{p,\alpha}$ where $Pr\{\chi^2_p \ge \chi^2_{p,\alpha}\} = \alpha$.

**3.5.15.** Let $Q_1 = n(\bar X - \mu_o)'\Sigma^{-1}(\bar X - \mu_o)$ where $\bar X$, $\Sigma$ and $\mu$ are all as defined in Exercise 3.5.14. If $\mu_o \ne \mu$, show that $Q_1 \sim \chi^2_p(\lambda)$ where the noncentrality parameter is $\lambda = \frac{n}{2}(\mu - \mu_o)'\Sigma^{-1}(\mu - \mu_o)$.

#### **3.6. Elliptically Contoured Distribution, Real Case**

Let $X$ be a real $p \times 1$ vector of distinct real scalar variables with $x_1,\dots,x_p$ as its components. For some $p \times 1$ parameter vector $B$ and $p \times p$ positive definite constant matrix $A > O$, consider the positive definite quadratic form $(X - B)'A(X - B)$. We have encountered such a quadratic form in the exponent of a real $p$-variate Gaussian density, in which case $B = \mu$ is the mean value vector and $A = \Sigma^{-1}$, $\Sigma$ being the positive definite covariance matrix. Let $g(\cdot) \ge 0$ be a non-negative function such that $|A|^{\frac{1}{2}}g((X - B)'A(X - B)) \ge 0$ and

$$\int\_{X} |A|^{\frac{1}{2}} g((X - B)^{\prime} A(X - B)) \mathrm{d}X = 1,\tag{3.6.1}$$

so that $|A|^{\frac{1}{2}}g((X - B)'A(X - B))$ is a statistical density. Such a density is referred to as an *elliptically contoured density*.

#### **3.6.1. Some properties of elliptically contoured distributions**

Let $Y = A^{\frac{1}{2}}(X - B)$. Then, from Theorem 1.6.1, $\mathrm{d}X = |A|^{-\frac{1}{2}}\mathrm{d}Y$ and, from (3.6.1),

$$\int\_{Y} g(Y'Y) \mathrm{d}Y = 1\tag{3.6.2}$$

where

$$Y'Y = \mathbf{y}\_1^2 + \mathbf{y}\_2^2 + \dots + \mathbf{y}\_p^2,\ Y' = (\mathbf{y}\_1, \dots, \mathbf{y}\_p).$$

We can further simplify (3.6.2) via a general polar coordinate transformation:

$$\begin{aligned} \mathbf{y}\_1 &= r \sin \theta\_1 \\ \mathbf{y}\_2 &= r \cos \theta\_1 \sin \theta\_2 \\ \mathbf{y}\_3 &= r \cos \theta\_1 \cos \theta\_2 \sin \theta\_3 \\ &\vdots \\ \mathbf{y}\_{p-2} &= r \cos \theta\_1 \cdots \cos \theta\_{p-3} \sin \theta\_{p-2} \\ \mathbf{y}\_{p-1} &= r \cos \theta\_1 \cdots \cos \theta\_{p-2} \sin \theta\_{p-1} \\ \mathbf{y}\_p &= r \cos \theta\_1 \cdots \cos \theta\_{p-1} \end{aligned} \tag{3.6.3}$$

for $-\frac{\pi}{2} < \theta_j \le \frac{\pi}{2}$, $j = 1,\dots,p-2$, $-\pi < \theta_{p-1} \le \pi$, $0 \le r < \infty$. It then follows that

$$\mathrm{d}y\_1 \wedge \dots \wedge \mathrm{d}y\_p = r^{p-1} (\cos \theta\_1)^{p-2} (\cos \theta\_2)^{p-3} \cdots (\cos \theta\_{p-2}) \, \mathrm{d}r \wedge \mathrm{d}\theta\_1 \wedge \dots \wedge \mathrm{d}\theta\_{p-1}.\tag{3.6.4}$$

Thus,

$$\mathbf{y}\_1^2 + \dots + \mathbf{y}\_p^2 = r^2.$$

Given (3.6.3) and (3.6.4), observe that $r, \theta_1,\dots,\theta_{p-1}$ are mutually independently distributed. Separating the factors containing $\theta_i$ from (3.6.4), the total integral over $\theta_i$ is

$$\int\_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (\cos \theta\_i)^{p-i-1} \mathrm{d}\theta\_i = 2 \int\_0^{\frac{\pi}{2}} (\cos \theta\_i)^{p-i-1} \mathrm{d}\theta\_i. \tag{i}$$

Let $u = \sin\theta_i \Rightarrow \mathrm{d}u = \cos\theta_i\,\mathrm{d}\theta_i$. Then $(i)$ becomes $2\int_0^1 (1 - u^2)^{\frac{p-i}{2}-1}\mathrm{d}u$, and letting $v = u^2$ gives $(i)$ as

$$\int\_0^1 v^{\frac{1}{2}-1} (1-v)^{\frac{p-i}{2}-1} \mathrm{d}v = \frac{\Gamma(\frac{1}{2})\Gamma(\frac{p-i}{2})}{\Gamma(\frac{p-i+1}{2})} \,. \tag{ii}$$

Thus, the density of *θj* , denoted by *fj (θj )*, is

$$f\_j(\theta\_j) = \frac{\Gamma(\frac{p-j+1}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{p-j}{2})} (\cos \theta\_j)^{p-j-1}, \ -\frac{\pi}{2} < \theta\_j \le \frac{\pi}{2},\tag{3.6.5}$$

and zero, elsewhere, for *j* = 1*,...,p* − 2, and

$$f\_{p-1}(\theta\_{p-1}) = \frac{1}{2\pi}, \ -\pi \ < \theta\_{p-1} \le \pi,\tag{iiii}$$

and zero elsewhere. Taking the product of the $p - 2$ terms in $(ii)$ together with $(iii)$, the total integral over the $\theta_j$'s is available as

$$\left\{\prod\_{j=1}^{p-2} \int\_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (\cos \theta\_j)^{p-j-1} \mathrm{d}\theta\_j \right\} \int\_{-\pi}^{\pi} \mathrm{d}\theta\_{p-1} = \frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}.\tag{3.6.6}$$

The expression in (3.6.6), excluding 2, can also be obtained by making the transformation $s = Y'Y$ and then writing $\mathrm{d}s$ in terms of $\mathrm{d}Y$ by appealing to Theorem 4.2.3.
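The value in (3.6.6) can be checked directly: multiplying the beta-integral values from $(ii)$ for $j = 1,\dots,p-2$ by the factor $2\pi$ contributed by $\theta_{p-1}$ should give $2\pi^{p/2}/\Gamma(\frac{p}{2})$. A small numerical sketch, where the choice $p = 5$ is an illustrative assumption:

```python
# Verifying (3.6.6): product of the theta_j integrals equals 2 pi^{p/2} / Gamma(p/2).
import math

def theta_integral(p, j):
    # Value of int_{-pi/2}^{pi/2} (cos t)^{p-j-1} dt, from (ii):
    # Gamma(1/2) Gamma((p-j)/2) / Gamma((p-j+1)/2)
    return math.gamma(0.5) * math.gamma((p - j) / 2) / math.gamma((p - j + 1) / 2)

p = 5                                        # illustrative dimension
total = 2 * math.pi                          # int_{-pi}^{pi} d theta_{p-1}
for j in range(1, p - 1):                    # j = 1, ..., p-2
    total *= theta_integral(p, j)

closed_form = 2 * math.pi ** (p / 2) / math.gamma(p / 2)
print(total, closed_form)                    # the two values coincide
```

The quantity $2\pi^{p/2}/\Gamma(\frac{p}{2})$ is the surface area of the unit sphere in $\mathbb{R}^p$, which is what the angular integration computes.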

#### **3.6.2. The density of $u = r^2$**

From (3.6.2) and (3.6.3),

$$\frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})} \int\_{0}^{\infty} r^{p-1} g(r^2) \mathrm{d}r = 1,\tag{3.6.7}$$

that is,

$$2\int\_{r=0}^{\infty} r^{p-1} g(r^2) dr = \frac{\Gamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}.\tag{i\nu}$$

Letting $u = r^2$, we have

$$\int\_0^\infty u^{\frac{p}{2}-1} g(u) \mathrm{d}u = \frac{\Gamma(\frac{p}{2})}{\pi^{\frac{p}{2}}},\tag{v}$$

and the density of *r*, denoted by *fr(r)*, is available from (3.6.7) as

$$f\_r(r) = \frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})} \, r^{p-1} g(r^2), \,\, 0 \le r < \infty,\tag{3.6.8}$$

and zero, elsewhere. The density of $u = r^2$ is then

$$f\_u(u) = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})} u^{\frac{p}{2}-1} g(u),\ 0 \le u < \infty,\tag{3.6.9}$$

and zero, elsewhere. Considering the density of $Y$ given in (3.6.2), we may observe that $y_1,\dots,y_p$ are identically distributed.
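As a concrete instance of (3.6.9), taking $g(u) = (2\pi)^{-p/2}\mathrm{e}^{-u/2}$, the Gaussian member of the family, the density of $u = r^2$ reduces to the chi-square density with $p$ degrees of freedom. A small numerical check of this reduction, where the value $p = 4$ and the evaluation points are illustrative choices:

```python
# (3.6.9) with the Gaussian g(u) reduces to the chi-square_p density.
import math

p = 4                                            # illustrative dimension

def f_u(u):
    # density (3.6.9): pi^{p/2}/Gamma(p/2) * u^{p/2-1} * g(u)
    g = (2 * math.pi) ** (-p / 2) * math.exp(-u / 2)
    return math.pi ** (p / 2) / math.gamma(p / 2) * u ** (p / 2 - 1) * g

def chi2_pdf(u):
    # chi-square density with p degrees of freedom
    return u ** (p / 2 - 1) * math.exp(-u / 2) / (2 ** (p / 2) * math.gamma(p / 2))

for u in (0.5, 1.0, 3.0, 7.0):
    print(u, f_u(u), chi2_pdf(u))                # the two columns agree
```

Algebraically, $\pi^{p/2}(2\pi)^{-p/2} = 2^{-p/2}$, which is exactly the chi-square normalizing constant.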

**Theorem 3.6.1.** *If $y_j = r\,u_j$, $j = 1,\dots,p$, in the transformation in (3.6.3), then $E[u_j^2] = \frac{1}{p}$, $j = 1,\dots,p$, and $u_1,\dots,u_p$ are uniformly distributed.*

*Proof***:** From (3.6.3), $y_j = ru_j$, $j = 1,\dots,p$. We may observe that $u_1^2 + \cdots + u_p^2 = 1$ and that $u_1,\dots,u_p$ are identically distributed. Hence $E[u_1^2] + E[u_2^2] + \cdots + E[u_p^2] = 1 \Rightarrow E[u_j^2] = \frac{1}{p}$.

**Theorem 3.6.2.** *Consider the $y_j$'s in Eq. (3.6.2). If $g(u)$ is free of $p$ and if $E[u] < \infty$, then $E[y_j^2] = \frac{1}{2\pi}$; otherwise, $E[y_j^2] = \frac{1}{p}E[u]$, provided $E[u]$ exists.*

*Proof***:** Since $r$ and $u_j$ are independently distributed and since $E[u_j^2] = \frac{1}{p}$ in light of Theorem 3.6.1, $E[y_j^2] = E[r^2]E[u_j^2] = \frac{1}{p}E[r^2] = \frac{1}{p}E[u]$. From (3.6.9),

$$\int\_0^\infty u^{\frac{p}{2}-1} g(u) \mathrm{d}u = \frac{\Gamma\left(\frac{p}{2}\right)}{\pi^{\frac{p}{2}}}.\tag{vi}$$

However,

$$E[u] = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})} \int\_{u=0}^{\infty} u^{\frac{p}{2}+1-1} g(u) \mathrm{d}u. \tag{vii}$$

Thus, assuming that $g(u)$ is free of $p$, that $\frac{p}{2}$ can be taken as a parameter, and that $(vii)$ is convergent,

$$E[\mathbf{y}\_j^2] = \frac{1}{p} E[r^2] = \frac{1}{p} E[u] = \frac{1}{p} \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})} \frac{\Gamma(\frac{p}{2} + 1)}{\pi^{\frac{p}{2} + 1}} = \frac{1}{2\pi};\tag{3.6.10}$$

otherwise, $E[y_j^2] = \frac{1}{p}E[u]$ as long as $E[u] < \infty$.

#### **3.6.3. Mean value vector and covariance matrix**

From (3.6.1),

$$E[X] = |A|^{\frac{1}{2}} \int\_X X \,\mathrm{g}((X - B)'A(X - B)) \mathrm{d}X.$$

Noting that

$$\begin{aligned} E[X] &= E[X - B + B] = B + E[X - B] \\ &= B + |A|^{\frac{1}{2}} \int\_X (X - B)g((X - B)'A(X - B)) \mathrm{d}X \end{aligned}$$

and letting $Y = A^{\frac{1}{2}}(X - B)$, we have

$$E[X] = B + \int\_Y Yg(Y'Y)dY,\ -\infty < \mathbf{y}\_j < \infty,\ j = 1,\dots,p.$$

But $Y'Y$ is even whereas each element in $Y$ is linear and odd. Hence, if the integral exists, $\int_Y Y g(Y'Y)\,\mathrm{d}Y = O$, and so $E[X] = B \equiv \mu$. Let $V = \text{Cov}(X)$, the covariance matrix associated with $X$. Then

$$V = E[(X - \mu)(X - \mu)'] = A^{-\frac{1}{2}}\Big[\int\_Y (YY')g(Y'Y)\mathrm{d}Y\Big]A^{-\frac{1}{2}}, \quad Y = A^{\frac{1}{2}}(X - \mu),$$

$$YY' = \begin{bmatrix} \mathbf{y}\_1^2 & \mathbf{y}\_1 \mathbf{y}\_2 & \cdots & \mathbf{y}\_1 \mathbf{y}\_p \\ \mathbf{y}\_2 \mathbf{y}\_1 & \mathbf{y}\_2^2 & \cdots & \mathbf{y}\_2 \mathbf{y}\_p \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{y}\_p \mathbf{y}\_1 & \mathbf{y}\_p \mathbf{y}\_2 & \cdots & \mathbf{y}\_p^2 \end{bmatrix}. \tag{viii}$$

Since the non-diagonal elements $y_i y_j,\ i \ne j,$ are odd and $g(Y'Y)$ is even, the integrals over the non-diagonal elements are equal to zero whenever the second moments exist. Since $E[Y] = O$, $V = E[YY']$. It has already been determined in (3.6.10) that $E[y_j^2] = \frac{1}{2\pi}$ for $j = 1,\dots,p,$ whenever $g(u)$ is free of $p$ and $E[u]$ exists, the density of $u$ being as specified in (3.6.9). If $g(u)$ is not free of $p$, the diagonal elements will each integrate out to $\frac{1}{p}E[r^2]$. Accordingly,

$$\text{Cov}(X) = V = \frac{1}{2\pi} A^{-1} \quad \text{or} \quad V = \frac{1}{p} E[r^2] A^{-1}. \tag{3.6.12}$$

**Theorem 3.6.3.** *When $X$ has the $p$-variate elliptically contoured distribution defined in (3.6.1), the mean value vector of $X$ is $E[X] = B$ and the covariance matrix of $X$, denoted by $\Sigma$, is such that $\Sigma = \frac{1}{p}E[r^2]A^{-1}$ where $A$ is the parameter matrix in (3.6.1), $u = r^2$ and $r$ is defined in the transformation (3.6.3).*
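Theorem 3.6.3 can be illustrated numerically for the Gaussian member of the elliptical family, for which $\mathrm{Cov}(X) = A^{-1}$ and $u = r^2 \sim \chi^2_p$, so that $\frac{1}{p}E[r^2] \approx 1$. The following Monte Carlo sketch is an illustration only (the matrix $A$ and the vector $B$ are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200_000
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])          # A > O, arbitrary choice
B = np.array([1.0, -2.0, 0.5])           # location vector B of (3.6.1)

Ainv = np.linalg.inv(A)
X = rng.multivariate_normal(B, Ainv, size=n)   # Gaussian member: X ~ N_p(B, A^{-1})

# u = r^2 = (X - B)' A (X - B) for each sample
U = np.einsum('ni,ij,nj->n', X - B, A, X - B)

# Theorem 3.6.3: E[X] = B and Cov(X) = (1/p) E[r^2] A^{-1}
cov_emp = np.cov(X, rowvar=False)
cov_thm = (U.mean() / p) * Ainv
assert np.allclose(X.mean(axis=0), B, atol=0.02)
assert np.allclose(cov_emp, cov_thm, atol=0.02)
```

For this Gaussian case $E[u] \approx p$, so the theorem reduces to $\Sigma = A^{-1}$, which the empirical covariance confirms.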

#### **3.6.4. Marginal and conditional distributions**

Consider the density

$$f(X) = |A|^{\frac{1}{2}} g((X - \mu)' A(X - \mu)),\ A > O,\ -\infty < x_j < \infty,\ -\infty < \mu_j < \infty \tag{3.6.13}$$

where $X' = (x_1,\dots,x_p)$, $\mu' = (\mu_1,\dots,\mu_p)$, $A = (a_{ij}) > O$. Consider the following partitioning of $X$, $\mu$ and $A$:

$$X = \begin{bmatrix} X\_1 \\ X\_2 \end{bmatrix}, \ \mu = \begin{bmatrix} \mu\_{(1)} \\ \mu\_{(2)} \end{bmatrix}, \ A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}$$

where $X_1, \mu_{(1)}$ are $p_1 \times 1$, $X_2, \mu_{(2)}$ are $p_2 \times 1$, $A_{11}$ is $p_1 \times p_1$ and $A_{22}$ is $p_2 \times p_2$, $p_1 + p_2 = p$. Then, as was established in Sect. 3.3,


$$\begin{split} (X - \mu)' A(X - \mu) &= (X\_1 - \mu\_{(1)})' A\_{11} (X\_1 - \mu\_{(1)}) + 2 (X\_2 - \mu\_{(2)})' A\_{21} (X\_1 - \mu\_{(1)}) \\ &+ (X\_2 - \mu\_{(2)})' A\_{22} (X\_2 - \mu\_{(2)}) \\ &= (X\_1 - \mu\_{(1)})' [A\_{11} - A\_{12} A\_{22}^{-1} A\_{21}] (X\_1 - \mu\_{(1)}) \\ &+ (X\_2 - \mu\_{(2)} + C)' A\_{22} (X\_2 - \mu\_{(2)} + C), \ C = A\_{22}^{-1} A\_{21} (X\_1 - \mu\_{(1)}). \end{split}$$

In order to obtain the marginal density of *X*1, we integrate out *X*<sup>2</sup> from *f (X)*. Let the marginal densities of *X*<sup>1</sup> and *X*<sup>2</sup> be respectively denoted by *g*1*(X*1*)* and *g*2*(X*2*)*. Then

$$\begin{aligned} g_1(X_1) &= |A|^{\frac{1}{2}} \int_{X_2} g\big((X_1 - \mu_{(1)})'[A_{11} - A_{12}A_{22}^{-1}A_{21}](X_1 - \mu_{(1)}) \\ &\qquad\qquad + (X_2 - \mu_{(2)} + C)'A_{22}(X_2 - \mu_{(2)} + C)\big)\,\mathrm{d}X_2. \end{aligned}$$

Letting $A_{22}^{\frac{1}{2}}(X_2 - \mu_{(2)} + C) = Y_2$, $\mathrm{d}Y_2 = |A_{22}|^{\frac{1}{2}}\mathrm{d}X_2$ and

$$g_1(X_1) = |A|^{\frac{1}{2}} |A_{22}|^{-\frac{1}{2}} \int_{Y_2} g((X_1 - \mu_{(1)})'[A_{11} - A_{12}A_{22}^{-1}A_{21}](X_1 - \mu_{(1)}) + Y_2'Y_2)\,\mathrm{d}Y_2.$$

Note that $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|$ from the results on partitioned matrices presented in Sect. 1.3 and thus, $|A|^{\frac{1}{2}}|A_{22}|^{-\frac{1}{2}} = |A_{11} - A_{12}A_{22}^{-1}A_{21}|^{\frac{1}{2}}$. We have seen that $\Sigma^{-1}$ is a constant multiple of $A$ where $\Sigma$ is the covariance matrix of the $p \times 1$ vector $X$. Then

$$(\Sigma^{11})^{-1} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$

which is a constant multiple of $A_{11} - A_{12}A_{22}^{-1}A_{21}$. If $Y_2'Y_2 = s_2$, then from Theorem 4.2.3,

$$g_1(X_1) = |A_{11} - A_{12}A_{22}^{-1}A_{21}|^{\frac{1}{2}}\, \frac{\pi^{\frac{p_2}{2}}}{\Gamma(\frac{p_2}{2})} \int_{s_2 > 0} s_2^{\frac{p_2}{2} - 1} g(s_2 + u_1)\,\mathrm{d}s_2 \tag{3.6.14}$$

where $u_1 = (X_1 - \mu_{(1)})'[A_{11} - A_{12}A_{22}^{-1}A_{21}](X_1 - \mu_{(1)})$. Note that (3.6.14) is elliptically contoured, that is, $X_1$ has an elliptically contoured distribution. Similarly, $X_2$ has an elliptically contoured distribution. Letting $Y_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{\frac{1}{2}}(X_1 - \mu_{(1)})$, then $Y_{11}$ has a spherically symmetric distribution. Denoting the density of $Y_{11}$ by $g_{11}(Y_{11})$, we have

$$g_{11}(Y_{11}) = \frac{\pi^{\frac{p_2}{2}}}{\Gamma(\frac{p_2}{2})} \int_{s_2 > 0} s_2^{\frac{p_2}{2} - 1} g(s_2 + Y_{11}'Y_{11})\,\mathrm{d}s_2. \tag{3.6.15}$$

By a similar argument, the marginal density of *X*2, namely *g*2*(X*2*)*, and the density of *Y*22, namely *g*22*(Y*22*)*, are as follows:

$$g\_2(X\_2) = |A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|^\frac{1}{2} \frac{\pi^{\frac{p\_1}{2}}}{\Gamma(\frac{p\_1}{2})}$$

$$\times \int_{s_1 > 0} s_1^{\frac{p_1}{2} - 1} g(s_1 + (X_2 - \mu_{(2)})'[A_{22} - A_{21}A_{11}^{-1}A_{12}](X_2 - \mu_{(2)}))\,\mathrm{d}s_1,$$

$$g_{22}(Y_{22}) = \frac{\pi^{\frac{p_1}{2}}}{\Gamma(\frac{p_1}{2})} \int_{s_1 > 0} s_1^{\frac{p_1}{2} - 1} g(s_1 + Y_{22}'Y_{22})\,\mathrm{d}s_1. \tag{3.6.16}$$
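The determinant factorization $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|$ from Sect. 1.3, which produces the constant appearing in the marginal density of $X_1$, is easy to confirm numerically; the positive definite matrix generated below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2 = 2, 3
p = p1 + p2
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)              # an arbitrary positive definite A

A11, A12 = A[:p1, :p1], A[:p1, p1:]
A21, A22 = A[p1:, :p1], A[p1:, p1:]

schur = A11 - A12 @ np.linalg.inv(A22) @ A21   # Schur complement of A22

# |A| = |A22| * |A11 - A12 A22^{-1} A21|
assert np.isclose(np.linalg.det(A),
                  np.linalg.det(A22) * np.linalg.det(schur))
# hence |A|^{1/2} |A22|^{-1/2} equals the constant |A11 - A12 A22^{-1} A21|^{1/2}
assert np.isclose(np.sqrt(np.linalg.det(A) / np.linalg.det(A22)),
                  np.sqrt(np.linalg.det(schur)))
```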

#### **3.6.5. The characteristic function of an elliptically contoured distribution**

Let $T$ be a $p \times 1$ parameter vector, $T' = (t_1,\dots,t_p),$ so that $T'X = t_1 x_1 + \cdots + t_p x_p$. Then, the characteristic function of $X$, denoted by $\phi_X(T)$, is $E[\mathrm{e}^{iT'X}]$ where $E$ denotes the expected value and $i = \sqrt{-1}$, that is,

$$\phi\_X(T) = E[\mathbf{e}^{i\,T\,'X}] = \int\_X \mathbf{e}^{i\,T\,'X} |A|^{\frac{1}{2}} g((X-\mu)'A(X-\mu)) \mathrm{d}X. \tag{3.6.17}$$

Writing $X$ as $X - \mu + \mu$ and then making the transformation $Y = A^{\frac{1}{2}}(X - \mu)$, we have

$$\phi\_X(T) = \mathbf{e}^{i\,T'\mu} \int\_Y \mathbf{e}^{i\,T'A^{-\frac{1}{2}}Y} g(Y'Y) \mathrm{d}Y. \tag{3.6.18}$$

However, $g(Y'Y)$ is invariant under orthonormal transformations of the type $Z = PY,\ PP' = I,\ P'P = I,$ as $Z'Z = Y'Y$ so that $g(Y'Y) = g(Z'Z)$ for all orthonormal matrices $P$. Thus,

$$\phi\_X(T) = \mathbf{e}^{i\,T'\mu} \int\_Z \mathbf{e}^{i\,T'A^{-\frac{1}{2}}P'Z} \mathbf{g}(Z'Z)dZ \tag{3.6.19}$$

for all $P$. This means that the integral in (3.6.19) is a function of $(T'A^{-\frac{1}{2}})(T'A^{-\frac{1}{2}})' = T'A^{-1}T$, say $\psi(T'A^{-1}T)$. Then,

$$\phi\_X(T) = \mathbf{e}^{i\,\,T^\prime\mu} \psi\,(T^\prime A^{-1}T) \tag{3.6.20}$$

where *A*−<sup>1</sup> is proportional to *Σ*, the covariance matrix of *X,* and

$$\frac{\partial}{\partial T}\phi\_X(T)|\_{T=O} = i\mu \Rightarrow E(X) = \mu;$$

the reader may refer to Chap. 1 for vector/matrix derivatives. Now, considering *φX*−*μ(T )*, we have

$$\begin{split} &\frac{\partial}{\partial T}\phi_{X-\mu}(T) = \frac{\partial}{\partial T}\psi(T'A^{-1}T) = \psi'(T'A^{-1}T)\,2A^{-1}T \\ &\Rightarrow \frac{\partial}{\partial T'}\psi(T'A^{-1}T) = \psi'(T'A^{-1}T)\,2T'A^{-1} \\ &\Rightarrow \frac{\partial}{\partial T}\frac{\partial}{\partial T'}\psi(T'A^{-1}T) = \psi''(T'A^{-1}T)(2A^{-1}T)(2T'A^{-1}) + \psi'(T'A^{-1}T)\,2A^{-1} \\ &\Rightarrow \frac{\partial}{\partial T}\frac{\partial}{\partial T'}\psi(T'A^{-1}T)\Big|_{T=O} = 2A^{-1}, \end{split}$$

assuming that $\psi'(T'A^{-1}T)|_{T=O} = 1$ and $\psi''(T'A^{-1}T)|_{T=O} = 1$, where $\psi'(u) = \frac{\mathrm{d}}{\mathrm{d}u}\psi(u)$ for a real scalar variable $u$ and $\psi''(u)$ denotes the second derivative of $\psi$ with respect to $u$. The same procedure can be utilized to obtain higher order moments of the type $E[\cdots XX'XX']$ by repeatedly applying the vector differential operator $\cdots\frac{\partial}{\partial T}\frac{\partial}{\partial T'}$ to $\phi_X(T)$ and then evaluating the result at $T = O$. Similarly, higher order central moments of the type $E[\cdots(X-\mu)(X-\mu)'(X-\mu)(X-\mu)']$ are available by applying the vector differential operator $\cdots\frac{\partial}{\partial T}\frac{\partial}{\partial T'}$ to $\psi(T'A^{-1}T)$ and then evaluating the result at $T = O$. However, higher moments with respect to individual variables, such as $E[x_j^k],$ are available by differentiating $\phi_X(T)$ partially $k$ times with respect to $t_j$, and then evaluating the resulting expression at $T = O$. If central moments are needed, the differentiation is done on $\psi(T'A^{-1}T)$.

Thus, we can obtain results parallel to those derived for the *p*-variate Gaussian distribution by applying the same procedures on elliptically contoured distributions. Accordingly, further discussion of elliptically contoured distributions will not be taken up in the coming chapters.
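As a numerical illustration of (3.6.20): for the Gaussian member of the elliptical family with $\Sigma = A^{-1}$, one has $\psi(w) = \mathrm{e}^{-w/2}$, a standard fact not derived here. The sketch below (arbitrary $A$, $\mu$ and $T$) estimates $\phi_X(T) = E[\mathrm{e}^{iT'X}]$ by Monte Carlo and checks the factorization $\mathrm{e}^{iT'\mu}\psi(T'A^{-1}T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 2, 500_000
A = np.array([[2.0, 0.3], [0.3, 1.0]])   # arbitrary A > O
mu = np.array([1.0, -1.0])               # arbitrary location
Ainv = np.linalg.inv(A)

X = rng.multivariate_normal(mu, Ainv, size=n)   # Gaussian member, Cov = A^{-1}
T = np.array([0.4, -0.7])                        # arbitrary parameter vector

phi_mc = np.mean(np.exp(1j * X @ T))             # Monte Carlo E[e^{i T'X}]
w = T @ Ainv @ T                                 # the argument T'A^{-1}T
phi_exact = np.exp(1j * T @ mu) * np.exp(-w / 2) # e^{iT'mu} psi(w), psi(w) = e^{-w/2}

assert abs(phi_mc - phi_exact) < 0.01
```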

#### **Exercises 3.6**

**3.6.1.** Let $x_1,\dots,x_k$ be independently distributed real scalar random variables with density functions $f_j(x_j),\ j = 1,\dots,k$. If the joint density of $x_1,\dots,x_k$ is of the form $f_1(x_1)\cdots f_k(x_k) = g(x_1^2 + \cdots + x_k^2)$ for some differentiable function $g$, show that $x_1,\dots,x_k$ are identically distributed as Gaussian random variables.

**3.6.2.** Letting the real scalar random variables $x_1,\dots,x_k$ have a joint density such that $f(x_1,\dots,x_k) = c$ for $x_1^2 + \cdots + x_k^2 \le r^2,\ r > 0,$ show that (1) $(x_1,\dots,x_k)$ is uniformly distributed over the volume of the $k$-dimensional sphere; (2) $E[x_j] = 0,\ \mathrm{Cov}(x_i, x_j) = 0,\ i \ne j = 1,\dots,k$; (3) $x_1,\dots,x_k$ are not independently distributed.

**3.6.3.** Let $u = (X - B)'A(X - B)$ in Eq. (3.6.1), where $A > O$ and $A$ is a $p \times p$ matrix. Let $g(u) = c_1(1 - a\,u)^{\rho},\ a > 0,\ 1 - au > 0,$ where $c_1$ is an appropriate constant. If $|A|^{\frac{1}{2}} g(u)$ is a density, (1) show that this density is elliptically contoured; (2) evaluate its normalizing constant and specify the conditions on the parameters.

**3.6.4.** Solve Exercise 3.6.3 for $g(u) = c_2(1 + a\,u)^{-\rho}$, where $c_2$ is an appropriate constant.

**3.6.5.** Solve Exercise 3.6.3 for $g(u) = c_3\, u^{\gamma-1}(1 - a\,u)^{\beta-1},\ a > 0,\ 1 - au > 0,$ where $c_3$ is an appropriate constant.

**3.6.6.** Solve Exercise 3.6.3 for $g(u) = c_4\, u^{\gamma-1}\mathrm{e}^{-a\,u},\ a > 0,$ where $c_4$ is an appropriate constant.

**3.6.7.** Solve Exercise 3.6.3 for $g(u) = c_5\, u^{\gamma-1}(1 + a\,u)^{-(\rho+\gamma)},\ a > 0,$ where $c_5$ is an appropriate constant.

**3.6.8.** Solve Exercises 3.6.3 to 3.6.7 by making use of the general polar coordinate transformation.

**3.6.9.** Let $s = y_1^2 + \cdots + y_p^2$ where $y_j,\ j = 1,\dots,p,$ are real scalar random variables. Let $\mathrm{d}Y = \mathrm{d}y_1 \wedge \dots \wedge \mathrm{d}y_p$ and let $\mathrm{d}s$ be the differential in $s$. Then, it can be shown that $\mathrm{d}Y = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\, s^{\frac{p}{2}-1}\mathrm{d}s$. By using this fact, solve Exercises 3.6.3–3.6.7.

**3.6.10.** If $A = \begin{bmatrix} 3 & 2 \\ 2 & 4 \end{bmatrix},$ write down the elliptically contoured density in (3.6.1) explicitly by taking an arbitrary $B = E[X] = \mu,$ if (1) $g(u) = (a - c\,u)^{\alpha},\ a > 0,\ c > 0,\ a - cu > 0$; (2) $g(u) = (a + c\,u)^{-\beta},\ a > 0,\ c > 0,$ and specify the conditions on $\alpha$ and $\beta$.

#### **References**

A.M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

A.M. Mathai and Hans J. Haubold (2017): *Probability and Statistics, A Course for Physicists and Engineers*, De Gruyter, Germany.

A.M. Mathai and P.N. Rathie (1975): *Basic Concepts in Information Theory and Statistics: Axiomatic Foundations and Applications*, Wiley Halsted, New York.

A.M. Mathai and Serge B. Provost (1992): *Quadratic Forms in Random Variables: Theory and Applications*, Marcel Dekker, New York.


#### **Chapter 4 The Matrix-Variate Gaussian Distribution**

#### **4.1. Introduction**

This chapter relies on various results presented in Chap. 1. We will introduce a class of integrals called the real matrix-variate Gaussian integrals and complex matrix-variate Gaussian integrals wherefrom a statistical density referred to as the matrix-variate Gaussian density and, as a special case, the multivariate Gaussian or normal density will be obtained, both in the real and complex domains.

The notations introduced in Chap. 1 will also be utilized in this chapter. Scalar variables, mathematical and random, will be denoted by lower case letters, vector/matrix variables will be denoted by capital letters, and complex variables will be indicated by a tilde. Additionally, the following notations will be used. All the matrices appearing in this chapter are $p \times p$ real positive definite or Hermitian positive definite unless stated otherwise. $X > O$ will mean that the $p \times p$ real symmetric matrix $X$ is positive definite, and $\tilde{X} > O$, that the $p \times p$ matrix $\tilde{X}$ in the complex domain is Hermitian, that is, $\tilde{X} = \tilde{X}^{*}$ where $\tilde{X}^{*}$ denotes the conjugate transpose of $\tilde{X}$, and that $\tilde{X}$ is positive definite. $O < A < X < B$ will indicate that the $p \times p$ real positive definite matrices are such that $A > O,\ B > O,\ X > O,\ X - A > O,\ B - X > O$. $\int_X f(X)\,\mathrm{d}X$ represents a real-valued scalar function $f(X)$ being integrated out over all $X$ in the domain of $X$, where $\mathrm{d}X$ stands for the wedge product of the differentials of all distinct elements in $X$. If $X = (x_{ij})$ is a real $p \times q$ matrix, the $x_{ij}$'s being distinct real scalar variables, then $\mathrm{d}X = \mathrm{d}x_{11} \wedge \mathrm{d}x_{12} \wedge \dots \wedge \mathrm{d}x_{pq}$ or $\mathrm{d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\,\mathrm{d}x_{ij}$. If $X = X'$, that is, $X$ is a real symmetric matrix of dimension $p \times p$, then $\mathrm{d}X = \wedge_{i \ge j = 1}^{p}\,\mathrm{d}x_{ij} = \wedge_{i \le j = 1}^{p}\,\mathrm{d}x_{ij}$, which involves only $p(p+1)/2$ differential elements $\mathrm{d}x_{ij}$. When taking the wedge product, the elements $x_{ij}$ may be taken in any convenient order to start with; however, that order has to be maintained until the computations are completed.
If $\tilde{X} = X_1 + iX_2$, where $X_1$ and $X_2$ are real $p \times q$ matrices and $i = \sqrt{-1}$, then $\mathrm{d}\tilde{X}$ will be defined as $\mathrm{d}\tilde{X} = \mathrm{d}X_1 \wedge \mathrm{d}X_2$. $\int_{A < \tilde{X} < B} f(\tilde{X})\,\mathrm{d}\tilde{X}$ represents the real-valued scalar function $f$ of the complex matrix argument $\tilde{X}$ being integrated out over all $p \times p$ matrices $\tilde{X}$ such that $A > O,\ \tilde{X} > O,\ B > O,\ \tilde{X} - A > O,\ B - \tilde{X} > O$ (all Hermitian positive definite), where $A$ and $B$ are constant matrices in the sense that they are free of the elements of $\tilde{X}$. The corresponding integral in the real case will be denoted by $\int_{A < X < B} f(X)\,\mathrm{d}X = \int_{A}^{B} f(X)\,\mathrm{d}X$, $A > O,\ X > O,\ X - A > O,\ B > O,\ B - X > O$, where $A$ and $B$ are constant matrices, all the matrices being of dimension $p \times p$.

#### **4.2. Real Matrix-variate and Multivariate Gaussian Distributions**

Let $X = (x_{ij})$ be a $p \times q$ matrix whose elements $x_{ij}$ are distinct real variables. For any real matrix $X$, be it square or rectangular, $\mathrm{tr}(XX') = \mathrm{tr}(X'X) =$ the sum of the squares of all the elements of $X$. Note that $XX'$ need not be equal to $X'X$. Thus, $\mathrm{tr}(XX') = \sum_{i=1}^{p}\sum_{j=1}^{q} x_{ij}^2$ and, in the complex case, $\mathrm{tr}(\tilde{X}\tilde{X}^{*}) = \sum_{i=1}^{p}\sum_{j=1}^{q} |\tilde{x}_{ij}|^2$ where, if $\tilde{x}_{rs} = x_{rs1} + i x_{rs2}$ with $x_{rs1}$ and $x_{rs2}$ real and $i = \sqrt{-1}$, then $|\tilde{x}_{rs}| = +[x_{rs1}^2 + x_{rs2}^2]^{\frac{1}{2}}$. Consider the following integrals over the real rectangular $p \times q$ matrix $X$:

$$\begin{split} I_1 &= \int_X \mathrm{e}^{-\mathrm{tr}(XX')}\mathrm{d}X = \int_X \mathrm{e}^{-\sum_{i=1}^{p}\sum_{j=1}^{q} x_{ij}^2}\mathrm{d}X = \prod_{i,j}\int_{-\infty}^{\infty}\mathrm{e}^{-x_{ij}^2}\mathrm{d}x_{ij} \\ &= \prod_{i,j}\sqrt{\pi} = \pi^{\frac{pq}{2}}, \end{split} \tag{i}$$

$$I_2 = \int_X \mathrm{e}^{-\frac{1}{2}\mathrm{tr}(XX')}\mathrm{d}X = (2\pi)^{\frac{pq}{2}}.\tag{ii}$$
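The trace identity $\mathrm{tr}(XX') = \mathrm{tr}(X'X) = \sum_{i,j} x_{ij}^2$ on which the integrals (i) and (ii) rest, together with its complex counterpart, can be checked numerically; the matrices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 2, 3
X = rng.standard_normal((p, q))

# tr(XX') = tr(X'X) = sum of squares of the elements of X
assert np.isclose(np.trace(X @ X.T), np.trace(X.T @ X))
assert np.isclose(np.trace(X @ X.T), (X**2).sum())

# complex case: tr(X X*) = sum of |x_ij|^2
Xc = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
assert np.isclose(np.trace(Xc @ Xc.conj().T).real, (np.abs(Xc)**2).sum())
```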

Let $A > O$ be $p \times p$ and $B > O$ be $q \times q$ constant positive definite matrices. Then we can define the unique positive definite square roots $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$. For the discussions to follow, we need only the representations $A = A_1A_1',\ B = B_1B_1'$ with $A_1$ and $B_1$ nonsingular, a prime denoting the transpose. For a $p \times q$ real matrix $X$, consider

$$\begin{split} \mathrm{tr}(AXBX') &= \mathrm{tr}(A^{\frac{1}{2}}A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X') = \mathrm{tr}(A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X'A^{\frac{1}{2}}) \\ &= \mathrm{tr}(YY'),\ Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}. \end{split} \tag{iii}$$

In order to obtain the above results, we made use of the property that for any two matrices $P$ and $Q$ such that $PQ$ and $QP$ are defined, $\mathrm{tr}(PQ) = \mathrm{tr}(QP)$, where $PQ$ need not be equal to $QP$. As well, letting $Y = (y_{ij})$, $\mathrm{tr}(YY') = \sum_{i=1}^{p}\sum_{j=1}^{q} y_{ij}^2$. $YY'$ is real positive definite when $Y$ is $p \times q,\ p \le q,$ and is of full rank $p$. Observe that any real square matrix $U$ that can be written as $U = VV'$ for some matrix $V$, where $V$ may be square or rectangular, is either positive definite or at least positive semi-definite. When $V$ is a $p \times q$ matrix, $q \ge p,$ whose rank is $p$, $VV'$ is positive definite; if the rank of $V$ is less than $p$, then $VV'$ is positive semi-definite. From Result 1.6.4,


$$\begin{aligned} Y = A^{\frac{1}{2}} X B^{\frac{1}{2}} &\Rightarrow \mathrm{d}Y = |A|^{\frac{q}{2}} |B|^{\frac{p}{2}}\, \mathrm{d}X \\ &\Rightarrow \mathrm{d}X = |A|^{-\frac{q}{2}} |B|^{-\frac{p}{2}}\, \mathrm{d}Y \end{aligned} \tag{iv}$$

where we use the standard notation |*(*·*)*| = det*(*·*)* to denote the determinant of *(*·*)* in general and |det*(*·*)*| to denote the absolute value or modulus of the determinant of *(*·*)* in the complex domain. Let

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(AXBX')}, \ A > O, \ B > O \tag{4.2.1}$$

for $X = (x_{ij}),\ -\infty < x_{ij} < \infty$ for all $i$ and $j$. From the steps *(i)* to *(iv)*, we see that $f_{p,q}(X)$ in (4.2.1) is a statistical density over the real rectangular $p \times q$ matrix $X$. This function $f_{p,q}(X)$ is known as the *real matrix-variate Gaussian density*. We introduced a factor of $\frac{1}{2}$ in the exponent so that the particular cases usually found in the literature agree with the real $p$-variate Gaussian distribution. Actually, this $\frac{1}{2}$ factor is quite unnecessary from a mathematical point of view, as it complicates computations rather than simplifying them. In the complex case, the factor $\frac{1}{2}$ does not appear in the exponent of the density, which is consistent with the particular cases currently encountered in the literature.
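A standard way to simulate from (4.2.1), following the transformation in *(iii)*, is $X = A^{-\frac{1}{2}} Z B^{-\frac{1}{2}}$ with $Z$ having iid $N(0,1)$ entries, in which case the column-stacked $\mathrm{vec}(X)$ has covariance $B^{-1} \otimes A^{-1}$. This matrix-normal correspondence is a standard fact, not derived in the text, and the matrices below are arbitrary choices made for the sketch:

```python
import numpy as np

def sqrtm_pd(M):
    # symmetric positive definite square root via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(4)
p, q, n = 2, 3, 100_000
A = np.array([[2.0, 0.5], [0.5, 1.0]])           # p x p, A > O
B = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.3],
              [0.0, 0.3, 1.5]])                  # q x q, B > O

Ai_h = sqrtm_pd(np.linalg.inv(A))                # A^{-1/2}
Bi_h = sqrtm_pd(np.linalg.inv(B))                # B^{-1/2}

# X = A^{-1/2} Z B^{-1/2}, Z with iid N(0,1) entries, has density (4.2.1)
Z = rng.standard_normal((n, p, q))
X = np.einsum('ij,njk,kl->nil', Ai_h, Z, Bi_h)

# column-stacked vec(X) should have covariance B^{-1} (kron) A^{-1}
vecX = X.transpose(0, 2, 1).reshape(n, p * q)    # stack the columns of each X
cov_emp = np.cov(vecX, rowvar=False)
cov_thm = np.kron(np.linalg.inv(B), np.linalg.inv(A))
assert np.allclose(cov_emp, cov_thm, atol=0.05)
```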

**Note 4.2.1.** If the factor $\frac{1}{2}$ is omitted in the exponent, then $2\pi$ is to be replaced by $\pi$ in the denominator of (4.2.1), namely,

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\text{tr}(AXBX')}, \ A > O, \ B > O. \tag{4.2.2}$$

When $p = 1$, the matrix $X$ is $1 \times q$ and we let $X = (x_1,\dots,x_q)$ where $X$ is a row vector whose components are $x_1,\dots,x_q$. When $p = 1$, $A$ is $1 \times 1$ or a scalar quantity. Letting $A = 1$ and $B = V^{-1},\ V > O,$ be of dimension $q \times q$, then in the real case,

$$\begin{split} f_{1,q}(X) &= \frac{|\frac{1}{2}V^{-1}|^{\frac{1}{2}}}{\pi^{\frac{q}{2}}}\mathrm{e}^{-\frac{1}{2}XV^{-1}X'},\ \ X = (x_1,\dots,x_q), \\ &= \frac{1}{(2\pi)^{\frac{q}{2}}|V|^{\frac{1}{2}}}\mathrm{e}^{-\frac{1}{2}XV^{-1}X'}, \end{split} \tag{4.2.3}$$

which is the usual real nonsingular Gaussian density with parameter matrix *V* , that is, *X*- ∼ *Nq (O, V )*. If a location parameter vector *μ* = *(μ*1*,...,μq )* is introduced or, equivalently, if *X* is replaced by *X* − *μ*, then we have

$$f\_{1,q}(X) = \left[ (2\pi)^{\frac{q}{2}} |V|^{\frac{1}{2}} \right]^{-1} \mathbf{e}^{-\frac{1}{2}(X-\mu)V^{-1}(X-\mu)'}, \; V > O. \tag{4.2.4}$$

On the other hand, when *q* = 1*,* a real *p*-variate Gaussian or normal density is available from (4.2.1) wherein *<sup>B</sup>* <sup>=</sup> 1; in this case, *<sup>X</sup>* <sup>∼</sup> *Np(μ, A*−1*)* where *<sup>X</sup>* and the location parameter vector *μ* are now *p* × 1 column vectors. This density is given by

$$f\_{p,1}(X) = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \mathbf{e}^{-\frac{1}{2}(X-\mu)'A(X-\mu)}, \ A > O. \tag{4.2.5}$$
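The density (4.2.5) can be cross-checked against a standard implementation of the $N_p(\mu, A^{-1})$ density; the matrix $A$, the vector $\mu$ and the evaluation point below are arbitrary, and SciPy's `multivariate_normal` is assumed available:

```python
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # arbitrary A > O
mu = np.array([1.0, -1.0])               # arbitrary location vector
p = len(mu)

def f_p1(X):
    # density (4.2.5): |A|^{1/2} (2 pi)^{-p/2} exp(-(X-mu)' A (X-mu) / 2)
    d = X - mu
    return np.sqrt(np.linalg.det(A)) / (2 * np.pi) ** (p / 2) \
        * np.exp(-0.5 * d @ A @ d)

X = np.array([0.3, 0.7])                 # arbitrary evaluation point
# (4.2.5) agrees with the N_p(mu, A^{-1}) density
assert np.isclose(f_p1(X),
                  multivariate_normal(mu, np.linalg.inv(A)).pdf(X))
```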

**Example 4.2.1.** Write down the exponent and the normalizing constant explicitly in a real matrix-variate Gaussian density where

$$\begin{aligned} X &= \begin{bmatrix} \boldsymbol{x}\_{11} & \boldsymbol{x}\_{12} & \boldsymbol{x}\_{13} \\ \boldsymbol{x}\_{21} & \boldsymbol{x}\_{22} & \boldsymbol{x}\_{23} \end{bmatrix}, \; E[X] = M = \begin{bmatrix} 1 & 0 & -1 \\ -1 & -2 & 0 \end{bmatrix}, \\\ A &= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \; B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}, \end{aligned}$$

where the *xij* 's are real scalar random variables.

**Solution 4.2.1.** Note that $A = A'$ and $B = B'$, the leading minors of $A$ being $|(1)| = 1 > 0$ and $|A| = 1 > 0$, so that $A > O$. The leading minors of $B$ are $|(1)| = 1 > 0,$ $\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} = 1 > 0$ and

$$|B| = (1)\begin{vmatrix} 2 & 1 \\ 1 & 3 \end{vmatrix} - (1)\begin{vmatrix} 1 & 1 \\ 1 & 3 \end{vmatrix} + (1)\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} = 2 > 0,$$

and hence *B>O*. The density is of the form

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathrm{e}^{-\frac{1}{2}\mathrm{tr}(A(X-M)B(X-M)')}$$

where the normalizing constant is $\frac{(1)^{\frac{3}{2}}(2)^{\frac{2}{2}}}{(2\pi)^{\frac{(2)(3)}{2}}} = \frac{2}{(2\pi)^3} = \frac{1}{4\pi^3}$. Let $X_1$ and $X_2$ be the two rows of $X$ and let $Y = X - M = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$. Then $Y_1 = (y_{11}, y_{12}, y_{13}) = (x_{11} - 1,\ x_{12},\ x_{13} + 1)$ and $Y_2 = (y_{21}, y_{22}, y_{23}) = (x_{21} + 1,\ x_{22} + 2,\ x_{23})$. Now

$$(X-M)B(X-M)' = \begin{bmatrix} Y\_1 \\ Y\_2 \end{bmatrix} B[Y\_1', Y\_2'] = \begin{bmatrix} Y\_1 B Y\_1' & Y\_1 B Y\_2' \\ Y\_2 B Y\_1' & Y\_2 B Y\_2' \end{bmatrix},$$

$$A(X-M)B(X-M)' = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} Y_1 B Y_1' & Y_1 B Y_2' \\ Y_2 B Y_1' & Y_2 B Y_2' \end{bmatrix} = \begin{bmatrix} Y_1 B Y_1' + Y_2 B Y_1' & Y_1 B Y_2' + Y_2 B Y_2' \\ Y_1 B Y_1' + 2 Y_2 B Y_1' & Y_1 B Y_2' + 2 Y_2 B Y_2' \end{bmatrix}.$$

Thus,

$$\begin{split} \mathrm{tr}[A(X-M)B(X-M)'] &= Y_1BY_1' + Y_2BY_1' + Y_1BY_2' + 2Y_2BY_2' \\ &= Y_1BY_1' + 2Y_1BY_2' + 2Y_2BY_2' \equiv Q, \end{split} \tag{i}$$

noting that $Y_1BY_2'$ and $Y_2BY_1'$ are equal since both are real scalar quantities and one is the transpose of the other. Here are now the detailed computations of the various items:

$$Y_1BY_1' = y_{11}^2 + 2y_{11}y_{12} + 2y_{11}y_{13} + 2y_{12}^2 + 2y_{12}y_{13} + 3y_{13}^2\tag{ii}$$

$$Y_2BY_2' = y_{21}^2 + 2y_{21}y_{22} + 2y_{21}y_{23} + 2y_{22}^2 + 2y_{22}y_{23} + 3y_{23}^2\tag{iii}$$

$$\begin{aligned} Y_1BY_2' &= y_{11}y_{21} + y_{11}y_{22} + y_{11}y_{23} + y_{12}y_{21} + 2y_{12}y_{22} + y_{12}y_{23} \\ &\quad + y_{13}y_{21} + y_{13}y_{22} + 3y_{13}y_{23} \end{aligned} \tag{iv}$$

where the *y*1*<sup>j</sup>* 's and *y*2*<sup>j</sup>* 's and the various quadratic and bilinear forms are as specified above. The density is then

$$f_{2,3}(X) = \frac{1}{4\pi^3}\mathrm{e}^{-\frac{1}{2}(Y_1BY_1' + 2Y_1BY_2' + 2Y_2BY_2')}$$

where the terms in the exponent are given in (*ii*)-(*iv*). This completes the computations.
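The normalizing constant $\frac{1}{4\pi^3}$ and identity *(i)* from this example can be verified numerically (a NumPy sketch; the test point $X$ is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0]])
M = np.array([[1.0, 0.0, -1.0],
              [-1.0, -2.0, 0.0]])

# normalizing constant: |A|^{q/2} |B|^{p/2} / (2 pi)^{pq/2} = 1/(4 pi^3)
p, q = 2, 3
c = np.linalg.det(A) ** (q / 2) * np.linalg.det(B) ** (p / 2) \
    / (2 * np.pi) ** (p * q / 2)
assert np.isclose(c, 1 / (4 * np.pi ** 3))

# check (i): tr[A(X-M)B(X-M)'] = Y1 B Y1' + 2 Y1 B Y2' + 2 Y2 B Y2'
rng = np.random.default_rng(5)
X = rng.standard_normal((p, q))          # an arbitrary test point
Y1, Y2 = (X - M)[0], (X - M)[1]
lhs = np.trace(A @ (X - M) @ B @ (X - M).T)
rhs = Y1 @ B @ Y1 + 2 * Y1 @ B @ Y2 + 2 * Y2 @ B @ Y2
assert np.isclose(lhs, rhs)
```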

#### **4.2a. The Matrix-variate Gaussian Density, Complex Case**

In the following discussion, the absolute value of a determinant will be denoted by $|\det(A)|$ where $A$ is a square matrix. For example, if $\det(A) = a + ib$ with $a$ and $b$ real scalars and $i = \sqrt{-1},$ the determinant of the conjugate transpose of $A$ is $\det(A^{*}) = a - ib$. Then the absolute value of the determinant is

$$|\det(A)| = +\sqrt{(a^2 + b^2)} = +[(a+ib)(a-ib)]^{\frac{1}{2}} = +[\det(A)\det(A^\*)]^{\frac{1}{2}} = +[\det(AA^\*)]^{\frac{1}{2}}.\tag{4.2a.1}$$
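Identity (4.2a.1) is easily confirmed numerically for an arbitrary complex matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # arbitrary

d = np.linalg.det(A)                     # det(A) = a + ib
# (4.2a.1): |det(A)| = [det(A) det(A*)]^{1/2} = [det(A A*)]^{1/2}
assert np.isclose(abs(d), np.sqrt((d * np.conj(d)).real))
assert np.isclose(abs(d),
                  np.sqrt(np.linalg.det(A @ A.conj().T).real))
```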

The matrix-variate Gaussian density in the complex case, which is the counterpart to that given in (4.2.1) for the real case, is

$$\tilde{f}\_{p,q}(\tilde{X}) = \frac{|\det(A)|^q |\det(B)|^p}{\pi^{pq}} \mathbf{e}^{-\text{tr}(A\tilde{X}B\tilde{X}^\*)} \tag{4.2a.2}$$

for *A > O, B > O, X*˜ = *(x*˜*ij )*, |*(*·*)*| denoting the absolute value of *(*·*)*. When *p* = 1 and *A* = 1, the usual multivariate Gaussian density in the complex domain is obtained:

$$\tilde{f}\_{1,q}(\tilde{X}) = \frac{|\det(B)|}{\pi^q} \mathbf{e}^{-(\tilde{X}-\mu)B(\tilde{X}-\mu)^\*}, \ \tilde{X}' \sim \tilde{N}\_q(\tilde{\mu}', B^{-1}) \tag{4.2a.3}$$


where $B > O$ and $\tilde{X}$ and $\mu$ are $1 \times q$ row vectors, $\mu$ being a location parameter vector. When $q = 1$ in (4.2*a*.2), we have the $p$-variate Gaussian or normal density in the complex case, which is given by

$$\tilde{f}\_{p,1}(\tilde{X}) = \frac{|\det(A)|}{\pi^p} \mathbf{e}^{-(\tilde{X}-\mu)^\* A(\tilde{X}-\mu)}, \ \tilde{X} \sim \tilde{N}\_p(\mu, A^{-1}) \tag{4.2a.4}$$

where *X*˜ and the location parameter also denoted by *μ* are now *p* × 1 vectors.

**Example 4.2a.1.** Consider a 2×3 complex matrix-variate Gaussian density. Write down the normalizing constant and the exponent explicitly if

$$\begin{aligned} \tilde{X} &= \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{21} & \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix}, \ E[\tilde{X}] = \tilde{M} = \begin{bmatrix} i & -i & 1+i \\ 0 & 1-i & 1 \end{bmatrix}, \\\ A &= \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix}, \ B = \begin{bmatrix} 4 & 1+i & i \\ 1-i & 2 & 1-i \\ -i & 1+i & 3 \end{bmatrix}, \end{aligned}$$

where the *x*˜*ij* 's are scalar complex random variables.

**Solution 4.2a.1.** Let us verify the definiteness of $A$ and $B$. It is obvious that $A = A^{*},\ B = B^{*}$ and hence they are Hermitian. The leading minors of $A$ are $|(3)| = 3 > 0,\ |A| = 4 > 0$ and hence $A > O$. The leading minors of $B$ are $|(4)| = 4 > 0,$ $\begin{vmatrix} 4 & 1+i \\ 1-i & 2 \end{vmatrix} = 6 > 0$ and $|B| = 8 > 0,$


and hence *B>O*. The normalizing constant is then

$$\frac{|\det(A)|^q |\det(B)|^p}{\pi^{pq}} = \frac{(4^3)(8^2)}{\pi^6}.$$

Let the two rows of $\tilde{X}$ be $\tilde{X}_1$ and $\tilde{X}_2$, and let $\tilde{X} - \tilde{M} = \tilde{Y} = \begin{bmatrix} \tilde{Y}_1 \\ \tilde{Y}_2 \end{bmatrix}$, where

$$\begin{aligned} \tilde{Y}_1 &= (\tilde{y}_{11}, \tilde{y}_{12}, \tilde{y}_{13}) = (\tilde{x}_{11} - i,\ \tilde{x}_{12} + i,\ \tilde{x}_{13} - (1+i)), \\ \tilde{Y}_2 &= (\tilde{y}_{21}, \tilde{y}_{22}, \tilde{y}_{23}) = (\tilde{x}_{21},\ \tilde{x}_{22} - (1-i),\ \tilde{x}_{23} - 1). \end{aligned}$$


$$\begin{aligned} (\tilde{X} - \tilde{M})B(\tilde{X} - \tilde{M})^{*} &= \tilde{Y}B\tilde{Y}^{*} = \begin{bmatrix} \tilde{Y}_1 \\ \tilde{Y}_2 \end{bmatrix}B[\tilde{Y}_1^{*}, \tilde{Y}_2^{*}] \\ &= \begin{bmatrix} \tilde{Y}_1 B\tilde{Y}_1^{*} & \tilde{Y}_1 B\tilde{Y}_2^{*} \\ \tilde{Y}_2 B\tilde{Y}_1^{*} & \tilde{Y}_2 B\tilde{Y}_2^{*} \end{bmatrix}. \end{aligned}$$

Then,

$$\begin{aligned} \text{tr}[A(\tilde{X} - \tilde{M})B(\tilde{X} - \tilde{M})^\*] &= \text{tr}\left\{ \begin{bmatrix} 3 & 1 + i \\ 1 - i & 2 \end{bmatrix} \begin{bmatrix} \tilde{Y}\_1 B\tilde{Y}\_1^\* & \tilde{Y}\_1 B\tilde{Y}\_2^\* \\ \tilde{Y}\_2 B\tilde{Y}\_1^\* & \tilde{Y}\_2 B\tilde{Y}\_2^\* \end{bmatrix} \right\} \\ &= 3\tilde{Y}\_1 B\tilde{Y}\_1^\* + (1 + i)(\tilde{Y}\_2 B\tilde{Y}\_1^\*) + (1 - i)(\tilde{Y}\_1 B\tilde{Y}\_2^\*) + 2\tilde{Y}\_2 B\tilde{Y}\_2^\* \\ &\equiv \mathcal{Q} \end{aligned} \tag{i}$$

where

$$\begin{aligned} \tilde{Y}\_{1}B\tilde{Y}\_{1}^{\*} &= 4\tilde{y}\_{11}\tilde{y}\_{11}^{\*} + 2\tilde{y}\_{12}\tilde{y}\_{12}^{\*} + 3\tilde{y}\_{13}\tilde{y}\_{13}^{\*} \\ &\quad + (1+i)\tilde{y}\_{11}\tilde{y}\_{12}^{\*} + i\tilde{y}\_{11}\tilde{y}\_{13}^{\*} + (1-i)\tilde{y}\_{12}\tilde{y}\_{11}^{\*} \\ &\quad + (1-i)\tilde{y}\_{12}\tilde{y}\_{13}^{\*} - i\tilde{y}\_{13}\tilde{y}\_{11}^{\*} + (1+i)\tilde{y}\_{13}\tilde{y}\_{12}^{\*} \end{aligned} \tag{ii}$$

$$\begin{aligned} \tilde{Y}\_2 B \tilde{Y}\_2^\* &= 4 \tilde{y}\_{21} \tilde{y}\_{21}^\* + 2 \tilde{y}\_{22} \tilde{y}\_{22}^\* + 3 \tilde{y}\_{23} \tilde{y}\_{23}^\* \\ &\quad + (1+i) \tilde{y}\_{21} \tilde{y}\_{22}^\* + i \tilde{y}\_{21} \tilde{y}\_{23}^\* + (1-i) \tilde{y}\_{22} \tilde{y}\_{21}^\* \\ &\quad + (1-i) \tilde{y}\_{22} \tilde{y}\_{23}^\* - i \tilde{y}\_{23} \tilde{y}\_{21}^\* + (1+i) \tilde{y}\_{23} \tilde{y}\_{22}^\* \end{aligned} \tag{iii}$$

$$\begin{aligned} \tilde{Y}\_1 B \tilde{Y}\_2^\* &= 4 \tilde{y}\_{11} \tilde{y}\_{21}^\* + 2 \tilde{y}\_{12} \tilde{y}\_{22}^\* + 3 \tilde{y}\_{13} \tilde{y}\_{23}^\* \\ &\quad + (1+i) \tilde{y}\_{11} \tilde{y}\_{22}^\* + i \tilde{y}\_{11} \tilde{y}\_{23}^\* + (1-i) \tilde{y}\_{12} \tilde{y}\_{21}^\* \\ &\quad + (1-i) \tilde{y}\_{12} \tilde{y}\_{23}^\* - i \tilde{y}\_{13} \tilde{y}\_{21}^\* + (1+i) \tilde{y}\_{13} \tilde{y}\_{22}^\* \end{aligned} \tag{iv}$$

$$\tilde{Y}\_2 B \tilde{Y}\_1^\* = (iv)\ \text{with}\ \tilde{y}\_{1j}\ \text{and}\ \tilde{y}\_{2j}\ \text{interchanged}.\tag{v}$$

Hence, the density of *X*˜ is given by

$$\tilde{f}\_{2,3}(\tilde{X}) = \frac{(4^3)(8^2)}{\pi^6} \mathbf{e}^{-\mathcal{Q}}$$

where *Q* is given explicitly in (*i*)-(*v*) above. This completes the computations.
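As a numerical sanity check, the matrices of this solution and the normalizing constant can be verified with numpy. This is a sketch, not part of the original solution; the matrix $B$ below is read off from the quadratic forms (ii)-(iv).

```python
import numpy as np

# A as given; B reconstructed from the quadratic forms (ii)-(iv)
A = np.array([[3, 1 + 1j], [1 - 1j, 2]])
B = np.array([[4, 1 + 1j, 1j],
              [1 - 1j, 2, 1 - 1j],
              [-1j, 1 + 1j, 3]])

for C in (A, B):
    assert np.allclose(C, C.conj().T)           # Hermitian
    assert np.all(np.linalg.eigvalsh(C) > 0)    # positive definite

p, q = 2, 3
# normalizing constant |det(A)|^q |det(B)|^p / pi^(pq) = 4^3 * 8^2 / pi^6
c = abs(np.linalg.det(A)) ** q * abs(np.linalg.det(B)) ** p / np.pi ** (p * q)
print(round(abs(np.linalg.det(A))), round(abs(np.linalg.det(B))))
```

The determinants come out as 4 and 8, matching the leading-minor computation above.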

#### **4.2.1. Some properties of a real matrix-variate Gaussian density**

In order to derive certain properties, we will need some more Jacobians of matrix transformations, in addition to those provided in Chap. 1. These will be listed in this section as basic results without proofs. The derivations as well as other related Jacobians are available from Mathai (1997).

**Theorem 4.2.1.** *Let $X$ be a $p \times q$, $q \ge p$, real matrix of rank $p$, that is, $X$ has full rank, where the $pq$ elements of $X$ are distinct real scalar variables. Let $X = TU\_1$ where $T$ is a $p \times p$ real lower triangular matrix whose diagonal elements are positive and $U\_1$ is a semi-orthonormal matrix such that $U\_1U\_1' = I\_p$. Then*

$$\mathrm{d}X = \left\{ \prod\_{j=1}^{p} t\_{jj}^{q-j} \right\} \mathrm{d}T \, h(U\_1) \tag{4.2.6}$$

*where h(U*1*) is the differential element corresponding to U*1*.*

**Theorem 4.2.2.** *For the differential element $h(U\_1)$ in (4.2.6), the integral over the Stiefel manifold $V\_{p,q}$, that is, the space of $p \times q$, $q \ge p$, semi-orthonormal matrices, and the integral over the full orthogonal group $O\_p$ when $q = p$, are respectively*

$$\int\_{V\_{p,q}} h(U\_1) = \frac{2^p \pi^{\frac{pq}{2}}}{\Gamma\_p(\frac{q}{2})} \text{ and } \int\_{O\_p} h(U\_1) = \frac{2^p \pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \tag{4.2.7}$$

*where Γp(α) is the real matrix-variate gamma function given by*

$$\Gamma\_p(\alpha) = \pi^{\frac{p(p-1)}{4}} \Gamma(\alpha) \Gamma(\alpha - 1/2) \cdots \Gamma(\alpha - (p-1)/2), \; \Re(\alpha) > \frac{p-1}{2}, \qquad (4.2.8)$$

*$\Re(\cdot)$ denoting the real part of $(\cdot)$.*

For example,

$$\Gamma\_{3}(\alpha) = \pi^{\frac{3(2)}{4}} \Gamma(\alpha) \Gamma(\alpha - 1/2) \Gamma(\alpha - 1) = \pi^{\frac{3}{2}} \Gamma(\alpha) \Gamma(\alpha - 1/2) \Gamma(\alpha - 1), \; \Re(\alpha) > 1.$$
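The product form of (4.2.8) is straightforward to evaluate numerically. The following sketch (the function name is our own) computes $\Gamma\_p(\alpha)$ and checks it against the $\Gamma\_3$ example above.

```python
import math

def real_matrix_gamma(p, a):
    # Gamma_p(a) = pi^{p(p-1)/4} * prod_{j=0}^{p-1} Gamma(a - j/2), valid for a > (p-1)/2
    assert a > (p - 1) / 2
    return math.pi ** (p * (p - 1) / 4) * math.prod(math.gamma(a - j / 2) for j in range(p))

# p = 3 reproduces the example: pi^{3/2} Gamma(a) Gamma(a - 1/2) Gamma(a - 1)
a = 2.5
lhs = real_matrix_gamma(3, a)
rhs = math.pi ** 1.5 * math.gamma(a) * math.gamma(a - 0.5) * math.gamma(a - 1)
print(lhs, rhs)
```

For $p = 1$ the function reduces to the ordinary gamma function, as it should.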

With the help of Theorems 4.2.1, 4.2.2 and 1.6.7 of Chap. 1, we can derive the following result:

**Theorem 4.2.3.** *Let $X$ be a real $p \times q$, $q \ge p$, matrix of rank $p$ and $S = XX'$. Then, $S > O$ (real positive definite) and*

$$\mathrm{d}X = \frac{\pi^{\frac{pq}{2}}}{\Gamma\_p(\frac{q}{2})} |\mathrm{S}|^{\frac{q}{2} - \frac{p+1}{2}} \mathrm{d}S,\tag{4.2.9}$$

*after integrating out over the Stiefel manifold.*

#### **4.2a.1. Some properties of a complex matrix-variate Gaussian density**

The corresponding results in the complex domain follow.

**Theorem 4.2a.1.** *Let $\tilde{X}$ be a $p \times q$, $q \ge p$, matrix of rank $p$ in the complex domain and $\tilde{T}$ be a $p \times p$ lower triangular matrix in the complex domain whose diagonal elements $t\_{jj} > 0$, $j = 1, \ldots, p$, are real and positive. Then, letting $\tilde{U}\_1$ be a semi-unitary matrix such that $\tilde{U}\_1\tilde{U}\_1^\* = I\_p$,*

$$\tilde{X} = \tilde{T}\tilde{U}\_{\text{l}} \Rightarrow \text{d}\tilde{X} = \left\{ \prod\_{j=1}^{p} t\_{jj}^{2(q-j)+1} \right\} \text{d}\tilde{T}\,\tilde{h}(\tilde{U}\_{\text{l}}) \tag{4.2a.5}$$

*where $\tilde{h}(\tilde{U}\_1)$ is the differential element corresponding to $\tilde{U}\_1$.*

When integrating out $\tilde{h}(\tilde{U}\_1)$, there are three situations to be considered. The first is the case $q > p$. When $q = p$, the integration is done over the full unitary group $\tilde{O}\_p$, and two cases must then be distinguished. In one, all the elements of the unitary matrix $\tilde{U}\_1$, including the diagonal ones, are complex, in which case $\tilde{O}\_p$ will be denoted by $\tilde{O}\_p^{(1)}$; in the other, the diagonal elements of $\tilde{U}\_1$ are real, in which instance the unitary group will be denoted by $\tilde{O}\_p^{(2)}$. When unitary transformations are applied to Hermitian matrices, which is the usual situation, the diagonal elements of the unique $\tilde{U}\_1$ are real and hence the unitary group is $\tilde{O}\_p^{(2)}$. The integrals of $\tilde{h}(\tilde{U}\_1)$ in these three cases are given in the next theorem.

**Theorem 4.2a.2.** *Let $\tilde{h}(\tilde{U}\_1)$ be as defined in equation (4.2a.5). Then, the integral of $\tilde{h}(\tilde{U}\_1)$ over the Stiefel manifold $\tilde{V}\_{p,q}$ of semi-unitary matrices for $q > p$ and, when $q = p$, the integrals over the unitary groups $\tilde{O}\_p^{(1)}$ and $\tilde{O}\_p^{(2)}$, are the following:*

$$\begin{aligned} \int\_{\tilde{V}\_{p,q}} \tilde{h}(\tilde{U}\_{1}) &= \frac{2^p \pi^{pq}}{\tilde{\Gamma}\_p(q)}, \; q > p; \\ \int\_{\tilde{O}\_p^{(1)}} \tilde{h}(\tilde{U}\_{1}) &= \frac{2^p \pi^{p^2}}{\tilde{\Gamma}\_p(p)}, \; \int\_{\tilde{O}\_p^{(2)}} \tilde{h}(\tilde{U}\_{1}) = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)}, \end{aligned} \tag{4.2a.6}$$

*the factor $2^p$ being omitted when $\tilde{U}\_1$ is uniquely specified; $\tilde{O}\_p^{(1)}$ corresponds to the case of a general $\tilde{X}$, $\tilde{O}\_p^{(2)}$ to the case of $\tilde{X}$ Hermitian, and $\tilde{\Gamma}\_p(\alpha)$ is the complex matrix-variate gamma function, given by*

$$\tilde{\Gamma}\_p(\alpha) = \pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) \cdots \Gamma(\alpha - p + 1), \; \Re(\alpha) > p - 1. \tag{4.2a.7}$$

For example,

$$\tilde{\Gamma}\_3(a) = \pi^{\frac{3(2)}{2}} \Gamma(a) \Gamma(a-1) \Gamma(a-2) = \pi^3 \Gamma(a) \Gamma(a-1) \Gamma(a-2), \ \Re(a) > 2.$$
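The complex matrix-variate gamma (4.2a.7) admits the same kind of numerical sketch as its real counterpart (again, the function name is our own):

```python
import math

def complex_matrix_gamma(p, a):
    # Gamma~_p(a) = pi^{p(p-1)/2} * prod_{j=0}^{p-1} Gamma(a - j), valid for a > p - 1
    assert a > p - 1
    return math.pi ** (p * (p - 1) / 2) * math.prod(math.gamma(a - j) for j in range(p))

# p = 3 reproduces the example: pi^3 Gamma(a) Gamma(a - 1) Gamma(a - 2)
a = 3.5
val = complex_matrix_gamma(3, a)
ref = math.pi ** 3 * math.gamma(a) * math.gamma(a - 1) * math.gamma(a - 2)
print(val, ref)
```

Note that the gammas now step down in integer increments, as opposed to half-integer steps in the real case.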

**Theorem 4.2a.3.** *Let $\tilde{X}$ be a $p \times q$, $q \ge p$, matrix of rank $p$ in the complex domain and $\tilde{S} = \tilde{X}\tilde{X}^\* > O$. Then, after integrating out over the Stiefel manifold,*

$$\mathrm{d}\tilde{X} = \frac{\pi^{pq}}{\tilde{\Gamma}\_p(q)} |\mathrm{det}(\tilde{S})|^{q-p} \mathrm{d}\tilde{S}.\tag{4.2a.8}$$

#### **4.2.2. Additional properties in the real and complex cases**

On making use of the above results, we will establish a few properties in this section, additional ones being derived later on. Let us consider the matrix-variate Gaussian densities corresponding to (4.2.1) and (4.2*a*.2) with location matrices $M$ and $\tilde{M}$, respectively, and let the densities again be denoted by $f\_{p,q}(X)$ and $\tilde{f}\_{p,q}(\tilde{X})$, respectively, where

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}[A(X-M)B(X-M)']} \tag{4.2.10}$$

and

$$\tilde{f}\_{p,q}(\tilde{X}) = \frac{|\det(A)|^q |\det(B)|^p}{\pi^{pq}} \mathbf{e}^{-\text{tr}[A(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^\*]}.\tag{4.2a.9}$$

Then, in the real case the expected value of *X* or the mean value of *X*, denoted by *E(X)*, is given by

$$E(X) = \int\_X X f\_{p,q}(X) \,\mathrm{d}X = \int\_X (X - M) f\_{p,q}(X) \,\mathrm{d}X + M \int\_X f\_{p,q}(X) \,\mathrm{d}X. \tag{i}$$

The second integral in *(i)* is the total integral of a density, which is 1, and hence the second term gives $M$. On making the transformation $Y = A^{\frac{1}{2}}(X - M)B^{\frac{1}{2}}$, we have

$$E[X] = M + A^{-\frac{1}{2}} \frac{1}{(2\pi)^{\frac{pq}{2}}} \int\_Y Y \mathbf{e}^{-\frac{1}{2} \text{tr}(YY')} \text{d}Y \, B^{-\frac{1}{2}}.\tag{ii}$$

But $\text{tr}(YY')$ is the sum of the squares of all the elements of $Y$. Hence $Y \mathbf{e}^{-\frac{1}{2}\text{tr}(YY')}$ is an odd function and the integral over each element of $Y$ is convergent, so that each integral is zero. Thus, the integral over $Y$ gives a null matrix, and therefore $E(X) = M$. It can be shown in a similar manner that $E(\tilde{X}) = \tilde{M}$.
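The conclusion $E(X) = M$ can also be checked by simulation: inverting the transformation above, $X = M + A^{-\frac{1}{2}} Y B^{-\frac{1}{2}}$ with $Y$ having iid $N(0,1)$ entries reproduces the density (4.2.10). A minimal sketch, with parameter matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 2, 3, 100_000

A = np.array([[1.0, 1.0], [1.0, 2.0]])                      # A > O
B = np.array([[3.0, -1.0, 1.0], [-1.0, 2.0, 1.0], [1.0, 1.0, 3.0]])  # B > O
M = np.array([[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])

def inv_sqrt(C):
    # symmetric inverse square root via the spectral decomposition
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

Ai, Bi = inv_sqrt(A), inv_sqrt(B)
Y = rng.standard_normal((n, p, q))                 # iid N(0,1) entries
X = M + np.einsum('ij,njk,kl->nil', Ai, Y, Bi)     # X = M + A^{-1/2} Y B^{-1/2}
print(np.round(X.mean(axis=0), 2))                 # should be close to M
```

With 100,000 replicates the entrywise Monte Carlo error is of the order of a few thousandths.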


**Theorem 4.2.4, 4.2a.4.** *For the densities specified in (4.2.10) and (4.2a.9),*

$$E(X) = M \text{ and } E(\tilde{X}) = \tilde{M}. \tag{4.2.11}$$

**Theorem 4.2.5, 4.2a.5.** *For the densities given in (4.2.10), (4.2a.9)*

$$E[(X-M)B(X-M)'] = qA^{-1},\ \ E[(X-M)'A(X-M)] = pB^{-1} \tag{4.2.12}$$

*and*

$$E[(\tilde{X} - \tilde{M})B(\tilde{X} - \tilde{M})^\*] = qA^{-1}, \; E[(\tilde{X} - \tilde{M})^\* A(\tilde{X} - \tilde{M})] = pB^{-1}.\tag{4.2a.10}$$

*Proof***:** Consider the real case first. Let $Y = A^{\frac{1}{2}}(X - M)B^{\frac{1}{2}} \Rightarrow A^{-\frac{1}{2}}Y = (X - M)B^{\frac{1}{2}}$. Then

$$E[(X-M)\mathcal{B}(X-M)'] = \frac{A^{-\frac{1}{2}}}{(2\pi)^{\frac{pq}{2}}} \int\_Y Y Y' e^{-\frac{1}{2} \text{tr}(YY')} \text{d}Y A^{-\frac{1}{2}}.\tag{i}$$

Note that $Y$ is $p \times q$ and $YY'$ is $p \times p$. The non-diagonal elements of $YY'$ are dot products of the distinct row vectors of $Y$ and hence linear functions of the elements of $Y$. The diagonal elements of $YY'$ are sums of squares of the elements in the rows of $Y$. Since the exponent involves the sum of squares of all the elements, the convergent integrals corresponding to all the non-diagonal elements of $YY'$ are zero. Hence, only the diagonal elements need be considered. Each diagonal element is a sum of squares of $q$ elements of $Y$. For example, the first diagonal element of $YY'$ is $y\_{11}^2 + y\_{12}^2 + \cdots + y\_{1q}^2$ where $Y = (y\_{ij})$. Let $Y\_1 = (y\_{11}, \ldots, y\_{1q})$ be the first row of $Y$ and let $s = Y\_1Y\_1' = y\_{11}^2 + \cdots + y\_{1q}^2$. It follows from Theorem 4.2.3 that when $p = 1$,

$$\mathrm{d}Y\_1 = \frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})} s^{\frac{q}{2}-1} \mathrm{ds}.\tag{ii}$$

Then

$$\int\_{Y\_1} Y\_1 Y\_1' \mathbf{e}^{-\frac{1}{2} Y\_1 Y\_1'} \mathbf{d}Y\_1 = \int\_{s=0}^{\infty} s \frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})} s^{\frac{q}{2}-1} \mathbf{e}^{-\frac{1}{2}s} \mathbf{d}s.\tag{iii}$$

The integral over $s$ is $2^{\frac{q}{2}+1}\Gamma(\frac{q}{2}+1) = 2^{\frac{q}{2}+1}\frac{q}{2}\Gamma(\frac{q}{2}) = 2^{\frac{q}{2}}q\,\Gamma(\frac{q}{2})$. Thus, $\Gamma(\frac{q}{2})$ is canceled and $(2\pi)^{\frac{q}{2}}$ cancels with $(2\pi)^{\frac{pq}{2}}$, leaving $(2\pi)^{\frac{(p-1)q}{2}}$ in the denominator and $q$ in the numerator. There remain $p - 1$ such sets of $q$ squares $y\_{ij}^2$ in the exponent in *(i)*, and each of the corresponding integrals is of the form $\int\_{-\infty}^{\infty} \mathbf{e}^{-\frac{1}{2}z^2}\text{d}z = \sqrt{2\pi}$, which gives $(2\pi)^{\frac{(p-1)q}{2}}$, so that the factor containing $\pi$ is also canceled, leaving only $q$ at each diagonal position of $YY'$. Hence, $\frac{1}{(2\pi)^{\frac{pq}{2}}} \int\_Y YY' \mathbf{e}^{-\frac{1}{2}\text{tr}(YY')}\text{d}Y = qI$ where $I$ is the identity matrix, which establishes the first result in (4.2.12). Now, write

$$\text{tr}[A(X-M)B(X-M)^\prime] = \text{tr}[(X-M)^\prime A(X-M)B] = \text{tr}[B(X-M)^\prime A(X-M)].$$

This is the same structure as in the previous case, with $B$ occupying the place of $A$ and the order now being $q$ instead of $p$. Then, proceeding as in the derivations from (i) to (iii), the second result in (4.2.12) follows. The results in (4.2*a*.10) are established in a similar manner.
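Both identities in (4.2.12) lend themselves to a quick Monte Carlo verification. For simplicity, the sketch below takes $M = O$ and diagonal $A$ and $B$ (our own choices), so that the square roots are immediate:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 2, 3, 100_000
A = np.diag([2.0, 1.0])
B = np.diag([1.0, 2.0, 4.0])

# sample X (with M = O) via X = A^{-1/2} Y B^{-1/2}, Y iid N(0,1)
Ai = np.diag(np.diag(A) ** -0.5)
Bi = np.diag(np.diag(B) ** -0.5)
Y = rng.standard_normal((n, p, q))
X = np.einsum('ij,njk,kl->nil', Ai, Y, Bi)

lhs1 = np.einsum('nij,jk,nlk->il', X, B, X) / n   # average of X B X', should be q A^{-1}
lhs2 = np.einsum('nji,jk,nkl->il', X, A, X) / n   # average of X' A X, should be p B^{-1}
print(np.round(lhs1, 2), np.round(lhs2, 2), sep='\n')
```

The two sample averages settle near $qA^{-1}$ and $pB^{-1}$ respectively.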

From (4.2.10), it is clear that the density of *Y,* denoted by *g(Y )*, is of the form

$$g(Y) = \frac{1}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}(YY')}, \ Y = (y\_{ij}), \ -\infty < y\_{ij} < \infty,\tag{4.2.13}$$

for all *i* and *j* . The individual *yij* 's are independently distributed and each *yij* has the density

$$g\_{ij}(\mathbf{y}\_{ij}) = \frac{1}{\sqrt{(2\pi)}} \mathbf{e}^{-\frac{1}{2}\mathbf{y}\_{ij}^{2}}, \ -\infty < \mathbf{y}\_{ij} < \infty. \tag{i\nu}$$

Thus, we have a real standard normal density for each $y\_{ij}$. The complex case corresponding to (4.2.13), denoted by $\tilde{g}(\tilde{Y})$, is given by

$$\tilde{\mathbf{g}}(\tilde{Y}) = \frac{1}{\pi^{pq}} \mathbf{e}^{-\text{tr}(\tilde{Y}\tilde{Y}^\*)}.\tag{4.2a.11}$$

In this case, the exponent is $\text{tr}(\tilde{Y}\tilde{Y}^\*) = \sum\_{i=1}^{p}\sum\_{j=1}^{q} |\tilde{y}\_{ij}|^2$ where $\tilde{y}\_{rs} = y\_{rs1} + iy\_{rs2}$, with $y\_{rs1}, y\_{rs2}$ real, $i = \sqrt{-1}$ and $|\tilde{y}\_{rs}|^2 = y\_{rs1}^2 + y\_{rs2}^2$.
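Under (4.2*a*.11), each entry $\tilde{y}\_{ij}$ has density $\pi^{-1}\mathbf{e}^{-|\tilde{y}\_{ij}|^2}$, so its real and imaginary parts are independent $N(0, \frac{1}{2})$ variables and $|\tilde{y}\_{ij}|^2$ is a standard exponential variable. A small simulation sketch of this fact:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000
# real and imaginary parts independent N(0, 1/2)
y = rng.normal(0, np.sqrt(0.5), n) + 1j * rng.normal(0, np.sqrt(0.5), n)

sq = np.abs(y) ** 2        # |y|^2 ~ exponential(1): mean 1, variance 1
m, v = sq.mean(), sq.var()
print(round(m, 3), round(v, 3))
```

Both the sample mean and sample variance settle near 1, as an exponential(1) variable requires.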

For the real case, the probability that $y\_{ij} \le t\_{ij}$ for a given $t\_{ij}$ is the distribution function of $y\_{ij}$, denoted by $F\_{y\_{ij}}(t\_{ij})$. Let us now determine the density of $y\_{ij}^2$. Letting $u\_{ij} = y\_{ij}^2$, $Pr\{u\_{ij} \le v\_{ij}\}$ for some $v\_{ij}$ is the distribution function of $u\_{ij}$ evaluated at $v\_{ij}$, denoted by $F\_{u\_{ij}}(v\_{ij})$. Consider

$$\Pr\{\mathbf{y}\_{ij}^2 \le t, \ t > 0\} = \Pr\{|\mathbf{y}\_{ij}| \le \sqrt{t}\} = \Pr\{-\sqrt{t} \le \mathbf{y}\_{ij} \le \sqrt{t}\} = F\_{\mathbf{y}\_{ij}}(\sqrt{t}) - F\_{\mathbf{y}\_{ij}}(-\sqrt{t}).\tag{v}$$

Let us differentiate throughout with respect to $t$. When $Pr\{y\_{ij}^2 \le t\}$ is differentiated with respect to $t$, we obtain the density of $u\_{ij} = y\_{ij}^2$ evaluated at $t$. This density, denoted by $h\_{ij}(u\_{ij})$, is given by


$$\begin{split} h\_{ij}(u\_{ij})|\_{u\_{ij}=t} &= \frac{\mathbf{d}}{\mathbf{d}t} F\_{y\_{ij}}(\sqrt{t}) - \frac{\mathbf{d}}{\mathbf{d}t} F\_{y\_{ij}}(-\sqrt{t}) \\ &= g\_{ij}(\sqrt{t}) \frac{1}{2} t^{\frac{1}{2}-1} - g\_{ij}(-\sqrt{t})\Big(-\frac{1}{2} t^{\frac{1}{2}-1}\Big) \\ &= \frac{1}{\sqrt{2\pi}} \, t^{\frac{1}{2}-1} \mathbf{e}^{-\frac{1}{2}t} = \frac{1}{\sqrt{2\pi}} \, u\_{ij}^{\frac{1}{2}-1} \mathbf{e}^{-\frac{1}{2}u\_{ij}} \end{split} \tag{vi}$$

evaluated at *uij* = *t* for 0 ≤ *t <* ∞. Hence we have the following result:

**Theorem 4.2.6.** *Consider the density $f\_{p,q}(X)$ in (4.2.1) and the transformation $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$. Letting $Y = (y\_{ij})$, the $y\_{ij}$'s are mutually independently distributed as in (iv) above, and each $y\_{ij}^2$ is distributed as a real chi-square random variable having one degree of freedom or, equivalently, a real gamma random variable with parameters $\alpha = \frac{1}{2}$ and $\beta = 2$, where the usual real scalar gamma density is given by*

$$f(z) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} z^{\alpha - 1} \mathbf{e}^{-\frac{z}{\beta}},\tag{vii}$$

*for* $0 \le z < \infty$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, *and* $f(z) = 0$ *elsewhere.*

As a consequence of the $y\_{ij}^2$'s being independently gamma distributed, $\sum\_{j=1}^{q} y\_{ij}^2$ is real gamma distributed with the parameters $\alpha = \frac{q}{2}$ and $\beta = 2$. Then $\text{tr}(YY')$ is real gamma distributed with the parameters $\alpha = \frac{pq}{2}$ and $\beta = 2$, and each diagonal element of $YY'$ is real gamma distributed with the parameters $\alpha = \frac{q}{2}$ and $\beta = 2$, that is, a real chi-square variable with $q$ degrees of freedom whose expected value is $2\,\frac{q}{2} = q$. This provides an alternative way of proving (4.2.12). The proofs of the other results in (4.2.12) and (4.2*a*.10) are parallel and hence omitted.
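These gamma and chi-square facts are easy to confirm empirically: for $Y$ with iid $N(0,1)$ entries, $\text{tr}(YY')$ should have mean $\alpha\beta = pq$ and variance $\alpha\beta^2 = 2pq$. A sketch, with $p$ and $q$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 2, 3, 200_000
Y = rng.standard_normal((n, p, q))
t = np.einsum('nij,nij->n', Y, Y)   # tr(YY') = sum of squares of all entries, per sample

# gamma(alpha = pq/2, beta = 2): mean = pq, variance = 2pq
print(round(t.mean(), 2), round(t.var(), 2))
```

With $p = 2$, $q = 3$ the sample mean and variance settle near 6 and 12.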

#### **4.2.3. Some special cases**

Consider the real $p \times q$ matrix-variate Gaussian case where the exponent in the density is $-\frac{1}{2}\text{tr}(AXBX')$. On making the transformation $A^{\frac{1}{2}}X = Z \Rightarrow \text{d}Z = |A|^{\frac{q}{2}}\text{d}X$, $Z$ has a $p \times q$ matrix-variate Gaussian density of the form

$$f\_{p,q}(Z) = \frac{|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathrm{e}^{-\frac{1}{2}\mathrm{tr}(ZBZ')}.\tag{4.2.14}$$

If the distribution has a $p \times q$ constant matrix $M$ as location parameter, then $Z$ is replaced by $Z - M$ in (4.2.14), which does not affect the normalizing constant. Letting $Z\_1, Z\_2, \ldots, Z\_p$ denote the rows of $Z$, we observe that $Z\_j$ has a $q$-variate multinormal distribution with the null vector as its mean value and $B^{-1}$ as its covariance matrix for each $j = 1, \ldots, p$. This can be seen from the considerations that follow. Let us consider the transformation $Y = ZB^{\frac{1}{2}} \Rightarrow \text{d}Z = |B|^{-\frac{p}{2}}\text{d}Y$. The density in (4.2.14) then reduces to the following, denoted by $f\_{p,q}(Y)$:

$$f\_{p,q}(Y) = \frac{1}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(YY')}.\tag{4.2.15}$$

This means that each element *yij* in *Y* = *(yij )* is a real univariate standard normal variable, *yij* ∼ *N*1*(*0*,* 1*)* as per the usual notation, and all the *yij* 's are mutually independently distributed. Letting the *p* rows of *Y* be *Y*1*,...,Yp*, then each *Yj* is a *q*-variate standard normal vector for *j* = 1*,...,p*. Letting the density of *Yj* be denoted by *fYj (Yj )*, we have

$$f\_{Y\_j}(Y\_j) = \frac{1}{(2\pi)^{\frac{q}{2}}} \mathbf{e}^{-\frac{1}{2}(Y\_j Y'\_j)}.$$

Now, consider the transformation $Z\_j = Y\_jB^{-\frac{1}{2}} \Rightarrow \text{d}Y\_j = |B|^{\frac{1}{2}}\text{d}Z\_j$ and $Y\_j = Z\_jB^{\frac{1}{2}}$. Thus, $Y\_jY\_j' = Z\_jBZ\_j'$, and the density of $Z\_j$, denoted by $f\_{Z\_j}(Z\_j)$, is as follows:

$$f\_{Z\_j}(Z\_j) = \frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{q}{2}}} \mathbf{e}^{-\frac{1}{2}(Z\_j B Z\_j')}, \ B > O,\tag{4.2.16}$$

which is a *q*-variate real multinormal density with the covariance matrix of *Zj* given by *<sup>B</sup>*−1, for each *<sup>j</sup>* <sup>=</sup> <sup>1</sup>*,... , p,* and the *Zj* 's, *<sup>j</sup>* <sup>=</sup> <sup>1</sup>*,... , p,* are mutually independently distributed. Thus, the following result:

**Theorem 4.2.7.** *Let Z*1*,...,Zp be the p rows of the p* × *q matrix Z in (4.2.14). Then each Zj has a q-variate real multinormal distribution with the covariance matrix B*−1*, for j* = 1*,...,p, and Z*1*,...,Zp are mutually independently distributed.*
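Theorem 4.2.7 can be illustrated by sampling rows $Z\_j = Y\_jB^{-\frac{1}{2}}$ with $Y\_j$ standard normal and checking that their sample covariance approaches $B^{-1}$. A sketch, with an arbitrarily chosen positive definite $B$:

```python
import numpy as np

rng = np.random.default_rng(3)
q, n = 3, 200_000
B = np.array([[3.0, -1.0, 1.0], [-1.0, 2.0, 1.0], [1.0, 1.0, 3.0]])  # B > O

# symmetric square root of B^{-1} via the spectral decomposition
w, V = np.linalg.eigh(B)
B_inv_half = V @ np.diag(w ** -0.5) @ V.T

Z = rng.standard_normal((n, q)) @ B_inv_half   # each row is Z_j = Y_j B^{-1/2}
cov = Z.T @ Z / n                              # sample covariance (mean is the null vector)
print(np.round(cov, 2))
```

The printed matrix approximates $B^{-1}$, confirming the stated covariance structure of the rows.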

Observe that the exponent in the original real *p* × *q* matrix-variate Gaussian density can also be rewritten in the following format:

$$\begin{aligned} -\frac{1}{2}\text{tr}(AXBX') &= -\frac{1}{2}\text{tr}(X'AXB) = -\frac{1}{2}\text{tr}(BX'AX) \\ &= -\frac{1}{2}\text{tr}(U'AU) = -\frac{1}{2}\text{tr}(ZBZ'), \ A^{\frac{1}{2}}X = Z, \ XB^{\frac{1}{2}} = U. \end{aligned}$$

Now, on making the transformation $U = XB^{\frac{1}{2}} \Rightarrow \text{d}X = |B|^{-\frac{p}{2}}\text{d}U$, the density of $U$, denoted by $f\_{p,q}(U)$, is given by

$$f\_{p,q}(U) = \frac{|A|^{\frac{q}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}(U'AU)}.\tag{4.2.17}$$

Proceeding as in the derivation of Theorem 4.2.7, we have the following result:


**Theorem 4.2.8.** *Consider the p* × *q real matrix U in (4.2.17). Let U*1*,...,Uq be the columns of U. Then, U*1*,...,Uq are mutually independently distributed with Uj having a p-variate multinormal density, denoted by fUj (Uj ), given as*

$$f\_{U\_j}(U\_j) = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \mathbf{e}^{-\frac{1}{2}(U\_j'AU\_j)}.\tag{4.2.18}$$

The corresponding results in the *p* × *q* complex Gaussian case are the following:

**Theorem 4.2a.6.** *Consider the $p \times q$ complex Gaussian matrix $\tilde{X}$. Let $A^{\frac{1}{2}}\tilde{X} = \tilde{Z}$ and $\tilde{Z}\_1, \ldots, \tilde{Z}\_p$ be the rows of $\tilde{Z}$. Then, $\tilde{Z}\_1, \ldots, \tilde{Z}\_p$ are mutually independently distributed with $\tilde{Z}\_j$ having a $q$-variate complex multinormal density, denoted by $\tilde{f}\_{\tilde{Z}\_j}(\tilde{Z}\_j)$, given by*

$$\tilde{f}\_{\tilde{Z}\_j}(\tilde{Z}\_j) = \frac{|\det(B)|}{\pi^q} \mathbf{e}^{-(\tilde{Z}\_j B \tilde{Z}\_j^\*)}.\tag{4.2a.12}$$

**Theorem 4.2a.7.** *Let the $p \times q$ matrix $\tilde{X}$ have a complex matrix-variate Gaussian distribution. Let $\tilde{U} = \tilde{X}B^{\frac{1}{2}}$ and $\tilde{U}\_1, \ldots, \tilde{U}\_q$ be the columns of $\tilde{U}$. Then $\tilde{U}\_1, \ldots, \tilde{U}\_q$ are mutually independently distributed as $p$-variate complex multinormal with covariance matrix $A^{-1}$ each, the density of $\tilde{U}\_j$, denoted by $\tilde{f}\_{\tilde{U}\_j}(\tilde{U}\_j)$, being given as*

$$
\tilde{f}\_{\tilde{U}\_j}(\tilde{U}\_j) = \frac{|\det(A)|}{\pi^p} \mathbf{e}^{-(\tilde{U}\_j^\* A \tilde{U}\_j)}.\tag{4.2a.13}
$$

#### **Exercises 4.2**

**4.2.1.** Prove the second result in equation (4.2.12) and prove both results in (4.2*a*.10).

**4.2.2.** Obtain (4.2.12) by establishing first the distribution of the row sum of squares and column sum of squares in *Y* , and then taking the expected values in those variables.

**4.2.3.** Prove (4.2*a*.10) by establishing first the distributions of row and column sum of squares of the absolute values in *Y*˜ and then taking the expected values.

**4.2.4.** Establish (4.2.12) and (4.2*a*.10) by using the general polar coordinate transformations.

**4.2.5.** First prove that $\sum\_{j=1}^{q} |\tilde{y}\_{ij}|^2$ is a real gamma random variable. Then establish the results in (4.2*a*.10) by using the corresponding results on real gamma variables, where $\tilde{Y} = (\tilde{y}\_{ij})$, the $\tilde{y}\_{ij}$'s in (4.2*a*.11) being in the complex domain and $|\tilde{y}\_{ij}|$ denoting the absolute value or modulus of $\tilde{y}\_{ij}$.

**4.2.6.** Let the real matrix $A > O$ be $2 \times 2$ with its first row being $(1, 1)$ and let $B > O$ be $3 \times 3$ with its first row being $(1, 1, -1)$. Complete the other rows of $A$ and $B$ so that $A > O$, $B > O$. Obtain the corresponding $2 \times 3$ real matrix-variate Gaussian density when (1): $M = O$; (2): $M \ne O$, with a matrix $M$ of your own choice.

**4.2.7.** Let the complex matrix $A > O$ be $2 \times 2$ with its first row being $(1, 1 + i)$ and let $B > O$ be $3 \times 3$ with its first row being $(1, 1 + i, -i)$. Complete the other rows with numbers in the complex domain of your own choice so that $A = A^\* > O$, $B = B^\* > O$. Obtain the corresponding $2 \times 3$ complex matrix-variate Gaussian density with (1): $\tilde{M} = O$; (2): $\tilde{M} \ne O$, with a matrix $\tilde{M}$ of your own choice.

**4.2.8.** Evaluate the covariance matrix in (4.2.16), which is $E(Z\_j'Z\_j)$, and show that it is $B^{-1}$.

**4.2.9.** Evaluate the covariance matrix in (4.2.18), which is $E(U\_jU\_j')$, and show that it is $A^{-1}$.

**4.2.10.** Repeat Exercises 4.2.8 and 4.2.9 for the complex case in (4.2*a*.12) and (4.2*a*.13).

#### **4.3. Moment Generating Function and Characteristic Function, Real Case**

Let *T* = *(tij )* be a *p* × *q* parameter matrix. The matrix random variable *X* = *(xij )* is *p* × *q* and it is assumed that all of its elements *xij* 's are real and distinct scalar variables. Then

$$\text{tr}(TX') = \sum\_{i=1}^{p} \sum\_{j=1}^{q} t\_{ij} x\_{ij} = \text{tr}(X'T) = \text{tr}(XT'). \tag{i}$$

Note that each *tij* and *xij* appear once in *(i)* and thus, we can define the moment generating function (mgf) in the real matrix-variate case, denoted by *Mf (T )* or *MX(T )*, as follows:

$$M\_f(T) = E[\mathbf{e}^{\text{tr}(TX')}] = \int\_X \mathbf{e}^{\text{tr}(TX')} f\_{p,q}(X)dX = M\_X(T) \tag{ii}$$

whenever the integral is convergent, where *E* denotes the expected value. Thus, for the *p* × *q* matrix-variate real Gaussian density,

$$M\_X(T) = M\_f(T) = \frac{|A|^{\frac{q}{2}} |B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \int\_X \mathbf{e}^{\text{tr}(TX') - \frac{1}{2} \text{tr}(A^{\frac{1}{2}} X B X' A^{\frac{1}{2}})} \text{d}X$$

where $A$ is $p \times p$, $B$ is $q \times q$, and $A$ and $B$ are constant real positive definite matrices, so that $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$ are uniquely defined. Consider the transformation $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}} \Rightarrow \text{d}Y = |A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\text{d}X$ by Theorem 1.6.4. Thus, $X = A^{-\frac{1}{2}}YB^{-\frac{1}{2}}$ and

$$\text{tr}(TX') = \text{tr}(TB^{-\frac{1}{2}}Y'A^{-\frac{1}{2}}) = \text{tr}(A^{-\frac{1}{2}}TB^{-\frac{1}{2}}Y') = \text{tr}(T\_{(1)}Y')$$


where $T\_{(1)} = A^{-\frac{1}{2}}TB^{-\frac{1}{2}}$. Then

$$M\_X(T) = \frac{1}{(2\pi)^{\frac{pq}{2}}} \int\_Y \mathbf{e}^{\text{tr}(T\_{(1)}Y') - \frac{1}{2}\text{tr}(YY')} \text{d}Y.$$

Note that $T\_{(1)}Y'$ and $YY'$ are $p \times p$. Consider $-2\text{tr}(T\_{(1)}Y') + \text{tr}(YY')$, which can be written as

$$-2\text{tr}(T\_{(1)}Y') + \text{tr}(YY') = -\text{tr}(T\_{(1)}T\_{(1)}') + \text{tr}[(Y - T\_{(1)})(Y - T\_{(1)})'].$$

Therefore

$$M\_X(T) = \mathbf{e}^{\frac{1}{2}\text{tr}(T\_{(1)}T\_{(1)}')} \frac{1}{(2\pi)^{\frac{pq}{2}}} \int\_Y \mathbf{e}^{-\frac{1}{2}\text{tr}[(Y - T\_{(1)})(Y - T\_{(1)})']} \text{d}Y$$

$$= \mathbf{e}^{\frac{1}{2}\text{tr}(T\_{(1)}T'\_{(1)})} = \mathbf{e}^{\frac{1}{2}\text{tr}(A^{-\frac{1}{2}}TB^{-1}T'A^{-\frac{1}{2}})} = \mathbf{e}^{\frac{1}{2}\text{tr}(A^{-1}TB^{-1}T')} \tag{4.3.1}$$

since the integral is 1 from the total integral of a matrix-variate Gaussian density.
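The closed form (4.3.1) can be compared against a direct Monte Carlo estimate of $E[\mathbf{e}^{\text{tr}(TX')}]$. The sketch below uses small diagonal $A$ and $B$ and a small $T$ of our own choosing (with $M = O$) so that the estimate is stable:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, n = 2, 2, 500_000
A = np.diag([2.0, 1.0])
B = np.diag([1.0, 2.0])
T = np.array([[0.3, -0.2], [0.1, 0.2]])   # small arbitrary parameter matrix

# sample X with M = O: X = A^{-1/2} Y B^{-1/2}, Y iid N(0,1)
Ai = np.diag(np.diag(A) ** -0.5)
Bi = np.diag(np.diag(B) ** -0.5)
X = np.einsum('ij,njk,kl->nil', Ai, rng.standard_normal((n, p, q)), Bi)

mc = np.exp(np.einsum('ij,nij->n', T, X)).mean()    # Monte Carlo E[e^{tr(TX')}]
exact = np.exp(0.5 * np.trace(np.linalg.inv(A) @ T @ np.linalg.inv(B) @ T.T))
print(round(mc, 3), round(exact, 3))
```

Keeping the entries of $T$ small keeps the variance of $\mathbf{e}^{\text{tr}(TX')}$ modest; larger $T$ would demand far more samples.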

In the presence of a location parameter matrix *M*, the matrix-variate Gaussian density is given by

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[A(X-M)B(X-M)']} \tag{4.3.2}$$

where $M$ is a constant $p \times q$ matrix. In this case, $TX' = T(X - M + M)' = T(X - M)' + TM'$, and

$$M\_X(T) = M\_f(T) = E[\mathbf{e}^{\text{tr}(TX')}] = \mathbf{e}^{\text{tr}(TM')} E[\mathbf{e}^{\text{tr}(T(X-M)')}]$$

$$= \mathbf{e}^{\text{tr}(TM')} \mathbf{e}^{\frac{1}{2}\text{tr}(A^{-1}TB^{-1}T')} = \mathbf{e}^{\text{tr}(TM') + \text{tr}(\frac{1}{2}A^{-1}TB^{-1}T')}.\tag{4.3.3}$$

When *p* = 1*,* we have the usual *q*-variate multinormal density. In this case, *A* is 1×1 and taken to be 1. Then the mgf is given by

$$M\_X(T) = \mathbf{e}^{TM' + \frac{1}{2}TB^{-1}T'} \tag{4.3.4}$$

where *T, M* and *X* are 1 × *q* and *B>O* is *q* × *q*. The corresponding characteristic function when *p* = 1 is given by

$$\phi(T) = \mathbf{e}^{iTM' - \frac{1}{2}TB^{-1}T'}.\tag{4.3.5}$$

**Example 4.3.1.** Let *X* have a 2×3 real matrix-variate Gaussian density with the following parameters:

$$\begin{aligned} X &= \begin{bmatrix} x\_{11} & x\_{12} & x\_{13} \\ x\_{21} & x\_{22} & x\_{23} \end{bmatrix}, & E[X] &= M = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}, & A &= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \\ B &= \begin{bmatrix} 3 & -1 & 1 \\ -1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}. \end{aligned}$$

Consider the density $f\_{2,3}(X)$ whose exponent contains the factor $\frac{1}{2}$, to be consistent with the $p$-variate real Gaussian density. Verify whether $A$ and $B$ are positive definite. Then compute the moment generating function (mgf) of $X$, that is, the mgf associated with $f\_{2,3}(X)$, and write down its exponent explicitly.

**Solution 4.3.1.** Consider a 2×3 parameter matrix *T* = *(tij )*. Let us compute the various quantities in the mgf. First,

$$TM'=\begin{bmatrix}t\_{11} & t\_{12} & t\_{13} \\ t\_{21} & t\_{22} & t\_{23} \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} t\_{11} - t\_{13} & -t\_{11} + t\_{12} \\ t\_{21} - t\_{23} & -t\_{21} + t\_{22} \end{bmatrix},$$

so that

$$\text{tr}(TM') = t\_{11} - t\_{13} - t\_{21} + t\_{22}.\tag{i}$$

Consider the leading minors of $A$ and $B$. Note that $|(1)| = 1 > 0$, $|A| = 1 > 0$; $|(3)| = 3 > 0$,

$$\begin{vmatrix} 3 & -1 \\ -1 & 2 \end{vmatrix} = 5 > 0, \quad |B| = 8 > 0;$$

thus both $A$ and $B$ are positive definite. The inverses of $A$ and $B$ are obtained by making use of the formula $C^{-1} = \frac{1}{|C|}(\text{Cof}(C))'$; they are

$$A^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \; B^{-1} = \frac{1}{8} \begin{bmatrix} 5 & 4 & -3 \\ 4 & 8 & -4 \\ -3 & -4 & 5 \end{bmatrix}.$$

For determining the exponent in the mgf, we need $A^{-1}T$ and $B^{-1}T'$, which are

$$\begin{aligned} A^{-1}T &= \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} t\_{11} & t\_{12} & t\_{13} \\ t\_{21} & t\_{22} & t\_{23} \end{bmatrix} \\ &= \begin{bmatrix} 2t\_{11} - t\_{21} & 2t\_{12} - t\_{22} & 2t\_{13} - t\_{23} \\ -t\_{11} + t\_{21} & -t\_{12} + t\_{22} & -t\_{13} + t\_{23} \end{bmatrix} \end{aligned}$$

$$\begin{split} B^{-1}T' &= \frac{1}{8} \begin{bmatrix} 5 & 4 & -3 \\ 4 & 8 & -4 \\ -3 & -4 & 5 \end{bmatrix} \begin{bmatrix} t\_{11} & t\_{21} \\ t\_{12} & t\_{22} \\ t\_{13} & t\_{23} \end{bmatrix} \\ &= \frac{1}{8} \begin{bmatrix} 5t\_{11} + 4t\_{12} - 3t\_{13} & 5t\_{21} + 4t\_{22} - 3t\_{23} \\ 4t\_{11} + 8t\_{12} - 4t\_{13} & 4t\_{21} + 8t\_{22} - 4t\_{23} \\ -3t\_{11} - 4t\_{12} + 5t\_{13} & -3t\_{21} - 4t\_{22} + 5t\_{23} \end{bmatrix}. \end{split}$$

Hence,

$$\begin{split} \frac{1}{2} \text{tr}[A^{-1}TB^{-1}T'] &= \frac{1}{16} [(2t\_{11} - t\_{21})(5t\_{11} + 4t\_{12} - 3t\_{13}) \\ &\quad + (2t\_{12} - t\_{22})(4t\_{11} + 8t\_{12} - 4t\_{13}) + (2t\_{13} - t\_{23})(-3t\_{11} - 4t\_{12} + 5t\_{13}) \\ &\quad + (-t\_{11} + t\_{21})(5t\_{21} + 4t\_{22} - 3t\_{23}) + (-t\_{12} + t\_{22})(4t\_{21} + 8t\_{22} - 4t\_{23}) \\ &\quad + (-t\_{13} + t\_{23})(-3t\_{21} - 4t\_{22} + 5t\_{23})] \end{split} \tag{ii}$$

Thus, the mgf is $M\_X(T) = \mathbf{e}^{Q(T)}$ where

$$\mathcal{Q}(T) = \text{tr}(TM') + \frac{1}{2}\text{tr}(A^{-1}TB^{-1}T'),$$

these quantities being given in *(i)* and *(ii)*. This completes the computations.
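The inverses and the exponent obtained in this solution can be verified numerically; the test matrix `T` below is an arbitrary choice of ours.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, -1.0, 1.0], [-1.0, 2.0, 1.0], [1.0, 1.0, 3.0]])
M = np.array([[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])

# the inverses quoted in the solution
assert np.allclose(np.linalg.inv(A), [[2, -1], [-1, 1]])
assert np.allclose(np.linalg.inv(B), np.array([[5, 4, -3], [4, 8, -4], [-3, -4, 5]]) / 8)

# Q(T) = tr(TM') + (1/2) tr(A^{-1} T B^{-1} T'), evaluated at an arbitrary T
T = np.arange(1.0, 7.0).reshape(2, 3)
tr_TM = np.trace(T @ M.T)
quad = 0.5 * np.trace(np.linalg.inv(A) @ T @ np.linalg.inv(B) @ T.T)
# tr(TM') = t11 - t13 - t21 + t22, per (i)
assert np.isclose(tr_TM, T[0, 0] - T[0, 2] - T[1, 0] + T[1, 1])
print(tr_TM, quad)
```

The quadratic part is necessarily positive for any nonzero `T`, since $A^{-1}$ and $B^{-1}$ are positive definite.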

#### **4.3a. Moment Generating and Characteristic Functions, Complex Case**

Let $\tilde{X} = (\tilde{x}_{ij})$ be a $p \times q$ matrix where the $\tilde{x}_{ij}$'s are distinct scalar complex variables. We may write $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, $X_1, X_2$ being real $p \times q$ matrices. Let $\tilde{T}$ be a $p \times q$ parameter matrix and $\tilde{T} = T_1 + iT_2$, $T_1, T_2$ being real $p \times q$ matrices. The conjugate transposes of $\tilde{X}$ and $\tilde{T}$ are denoted by $\tilde{X}^*$ and $\tilde{T}^*$, respectively. Then,

$$\begin{split} \operatorname{tr}(\bar{T}\bar{X}^\*) &= \operatorname{tr}[(T\_1 + i\,T\_2)(X\_1' - iX\_2')] \\ &= \operatorname{tr}[T\_1X\_1' + T\_2X\_2' + i(T\_2X\_1' - T\_1X\_2')] \\ &= \operatorname{tr}(T\_1X\_1') + \operatorname{tr}(T\_2X\_2') + i\operatorname{tr}(T\_2X\_1' - T\_1X\_2'). \end{split}$$

If $T_1 = (t_{ij}^{(1)})$, $X_1 = (x_{ij}^{(1)})$, $X_2 = (x_{ij}^{(2)})$, $T_2 = (t_{ij}^{(2)})$, then $\mathrm{tr}(T_1X_1') = \sum_{i=1}^p\sum_{j=1}^q t_{ij}^{(1)}x_{ij}^{(1)}$ and $\mathrm{tr}(T_2X_2') = \sum_{i=1}^p\sum_{j=1}^q t_{ij}^{(2)}x_{ij}^{(2)}$. In other words, $\mathrm{tr}(T_1X_1') + \mathrm{tr}(T_2X_2')$ gives all the $x_{ij}$'s in the real and imaginary parts of $\tilde{X}$ multiplied by the corresponding $t_{ij}$'s in the real and imaginary parts of $\tilde{T}$. That is, $E[\mathrm{e}^{\mathrm{tr}(T_1X_1') + \mathrm{tr}(T_2X_2')}]$ gives a moment generating function (mgf) associated with the complex matrix-variate Gaussian density that is consistent with the real multivariate mgf. However, $\mathrm{tr}(T_1X_1') + \mathrm{tr}(T_2X_2') = \Re(\mathrm{tr}[\tilde{T}\tilde{X}^*])$, $\Re(\cdot)$ denoting the real part of $(\cdot)$. Thus, in the complex case, the mgf for any real-valued scalar function $g(\tilde{X})$ of the complex matrix argument $\tilde{X}$, where $g(\tilde{X})$ is a density, is defined as

$$
\tilde{M}\_{\tilde{X}}(\tilde{T}) = \int\_{\tilde{X}} \mathbf{e}^{\Re[\text{tr}(\tilde{T}\tilde{X}^\*)]} \mathbf{g}(\tilde{X}) \mathbf{d}\tilde{X} \tag{4.3a.1}
$$

whenever the expected value exists. On replacing $\tilde{T}$ by $i\tilde{T}$, $i = \sqrt{-1}$, we obtain the characteristic function of $\tilde{X}$ or that associated with $\tilde{f}$, denoted by $\phi_{\tilde{X}}(\tilde{T}) = \phi_{\tilde{f}}(\tilde{T})$. That is,

$$\phi\_{\tilde{X}}(\tilde{T}) = \int\_{\tilde{X}} \mathbf{e}^{\Re[\text{tr}(i\tilde{T}\tilde{X}^\*)]} \mathbf{g}(\tilde{X}) \mathbf{d}\tilde{X}.\tag{4.3a.2}$$

Then, the mgf of the matrix-variate Gaussian density in the complex domain is available by paralleling the derivation in the real case and making use of Lemma 3.2*a*.1:

$$\begin{split} \tilde{M}\_{\tilde{X}}(\tilde{T}) &= E[\mathbf{e}^{\Re[\text{tr}(\tilde{T}\tilde{X}^\*)]}] \\ &= \mathbf{e}^{\Re[\text{tr}(\tilde{T}\tilde{M}^\*)] + \frac{1}{4}\Re[\text{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^\*A^{-\frac{1}{2}})]} . \end{split} \tag{4.3a.3}$$

The corresponding characteristic function is given by

$$\phi\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re[\text{tr}(i\tilde{T}\tilde{M}^\*)] - \frac{1}{4}\Re[\text{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^\*A^{-\frac{1}{2}})]}.\tag{4.3a.4}$$

Note that when *A* = *A*<sup>∗</sup> *> O* and *B* = *B*<sup>∗</sup> *> O* (Hermitian positive definite),

$$(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^\*A^{-\frac{1}{2}})^\* = A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^\*A^{-\frac{1}{2}},$$

that is, this matrix is Hermitian. Thus, letting $\tilde{U} = A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^*A^{-\frac{1}{2}} = U_1 + iU_2$ where $U_1$ and $U_2$ are real matrices, $U_1 = U_1'$ and $U_2 = -U_2'$, that is, $U_1$ and $U_2$ are respectively symmetric and skew symmetric real matrices. Accordingly, $\mathrm{tr}(\tilde{U}) = \mathrm{tr}(U_1) + i\,\mathrm{tr}(U_2) = \mathrm{tr}(U_1)$, as the trace of a real skew symmetric matrix is zero. Therefore, $\Re[\mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^*A^{-\frac{1}{2}})] = \mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^*A^{-\frac{1}{2}})$, the diagonal elements of a Hermitian matrix being real.
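This Hermitian-trace argument can be illustrated numerically. The sketch below (not part of the text; it assumes NumPy, uses the $A$ and $B$ of Example 4.3a.1 below, and a randomly chosen $\tilde{T}$) forms $U = A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^*A^{-\frac{1}{2}}$ and checks that it is Hermitian with a real trace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hermitian positive definite A and B, and a random complex parameter T
A = np.array([[2., 1j], [-1j, 3.]])
B = np.array([[2., -1j], [1j, 1.]])
T = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# A^{-1/2} via the spectral decomposition (A is Hermitian)
w, V = np.linalg.eigh(A)
A_inv_half = V @ np.diag(w ** -0.5) @ V.conj().T

U = A_inv_half @ T @ np.linalg.inv(B) @ T.conj().T @ A_inv_half

# U is Hermitian, so tr(U) is real: Re[tr(U)] = tr(U)
print(np.allclose(U, U.conj().T))        # True
print(abs(np.trace(U).imag) < 1e-12)     # True
```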

When $p = 1$, we have the usual $q$-variate complex multivariate normal density and, taking the $1 \times 1$ matrix $A$ to be 1, the mgf is as follows:

$$\tilde{M}\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re(\tilde{T}\tilde{M}^\*) + \frac{1}{4}(\tilde{T}B^{-1}\tilde{T}^\*)}\tag{4.3a.5}$$

where $\tilde{T}, \tilde{M}$ are $1 \times q$ vectors and $B = B^* > O$ (Hermitian positive definite), the corresponding characteristic function being given by

$$\phi\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re(i\tilde{T}\tilde{M}^\*) - \frac{1}{4}(\tilde{T}B^{-1}\tilde{T}^\*)}.\tag{4.3a.6}$$

**Example 4.3a.1.** Consider a 2 × 2 matrix *X*˜ in the complex domain having a complex matrix-variate density with the following parameters:

$$\begin{aligned} \tilde{X} &= \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} \\ \tilde{x}\_{21} & \tilde{x}\_{22} \end{bmatrix}, \; E[\tilde{X}] = \tilde{M} = \begin{bmatrix} 1+i & i \\ 2-i & 1 \end{bmatrix}, \\\ A &= \begin{bmatrix} 2 & i \\ -i & 3 \end{bmatrix}, \; B = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}. \end{aligned}$$

Determine whether *A* and *B* are Hermitian positive definite; then, obtain the mgf of this distribution and provide the exponential part explicitly.

**Solution 4.3a.1.** Clearly, $A > O$ and $B > O$. We first determine $A^{-1}$, $B^{-1}$, $A^{-1}\tilde{T}$, and $B^{-1}\tilde{T}^*$:

$$\begin{split} A^{-1} &= \frac{1}{5} \begin{bmatrix} 3 & -i \\ i & 2 \end{bmatrix}, \ A^{-1} \tilde{T} = \frac{1}{5} \begin{bmatrix} 3 & -i \\ i & 2 \end{bmatrix} \begin{bmatrix} \tilde{t}\_{11} & \tilde{t}\_{12} \\ \tilde{t}\_{21} & \tilde{t}\_{22} \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 3\tilde{t}\_{11} - i\tilde{t}\_{21} & 3\tilde{t}\_{12} - i\tilde{t}\_{22} \\ i\tilde{t}\_{11} + 2\tilde{t}\_{21} & i\tilde{t}\_{12} + 2\tilde{t}\_{22} \end{bmatrix}, \\\ B^{-1} &= \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, \ B^{-1} \tilde{T}^\* = \begin{bmatrix} \tilde{t}\_{11}^\* + i\tilde{t}\_{12}^\* & \tilde{t}\_{21}^\* + i\tilde{t}\_{22}^\* \\ -i\tilde{t}\_{11}^\* + 2\tilde{t}\_{12}^\* & -i\tilde{t}\_{21}^\* + 2\tilde{t}\_{22}^\* \end{bmatrix}. \end{split}$$

Letting $\delta = \frac{1}{2}\,\mathrm{tr}(A^{-1}\tilde{T}B^{-1}\tilde{T}^*)$,

$$\begin{split} 10\delta = \{ (3\tilde{t}\_{11} - i\tilde{t}\_{21})(\tilde{t}\_{11}^\* + i\tilde{t}\_{12}^\*) + (3\tilde{t}\_{12} - i\tilde{t}\_{22})(-i\tilde{t}\_{11}^\* + 2\tilde{t}\_{12}^\*) \\ + (i\tilde{t}\_{11} + 2\tilde{t}\_{21})(\tilde{t}\_{21}^\* + i\tilde{t}\_{22}^\*) + (i\tilde{t}\_{12} + 2\tilde{t}\_{22})(-i\tilde{t}\_{21}^\* + 2\tilde{t}\_{22}^\*) \}, \end{split}$$

$$\begin{split} 10\delta = \{3\tilde{t}\_{11}\tilde{t}\_{11}^{\*} + 3i\tilde{t}\_{11}\tilde{t}\_{12}^{\*} - i\tilde{t}\_{21}\tilde{t}\_{11}^{\*} + \tilde{t}\_{21}\tilde{t}\_{12}^{\*} \\ + 6\tilde{t}\_{12}\tilde{t}\_{12}^{\*} - \tilde{t}\_{22}\tilde{t}\_{11}^{\*} - 2i\tilde{t}\_{22}\tilde{t}\_{12}^{\*} - 3i\tilde{t}\_{12}\tilde{t}\_{11}^{\*} \\ + i\tilde{t}\_{11}\tilde{t}\_{21}^{\*} - \tilde{t}\_{11}\tilde{t}\_{22}^{\*} + 2\tilde{t}\_{21}\tilde{t}\_{21}^{\*} + 2i\tilde{t}\_{21}\tilde{t}\_{22}^{\*} \\ + \tilde{t}\_{12}\tilde{t}\_{21}^{\*} + 2i\tilde{t}\_{12}\tilde{t}\_{22}^{\*} - 2i\tilde{t}\_{22}\tilde{t}\_{21}^{\*} + 4\tilde{t}\_{22}\tilde{t}\_{22}^{\*} \}, \end{split}$$

$$\begin{split} 10\delta = 3\tilde{t}\_{11}\tilde{t}\_{11}^{\*} + 6\tilde{t}\_{12}\tilde{t}\_{12}^{\*} + 2\tilde{t}\_{21}\tilde{t}\_{21}^{\*} + 4\tilde{t}\_{22}\tilde{t}\_{22}^{\*} \\ \quad + 3i[\tilde{t}\_{11}\tilde{t}\_{12}^{\*} - \tilde{t}\_{12}\tilde{t}\_{11}^{\*}] - [\tilde{t}\_{22}\tilde{t}\_{11}^{\*} + \tilde{t}\_{11}\tilde{t}\_{22}^{\*}] \\ \quad + i[\tilde{t}\_{11}\tilde{t}\_{21}^{\*} - \tilde{t}\_{11}^{\*}\tilde{t}\_{21}] + [\tilde{t}\_{12}\tilde{t}\_{21}^{\*} + \tilde{t}\_{12}^{\*}\tilde{t}\_{21}] \\ \quad + 2i[\tilde{t}\_{21}\tilde{t}\_{22}^{\*} - \tilde{t}\_{21}^{\*}\tilde{t}\_{22}] + 2i[\tilde{t}\_{12}\tilde{t}\_{22}^{\*} - \tilde{t}\_{12}^{\*}\tilde{t}\_{22}]. \end{split}$$

Letting $\tilde{t}_{rs} = t_{rs1} + it_{rs2}$, $i = \sqrt{-1}$, $t_{rs1}, t_{rs2}$ being real for all $r$ and $s$, the quantity $\delta$, which is the exponent in the mgf, can be expressed as follows:

$$\begin{split} \delta = \frac{1}{10} \{ 3(t\_{111}^2 + t\_{112}^2) + 6(t\_{121}^2 + t\_{122}^2) + 2(t\_{211}^2 + t\_{212}^2) + 4(t\_{221}^2 + t\_{222}^2) \\ - 6(t\_{112}t\_{121} - t\_{111}t\_{122}) - 2(t\_{111}t\_{221} + t\_{112}t\_{222}) - 2(t\_{112}t\_{211} - t\_{111}t\_{212}) \\ + 2(t\_{121}t\_{211} + t\_{122}t\_{212}) - 4(t\_{212}t\_{221} - t\_{211}t\_{222}) - 4(t\_{122}t\_{221} - t\_{121}t\_{222})\}. \end{split}$$

This completes the computations.
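As a numerical cross-check (a sketch assuming NumPy; not part of the text), $\delta$ can be computed for a random $\tilde{T}$ both from the trace form $\frac{1}{2}\mathrm{tr}(A^{-1}\tilde{T}B^{-1}\tilde{T}^*)$ and from its expansion in the real and imaginary parts $t_{rs1}, t_{rs2}$; note that the $t_{111}t_{221}$, $t_{112}t_{222}$ cross term is $-2(t_{111}t_{221} + t_{112}t_{222})$, since $-[\tilde{t}_{22}\tilde{t}_{11}^* + \tilde{t}_{11}\tilde{t}_{22}^*] = -2\Re(\tilde{t}_{11}\tilde{t}_{22}^*)$.

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[2., 1j], [-1j, 3.]])
B = np.array([[2., -1j], [1j, 1.]])

# Random complex parameter matrix T (entries t_rs = t_rs1 + i t_rs2)
T = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
(t111, t112), (t121, t122) = (T[0, 0].real, T[0, 0].imag), (T[0, 1].real, T[0, 1].imag)
(t211, t212), (t221, t222) = (T[1, 0].real, T[1, 0].imag), (T[1, 1].real, T[1, 1].imag)

# delta computed directly from the trace form ...
delta = 0.5 * np.trace(np.linalg.inv(A) @ T @ np.linalg.inv(B) @ T.conj().T)

# ... and from the expansion in real and imaginary parts
ten_delta = (3*(t111**2 + t112**2) + 6*(t121**2 + t122**2)
             + 2*(t211**2 + t212**2) + 4*(t221**2 + t222**2)
             - 6*(t112*t121 - t111*t122) - 2*(t111*t221 + t112*t222)
             - 2*(t112*t211 - t111*t212) + 2*(t121*t211 + t122*t212)
             - 4*(t212*t221 - t211*t222) - 4*(t122*t221 - t121*t222))

print(abs(delta.imag) < 1e-12)                  # True: delta is real
print(np.isclose(delta.real, ten_delta / 10))   # True: matches the expansion
```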

#### **4.3.1. Distribution of the exponent, real case**

Let us determine the distribution of the exponent in the $p \times q$ real matrix-variate Gaussian density. Letting $u = \mathrm{tr}(AXBX')$, its density can be obtained by evaluating its associated mgf. Then, taking $t$ as its scalar parameter since $u$ is scalar, we have

$$M\_{\boldsymbol{\mu}}(t) = E[\mathbf{e}^{t\boldsymbol{\mu}}] = E[\mathbf{e}^{t\,\mathrm{tr}(AXBX')}].$$

Since $u$ is a function of $X$, this expected value can be evaluated by integrating over the density of $X$:

$$\begin{split} M\_{\mathfrak{u}}(t) &= C \int\_{X} \mathbf{e}^{t \operatorname{tr}(AXBX') - \frac{1}{2} \operatorname{tr}(AXBX')} \mathrm{d}X \\ &= C \int\_{X} \mathbf{e}^{-\frac{1}{2}(1-2t)(\operatorname{tr}(AXBX'))} \mathrm{d}X \text{ for } 1-2t > 0 \end{split} \tag{i} $$

where

$$C = \frac{|A|^{\frac{q}{2}} |B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}.$$

The integral in *(i)* is convergent only when $1 - 2t > 0$. Then, distributing $\sqrt{1-2t}$ to each element of $X$ and $X'$, and denoting the new matrix by $X_t$, we have $X_t = \sqrt{1-2t}\,X \Rightarrow \mathrm{d}X_t = (\sqrt{1-2t})^{pq}\,\mathrm{d}X = (1-2t)^{\frac{pq}{2}}\,\mathrm{d}X$. The integral over $X_t$, together with the constant $C$, yields 1, and hence

$$M\_{\mu}(t) = (1 - 2t)^{-\frac{pq}{2}}, \text{ provided } 1 - 2t > 0. \tag{4.3.6}$$

The corresponding density is a real chi-square having $pq$ degrees of freedom or, equivalently, a real gamma density with parameters $\alpha = \frac{pq}{2}$ and $\beta = 2$. Thus, the resulting density, denoted by $f_{u_1}(u_1)$, is given by

$$f\_{u\_1}(u\_1) = [2^{\frac{pq}{2}} \Gamma(pq/2)]^{-1} u\_1^{\frac{pq}{2}-1} e^{-\frac{u\_1}{2}}, \ 0 \le u\_1 < \infty, \ p, q = 1, 2, \dots, \qquad (4.3.7)$$

and $f_{u_1}(u_1) = 0$ elsewhere.
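The derivation also yields a direct simulation check: writing $X = A^{-\frac{1}{2}}ZB^{-\frac{1}{2}}$ with $Z$ having iid $N(0,1)$ entries gives $u = \mathrm{tr}(AXBX') = \mathrm{tr}(ZZ')$, so $A$ and $B$ drop out and $u$ is a sum of $pq$ squared standard normals. A Monte Carlo sketch (assuming NumPy; not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 2, 3, 200_000

# If X has density c*exp(-tr(AXBX')/2), then X = A^{-1/2} Z B^{-1/2}
# with Z having iid N(0,1) entries, and u = tr(AXBX') = tr(ZZ')
Z = rng.standard_normal((n, p, q))
u = np.einsum('nij,nij->n', Z, Z)   # tr(ZZ'), sample by sample

# u should be chi-square with pq degrees of freedom:
# mean pq and variance 2pq
print(u.mean())   # close to pq = 6
print(u.var())    # close to 2pq = 12
```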

#### **4.3a.1. Distribution of the exponent, complex case**

In the complex case, letting $\tilde{u} = \mathrm{tr}(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})$, we note that $\tilde{u} = \tilde{u}^*$ and $\tilde{u}$ is a scalar, so that $\tilde{u}$ is real. Hence, the mgf of $\tilde{u}$, with real parameter $t$, is given by

$$M\_{\tilde{\mathbf{u}}}(t) = E[\mathbf{e}^{t \operatorname{tr}(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})}] = C\_1 \int\_{\tilde{X}} \mathbf{e}^{-(1-t) \operatorname{tr}(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})} d\tilde{X}, \ 1 - t > 0, \ \text{with}$$

$$C\_1 = \frac{|\det(A)|^q |\det(B)|^p}{\pi^{p \cdot q}}.$$

On making the transformation $\tilde{Y} = A^{\frac{1}{2}}\tilde{X}B^{\frac{1}{2}}$, we have

$$M\_{\tilde{\mu}}(t) = \frac{1}{\pi^{p\,q}} \int\_{\tilde{Y}} \mathbf{e}^{-(1-t)\text{tr}(\tilde{Y}\tilde{Y}^\*)} \mathrm{d}\tilde{Y} \,.$$

However,

$$\text{tr}(\tilde{Y}\tilde{Y}^\*) = \sum\_{r=1}^p \sum\_{s=1}^q |\tilde{y}\_{rs}|^2 = \sum\_{r=1}^p \sum\_{s=1}^q (y\_{rs1}^2 + y\_{rs2}^2)$$

where $\tilde{y}_{rs} = y_{rs1} + iy_{rs2}$, $i = \sqrt{-1}$, $y_{rs1}, y_{rs2}$ being real. Hence,

$$\frac{1}{\pi} \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathbf{e}^{-(1-t)(\mathbf{y}\_{rs1}^2 + \mathbf{y}\_{rs2}^2)} \mathbf{d} \mathbf{y}\_{rs1} \wedge \mathbf{d} \mathbf{y}\_{rs2} = \frac{1}{1-t}, \quad 1-t > 0.$$

Therefore,

$$M\_{\tilde{\mu}}(t) = (1 - t)^{-p \cdot q}, \ 1 - t > 0,\tag{4.3a.7}$$

and $\tilde{u} = v$ has a real gamma density with parameters $\alpha = pq$, $\beta = 1$, or equivalently a chi-square density in the complex domain with $pq$ degrees of freedom, that is,

$$f\_v(v) = \frac{1}{\Gamma(p\,q)} v^{p\,q-1} \mathbf{e}^{-v}, \ 0 \le v < \infty,\tag{4.3a.8}$$

and $f_v(v) = 0$ elsewhere.
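A parallel simulation sketch for the complex case (assuming NumPy; not part of the text): each entry of $\tilde{Y}$ has density $\frac{1}{\pi}\mathrm{e}^{-|\tilde{y}|^2}$, i.e., independent real and imaginary parts with variance $\frac{1}{2}$, and $\tilde{u} = \mathrm{tr}(\tilde{Y}\tilde{Y}^*)$ should then be gamma with shape $pq$ and scale 1.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, n = 2, 2, 200_000

# Entries with density (1/pi) e^{-|y|^2}: real and imaginary parts N(0, 1/2)
Y = (rng.standard_normal((n, p, q)) + 1j * rng.standard_normal((n, p, q))) / np.sqrt(2)
u = np.einsum('nij,nij->n', Y, Y.conj()).real   # tr(YY*) = sum |y_rs|^2

# u should be gamma with shape pq and scale 1: mean pq, variance pq
print(abs(u.mean() - p*q) < 0.05)   # True
print(abs(u.var() - p*q) < 0.2)     # True
```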

#### **4.3.2. Linear functions in the real case**

Let the $p \times q$ real matrix $X = (x_{ij})$ of real scalar random variables $x_{ij}$ have the density given in (4.2.2), namely

$$f\_{p,q}(X) = \frac{|A|^{\frac{q}{2}} |B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}} \mathrm{e}^{-\frac{1}{2} \text{tr}(A(X-M)B(X-M)')} \tag{4.3.8}$$

for $A > O$, $B > O$, where $M$ is a $p \times q$ location parameter matrix. Let $L_1$ be a $p \times 1$ vector of constants and consider the linear function $Z_1 = L_1'X$, where $Z_1$ is $1 \times q$. Let $T$ be a $1 \times q$ parameter vector. Then the mgf of the $1 \times q$ vector $Z_1$ is

$$M\_{Z\_1}(T) = E[\mathbf{e}^{(TZ\_1')}] = E[\mathbf{e}^{(TX'L\_1)}] = E[\mathbf{e}^{\mathrm{tr}(TX'L\_1)}]$$

$$= E[\mathbf{e}^{\mathrm{tr}((L\_1T)X')}].\tag{i}$$

This can be evaluated by replacing *T* by *L*1*T* in (4.3.4). Then

$$M\_{Z\_1}(T) = \mathbf{e}^{\text{tr}((L\_1T)M') + \frac{1}{2}\text{tr}(A^{-1}L\_1TB^{-1}(L\_1T)')}$$

$$= \mathbf{e}^{\text{tr}(TM'L\_1) + \frac{1}{2}\text{tr}[(L\_1'A^{-1}L\_1)TB^{-1}T']}.\tag{ii}$$

Since $L_1'A^{-1}L_1$ is a scalar,

$$(L\_1'A^{-1}L\_1)TB^{-1}T' = T(L\_1'A^{-1}L\_1)B^{-1}T'.$$

On comparing the resulting expression with the mgf of a $q$-variate real normal distribution, we observe that $Z_1$ is a $q$-variate real Gaussian vector with mean value vector $L_1'M$ and covariance matrix $[L_1'A^{-1}L_1]B^{-1}$. Hence the following result:

**Theorem 4.3.1.** *Let the real $p \times q$ matrix $X$ have the density specified in (4.3.8) and let $L_1$ be a $p \times 1$ constant vector. Let $Z_1$ be the linear function $Z_1 = L_1'X$. Then $Z_1$, which is $1 \times q$, has the mgf given in (ii), and thereby $Z_1$ has a $q$-variate real Gaussian density with mean value vector $L_1'M$ and covariance matrix $[L_1'A^{-1}L_1]B^{-1}$.*

**Theorem 4.3.2.** *Let $L_2$ be a $q \times 1$ constant vector and consider the linear function $Z_2 = XL_2$ where the $p \times q$ real matrix $X$ has the density specified in (4.3.8). Then $Z_2$, which is $p \times 1$, is a $p$-variate real Gaussian vector with mean value vector $ML_2$ and covariance matrix $[L_2'B^{-1}L_2]A^{-1}$.*

The proof of Theorem 4.3.2 parallels that of Theorem 4.3.1. Theorems 4.3.1 and 4.3.2 establish that when the $p \times q$ matrix $X$ has a $p \times q$-variate real Gaussian density with parameters $M$, $A > O$, $B > O$, then all linear functions of the form $L_1'X$ where $L_1$ is $p \times 1$ are $q$-variate real Gaussian and all linear functions of the form $XL_2$ where $L_2$ is $q \times 1$ are $p$-variate real Gaussian, the parameters of these Gaussian densities being given in Theorems 4.3.1 and 4.3.2.

By retracing the steps, we can obtain characterizations of the density of the $p \times q$ real matrix $X$ through linear transformations. Consider all possible $p \times 1$ constant vectors $L_1$ or, equivalently, let $L_1$ be arbitrary, and let $T$ be a $1 \times q$ parameter vector. Then the $p \times q$ matrix $L_1T$, denoted by $T_{(1)}$, contains $pq$ free parameters. In this case, the mgf in *(ii)* can be written as

$$M(T\_{(1)}) = \mathbf{e}^{\text{tr}(T\_{(1)}M') + \frac{1}{2}\text{tr}(A^{-1}T\_{(1)}B^{-1}T\_{(1)}')},\tag{iii}$$

which has the same structure as the mgf of a $p \times q$ real matrix-variate Gaussian density as given in (4.3.8), whose mean value matrix is $M$ and whose parameter matrices are $A > O$ and $B > O$. Hence, the following result can be obtained:

**Theorem 4.3.3.** *Let $L_1$ be a constant $p \times 1$ vector, $X$ be a $p \times q$ matrix whose elements are real scalar variables, and $A > O$ be a $p \times p$ and $B > O$ a $q \times q$ constant real positive definite matrix. If for an arbitrary vector $L_1$, $L_1'X$ is a $q$-variate real Gaussian vector as specified in Theorem 4.3.1, then $X$ has a $p \times q$ real matrix-variate Gaussian density as given in (4.3.8).*

As well, a result parallel to this one follows from Theorem 4.3.2:

**Theorem 4.3.4.** *Let $L_2$ be a $q \times 1$ constant vector, $X$ be a $p \times q$ matrix whose elements are real scalar variables, and $A > O$ be a $p \times p$ and $B > O$ a $q \times q$ constant real positive definite matrix. If for an arbitrary constant vector $L_2$, $XL_2$ is a $p$-variate real Gaussian vector as specified in Theorem 4.3.2, then $X$ is $p \times q$ real matrix-variate Gaussian distributed as in (4.3.8).*

**Example 4.3.2.** Consider a 2 × 2 matrix-variate real Gaussian density with the parameters

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \ B = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \ M = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = E[X], \ X = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{21} & x\_{22} \end{bmatrix}.$$

Letting $U_1 = L_1'X$, $U_2 = XL_2$ and $U_3 = L_1'XL_2$ with $L_1' = [1, 1]$ and $L_2' = [1, -1]$, evaluate the densities of $U_1, U_2, U_3$ by applying Theorems 4.3.1 and 4.3.2; as well, obtain those densities without resorting to these theorems.

**Solution 4.3.2.** Let us first compute the following quantities:

$$A^{-1}, \; B^{-1}, \; L\_1'A^{-1}L\_1, \; L\_2'B^{-1}L\_2, \; L\_1'M, \; ML\_2, \; L\_1'ML\_2.$$

They are

$$\begin{aligned} A^{-1} &= \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}, \ B^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \\\ L\_1'M &= [1,1] \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = [1,0], \ ML\_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \\\ L\_1'A^{-1}L\_1 &= [1,1] \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1, \ L\_2'B^{-1}L\_2 = \frac{1}{3}[1,-1] \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 2, \\\ L\_1'ML\_2 &= [1,0] \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1. \end{aligned}$$

Let $U_1 = L_1'X$, $U_2 = XL_2$, $U_3 = L_1'XL_2$. Then, by making use of Theorems 4.3.1 and 4.3.2 and of results from Chap. 2 on $q$-variate real Gaussian vectors, we have the following:

$$U\_1 \sim N\_2((1,0),(1)B^{-1}),\ U\_2 \sim N\_2(ML\_2, 2A^{-1}),\ U\_3 \sim N\_1(1,(1)(2)) = N\_1(1,2).$$

Let us now evaluate the densities without resorting to these theorems. Note that $U_1 = [x_{11} + x_{21},\, x_{12} + x_{22}]$, so that $U_1$ has a bivariate real distribution. Let us compute the mgf of $U_1$. Letting $t_1$ and $t_2$ be real parameters, the mgf of $U_1$ is

$$M\_{U\_1}(t\_1, t\_2) = E[\mathbf{e}^{t\_1(\mathbf{x}\_{11} + \mathbf{x}\_{21}) + t\_2(\mathbf{x}\_{12} + \mathbf{x}\_{22})}] = E[\mathbf{e}^{t\_1\mathbf{x}\_{11} + t\_1\mathbf{x}\_{21} + t\_2\mathbf{x}\_{12} + t\_2\mathbf{x}\_{22}}],$$

which is available from the mgf of $X$ by letting $t_{11} = t_1$, $t_{21} = t_1$, $t_{12} = t_2$, $t_{22} = t_2$. Thus,

$$\begin{aligned} A^{-1}T &= \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} t\_1 & t\_2 \\ t\_1 & t\_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ t\_1 & t\_2 \end{bmatrix} \\\ B^{-1}T' &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} t\_1 & t\_1 \\ t\_2 & t\_2 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2t\_1 - t\_2 & 2t\_1 - t\_2 \\ -t\_1 + 2t\_2 & -t\_1 + 2t\_2 \end{bmatrix}, \end{aligned}$$

so that

$$\frac{1}{2}\text{tr}(A^{-1}TB^{-1}T') = \frac{1}{2}\left\{\frac{1}{3}[2t\_1^2 + 2t\_2^2 - 2t\_1t\_2]\right\} = \frac{1}{2}[t\_1, t\_2]B^{-1}\begin{bmatrix}t\_1\\t\_2\end{bmatrix}.\tag{i}$$

Together with the linear term $\mathrm{tr}(TM') = t_1$ resulting from these substitutions, this confirms that $U_1 \sim N_2((1, 0), B^{-1})$.

Since

$$U\_2 = X L\_2 = \begin{bmatrix} x\_{11} - x\_{12} \\ x\_{21} - x\_{22} \end{bmatrix},$$

we let $t_{11} = t_1$, $t_{12} = -t_1$, $t_{21} = t_2$, $t_{22} = -t_2$. With these substitutions, we have the following:

$$\begin{aligned} A^{-1}T &= \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} t\_1 & -t\_1 \\ t\_2 & -t\_2 \end{bmatrix} = \begin{bmatrix} t\_1 - t\_2 & -t\_1 + t\_2 \\ -t\_1 + 2t\_2 & t\_1 - 2t\_2 \end{bmatrix}, \\\ B^{-1}T' &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} t\_1 & t\_2 \\ -t\_1 & -t\_2 \end{bmatrix} = \begin{bmatrix} t\_1 & t\_2 \\ -t\_1 & -t\_2 \end{bmatrix}. \end{aligned}$$

Hence,

$$\begin{aligned} \text{tr}(A^{-1}TB^{-1}T') &= t\_1(t\_1 - t\_2) - t\_1(-t\_1 + t\_2) + t\_2(-t\_1 + 2t\_2) - t\_2(t\_1 - 2t\_2) \\ &= 2[t\_1, t\_2] \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} t\_1 \\ t\_2 \end{bmatrix}. \end{aligned}$$

Therefore, $U_2$ is a 2-variate real Gaussian vector with covariance matrix $2A^{-1}$ and mean value vector $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$; that is, $U_2 \sim N_2(ML_2, 2A^{-1})$. For determining the distribution of $U_3$, observe that $L_1'XL_2 = L_1'U_2$. Then, $L_1'U_2$ is univariate real Gaussian with mean value $E[L_1'U_2] = L_1'ML_2 = [1, 1]\begin{bmatrix} 2 \\ -1 \end{bmatrix} = 1$ and variance $L_1'\mathrm{Cov}(U_2)L_1 = 2L_1'A^{-1}L_1 = 2$. That is, $U_3 = u_3 \sim N_1(1, 2)$. This completes the solution.
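The conclusions of Example 4.3.2 can be checked by simulation (a sketch assuming NumPy; not part of the text). Samples from the matrix-variate density are generated as $X = M + A^{-\frac{1}{2}}ZB^{-\frac{1}{2}}$ with $Z$ iid standard normal, and the mean and variance of $U_3 = L_1'XL_2$ are compared with $N_1(1, 2)$.

```python
import numpy as np

rng = np.random.default_rng(4)

A = np.array([[2., 1.], [1., 1.]])
B = np.array([[2., 1.], [1., 2.]])
M = np.array([[1., -1.], [0., 1.]])
L1 = np.array([1., 1.])
L2 = np.array([1., -1.])

# Inverse matrix square root via the spectral decomposition (symmetric > O)
def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

n = 200_000
Z = rng.standard_normal((n, 2, 2))
X = M + inv_sqrt(A) @ Z @ inv_sqrt(B)   # X ~ matrix-variate Gaussian(M, A, B)

U3 = L1 @ X @ L2                        # scalar linear function, one per sample
print(abs(U3.mean() - 1.0) < 0.05)      # True: E[U3] = L1' M L2 = 1
print(abs(U3.var() - 2.0) < 0.1)        # True: Var = (L1'A^{-1}L1)(L2'B^{-1}L2) = 2
```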

The results stated in Theorems 4.3.1 and 4.3.2 are now generalized by taking sets of linear functions of *X*:

**Theorem 4.3.5.** *Let $C$ be a $p \times r$, $r \le p$, real constant matrix of full rank $r$ and $G$ be a $q \times s$ matrix, $s \le q$, of rank $s$. Let $Z = C'X$ and $W = XG$ where $X$ has the density specified in (4.3.8). Then, $Z$ has an $r \times q$ real matrix-variate Gaussian density with $M$ replaced by $C'M$ and $A^{-1}$ replaced by $C'A^{-1}C$, $B^{-1}$ remaining unchanged, and $W = XG$ has a $p \times s$ real matrix-variate Gaussian distribution with $B^{-1}$ replaced by $G'B^{-1}G$ and $M$ replaced by $MG$, $A^{-1}$ remaining unchanged.*

**Example 4.3.3.** Let the $2 \times 2$ real matrix $X = (x_{ij})$ have a real matrix-variate Gaussian distribution with the parameters $M$, $A$ and $B$. Consider the set of linear functions $U = C'X$ where

$$M = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix},\ A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix},\ B = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix},\ C' = \begin{bmatrix} \sqrt{2} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} \end{bmatrix}.$$

Show that the rows of $U$ are independently distributed real $q$-variate Gaussian vectors with common covariance matrix $B^{-1}$ and the rows of $C'M$ as the mean value vectors.

**Solution 4.3.3.** Let us compute $A^{-1}$ and $C'A^{-1}C$:

$$\begin{aligned} A^{-1} &= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \\ C'A^{-1}C &= \begin{bmatrix} \sqrt{2} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I\_2. \end{aligned}$$

In the mgf of $U = C'X$, $A^{-1}$ is replaced by $C'A^{-1}C = I_2$ and $B^{-1}$ remains the same. Then, the exponent in the mgf of $U$, excluding the linear term, is $\frac{1}{2}\mathrm{tr}(TB^{-1}T') = \frac{1}{2}\sum_{j=1}^p T_jB^{-1}T_j'$ where $T_j$ is the $j$-th row of $T$. Hence the $p$ rows of $U$ are independently distributed $q$-variate real Gaussian vectors with the common covariance matrix $B^{-1}$. This completes the computations.

The previous example entails a general result that is now stated as a corollary.

**Corollary 4.3.1.** *Let $X$ be a $p \times q$-variate real Gaussian matrix with the usual parameters $M$, $A$ and $B$, whose density is as given in (4.3.8). Consider the set of linear functions $U = C'X$ where $C$ is a $p \times p$ constant matrix of full rank $p$ such that $A = CC'$. Then $C'A^{-1}C = C'(CC')^{-1}C = C'(C')^{-1}C^{-1}C = I_p$. Consequently, the rows of $U$, denoted by $U_1, \ldots, U_p$, are independently distributed as real $q$-variate Gaussian vectors having the common covariance matrix $B^{-1}$.*

It is easy to construct such a $C$. Since $A = (a_{ij})$ is real positive definite, write it as $A = CC'$ where $C$ is a lower triangular matrix with positive diagonal elements. The first row, first column element of $C = (c_{ij})$ is $c_{11} = +\sqrt{a_{11}}$; note that since $A > O$, all the diagonal elements of $A$ are real and positive. The remainder of the first column of $C$ is readily available from the first column of $A$ and $c_{11}$. Then, given $a_{22}$ and the first column of $C$, $c_{22}$ can be determined, and so on.
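The recursive construction just described is the Cholesky factorization. A minimal sketch (assuming NumPy; not part of the text), applied to the $A$ of Example 4.3.3:

```python
import numpy as np

A = np.array([[2., -1.],
              [-1., 1.]])

# The recursive construction described above is the Cholesky
# factorization A = CC' with C lower triangular, c11 = sqrt(a11)
C = np.linalg.cholesky(A)
print(np.allclose(C @ C.T, A))                               # True
print(np.allclose(C.T @ np.linalg.inv(A) @ C, np.eye(2)))    # True
```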

**Theorem 4.3.6.** *Let $C$, $G$ and $X$ be as defined in Theorem 4.3.5. Consider the $r \times s$ real matrix $Z = C'XG$. Then, when $X$ has the distribution specified in (4.3.8), $Z$ has an $r \times s$ real matrix-variate Gaussian density with $M$ replaced by $C'MG$, $A^{-1}$ replaced by $C'A^{-1}C$ and $B^{-1}$ replaced by $G'B^{-1}G$.*

**Example 4.3.4.** Let the $2 \times 2$ matrix $X = (x_{ij})$ have a real matrix-variate Gaussian density with the parameters $M$, $A$ and $B$, and consider the set of linear functions $Z = C'XG$ where $C$ is a $p \times p$ constant matrix of rank $p$ and $G$ is a $q \times q$ constant matrix of rank $q$, with

$$\begin{aligned} M &= \begin{bmatrix} 2 & -1 \\ 1 & 5 \end{bmatrix}, \; A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \; B = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \\\ C' &= \begin{bmatrix} \sqrt{2} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} \end{bmatrix}, \; G = \begin{bmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & \sqrt{\frac{5}{2}} \end{bmatrix}. \end{aligned}$$

Show that all the elements $z_{ij}$ of $Z = (z_{ij})$ are mutually independently distributed real scalar standard Gaussian random variables when $M = O$.

**Solution 4.3.4.** We have already shown in Example 4.3.3 that $C'A^{-1}C = I$. Let us verify that $GG' = B$ and compute $G'B^{-1}G$:

$$GG' = \begin{bmatrix} \sqrt{2} & 0\\ \frac{1}{\sqrt{2}} & \sqrt{\frac{5}{2}} \end{bmatrix} \begin{bmatrix} \sqrt{2} & \frac{1}{\sqrt{2}}\\ 0 & \sqrt{\frac{5}{2}} \end{bmatrix} = \begin{bmatrix} 2 & 1\\ 1 & 3 \end{bmatrix} = B;$$

$$\begin{split} B^{-1} &= \frac{1}{5} \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix}, \\ G^{'}B^{-1}G &= \frac{1}{5} \begin{bmatrix} \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{\frac{5}{2}} \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} G = \frac{1}{5} \begin{bmatrix} 3\sqrt{2} - \frac{1}{\sqrt{2}} & 0 \\ -\sqrt{\frac{5}{2}} & 2\sqrt{\frac{5}{2}} \end{bmatrix} G, \\ &= \frac{1}{5} \begin{bmatrix} 3\sqrt{2} - \frac{1}{\sqrt{2}} & 0 \\ -\sqrt{\frac{5}{2}} & 2\sqrt{\frac{5}{2}} \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 \\ \frac{1}{\sqrt{2}} & \sqrt{\frac{5}{2}} \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} = I\_{2}. \end{split}$$

Thus, $A^{-1}$ is replaced by $C'A^{-1}C = I_2$ and $B^{-1}$ by $G'B^{-1}G = I_2$ in the mgf of $Z$, so that the exponent in the mgf, excluding the linear term, is $\frac{1}{2}\mathrm{tr}(TT')$. It follows that all the elements of $Z = C'XG$ are mutually independently distributed real scalar standard normal variables. This completes the computations.
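The two factorization identities used in this solution can be confirmed numerically (a sketch assuming NumPy; not part of the text):

```python
import numpy as np

B = np.array([[2., 1.], [1., 3.]])
G = np.array([[np.sqrt(2.), 0.],
              [1/np.sqrt(2.), np.sqrt(2.5)]])

# G is a "square root" of B, so G'B^{-1}G collapses to the identity
print(np.allclose(G @ G.T, B))                               # True
print(np.allclose(G.T @ np.linalg.inv(B) @ G, np.eye(2)))    # True
```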

The previous example also suggests the following results which are stated as corollaries:

**Corollary 4.3.2.** *Let the $p \times q$ real matrix $X = (x_{ij})$ have a real matrix-variate Gaussian density with parameters $M$, $A$ and $B$, as given in (4.3.8). Consider the set of linear functions $Y = XG$ where $G$ is a $q \times q$ constant matrix of full rank $q$, and let $B = GG'$. Then, the columns of $Y$, denoted by $Y_{(1)}, \ldots, Y_{(q)}$, are independently distributed $p$-variate real Gaussian vectors with common covariance matrix $A^{-1}$ and mean value vectors $(MG)_{(j)}$, $j = 1, \ldots, q$, where $(MG)_{(j)}$ is the $j$-th column of $MG$.*

**Corollary 4.3.3.** *Let $Z = C'XG$ where $C$ is a $p \times p$ constant matrix of rank $p$, $G$ is a $q \times q$ constant matrix of rank $q$ and $X$ is a real $p \times q$ Gaussian matrix whose parameters are $M = O$, $A$ and $B$, the constant matrices $C$ and $G$ being such that $A = CC'$ and $B = GG'$. Then, all the elements $z_{ij}$ in $Z = (z_{ij})$ are mutually independently distributed real scalar standard Gaussian random variables.*

#### **4.3a.2. Linear functions in the complex case**

We can similarly obtain results parallel to Theorems 4.3.1–4.3.6 in the complex case. Let $\tilde{X}$ be a $p \times q$ matrix in the complex domain whose elements are scalar complex variables. Assume that $\tilde{X}$ has a complex $p \times q$ matrix-variate density as specified in (4.2a.9), whose associated moment generating function is as given in (4.3a.3). Let $\tilde{C}_1$ be a $p \times 1$ constant vector, $\tilde{C}_2$ be a $q \times 1$ constant vector, $\tilde{C}$ be an $r \times p$, $r \le p$, constant matrix of rank $r$ and $\tilde{G}$ be a $q \times s$, $s \le q$, constant matrix of rank $s$. Then, we have the following results:

**Theorem 4.3a.1.** *Let $\tilde{C}_1$ be a $p \times 1$ constant vector as defined above and let the $p \times q$ matrix $\tilde{X}$ have the density given in (4.2a.9) whose associated mgf is as specified in (4.3a.3). Let $\tilde{U}$ be the linear function of $\tilde{X}$, $\tilde{U} = \tilde{C}_1^*\tilde{X}$. Then $\tilde{U}$ has a $q$-variate complex Gaussian density with mean value vector $\tilde{C}_1^*\tilde{M}$ and covariance matrix $[\tilde{C}_1^*A^{-1}\tilde{C}_1]B^{-1}$.*

**Theorem 4.3a.2.** *Let $\tilde{C}_2$ be a $q \times 1$ constant vector and consider the linear function $\tilde{Y} = \tilde{X}\tilde{C}_2$ where the $p \times q$ complex matrix $\tilde{X}$ has the density (4.2a.9). Then $\tilde{Y}$ is a $p$-variate complex Gaussian random vector with mean value vector $\tilde{M}\tilde{C}_2$ and covariance matrix $[\tilde{C}_2^*B^{-1}\tilde{C}_2]A^{-1}$.*

**Note 4.3a.1.** Consider the mgf's of $\tilde{U}$ and $\tilde{Y}$ in Theorems 4.3a.1 and 4.3a.2, namely $M_{\tilde{U}}(\tilde{T}) = E[\mathrm{e}^{\Re(\tilde{T}\tilde{U}^*)}]$ and $M_{\tilde{Y}}(\tilde{T}) = E[\mathrm{e}^{\Re(\tilde{Y}^*\tilde{T})}]$, with the conjugate transpose of the variable part in the linear form in the exponent; then $\tilde{T}$ in $M_{\tilde{U}}(\tilde{T})$ has to be $1 \times q$ and $\tilde{T}$ in $M_{\tilde{Y}}(\tilde{T})$ has to be $p \times 1$. Thus, the exponent in Theorem 4.3a.1 will be of the form $[\tilde{C}_1^*A^{-1}\tilde{C}_1]\tilde{T}B^{-1}\tilde{T}^*$ whereas the corresponding exponent in Theorem 4.3a.2 will be $[\tilde{C}_2^*B^{-1}\tilde{C}_2]\tilde{T}^*A^{-1}\tilde{T}$. Note that in one case we have $\tilde{T}B^{-1}\tilde{T}^*$ and in the other case $\tilde{T}$ and $\tilde{T}^*$ are interchanged, as are $A$ and $B$. This has to be kept in mind when applying these theorems.

**Example 4.3a.2.** Consider a 2 × 2 matrix $\tilde{X}$ having a complex 2 × 2 matrix-variate Gaussian density with the parameters $M = O$, $A$ and $B$, as well as the 2 × 1 vectors $L\_1$ and $L\_2$ and the linear functions $L\_1^{\*}\tilde{X}$ and $\tilde{X}L\_2$ where


$$A = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}, \ B = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, \ L\_1 = \begin{bmatrix} -2i \\ 3i \end{bmatrix}, \ L\_2 = \begin{bmatrix} -i \\ -2i \end{bmatrix} \text{ and } \tilde{X} = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} \\ \tilde{x}\_{21} & \tilde{x}\_{22} \end{bmatrix}.$$

Evaluate the densities of $\tilde{U} = L\_1^{\*}\tilde{X}$ and $\tilde{Y} = \tilde{X}L\_2$ by applying Theorems 4.3a.1 and 4.3a.2, as well as independently.

**Solution 4.3a.2.** First, we compute the following quantities:

$$\begin{aligned} A^{-1} &= \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, & B^{-1} &= \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}, \\\\ L\_1^\* &= [2i, -3i], & L\_2^\* &= [i, 2i], \end{aligned}$$

so that

$$\begin{aligned} L\_1^\* A^{-1} L\_1 &= [2i, -3i] \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} -2i \\ 3i \end{bmatrix} = 22 \\ L\_2^\* B^{-1} L\_2 &= [i, 2i] \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix} \begin{bmatrix} -i \\ -2i \end{bmatrix} = 6. \end{aligned}$$
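These two scalar factors can be double-checked numerically; the following sketch, assuming NumPy is available, reproduces both values:

```python
import numpy as np

# Matrices and vectors of Example 4.3a.2.
A = np.array([[2, -1j], [1j, 1]])
B = np.array([[1, 1j], [-1j, 2]])
L1 = np.array([-2j, 3j])
L2 = np.array([-1j, -2j])

# Scalar quadratic forms entering the covariance matrices of
# U = L1* X (namely 22 B^{-1}) and Y = X L2 (namely 6 A^{-1}).
c_U = (L1.conj() @ np.linalg.inv(A) @ L1).real   # ~ 22
c_Y = (L2.conj() @ np.linalg.inv(B) @ L2).real   # ~ 6
print(c_U, c_Y)
```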

Then, as per Theorems 4.3a.1 and 4.3a.2, $\tilde{U}$ is a $q$-variate complex Gaussian vector whose covariance matrix is $22\,B^{-1}$ and $\tilde{Y}$ is a $p$-variate complex Gaussian vector whose covariance matrix is $6\,A^{-1}$, that is, $\tilde{U} \sim \tilde{N}\_2(O,\,22B^{-1})$ and $\tilde{Y} \sim \tilde{N}\_2(O,\,6A^{-1})$. Now, let us determine the densities of $\tilde{U}$ and $\tilde{Y}$ without resorting to these theorems. Consider the mgf of $\tilde{U}$ by taking the parameter vector $\tilde{T} = [\tilde{t}\_1, \tilde{t}\_2]$. Note that

$$
\tilde{T}\tilde{U}^\* = \tilde{t}\_1(-2i\tilde{x}\_{11}^\* + 3i\tilde{x}\_{21}^\*) + \tilde{t}\_2(-2i\tilde{x}\_{12}^\* + 3i\tilde{x}\_{22}^\*).\tag{i}
$$

Then, in comparison with the corresponding part in the mgf of $\tilde{X}$ whose associated general parameter matrix is $\tilde{T} = (\tilde{t}\_{ij})$, we have

$$
\tilde{t}\_{11} = -2i\tilde{t}\_1,\ \tilde{t}\_{12} = -2i\tilde{t}\_2,\ \tilde{t}\_{21} = 3i\tilde{t}\_1,\ \tilde{t}\_{22} = 3i\tilde{t}\_2.\tag{ii}
$$

We now substitute the values in (ii) into the general mgf of $\tilde{X}$ to obtain the mgf of $\tilde{U}$. Thus,

$$\begin{split} A^{-1}\tilde{T} &= \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} -2i\tilde{t}\_1 & -2i\tilde{t}\_2 \\ 3i\tilde{t}\_1 & 3i\tilde{t}\_2 \end{bmatrix} = \begin{bmatrix} (-3-2i)\tilde{t}\_1 & (-3-2i)\tilde{t}\_2 \\ (-2+6i)\tilde{t}\_1 & (-2+6i)\tilde{t}\_2 \end{bmatrix} \\ B^{-1}\tilde{T}^\* &= \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix} \begin{bmatrix} 2i\tilde{t}\_1^\* & -3i\tilde{t}\_1^\* \\ 2i\tilde{t}\_2^\* & -3i\tilde{t}\_2^\* \end{bmatrix} = \begin{bmatrix} 4i\tilde{t}\_1^\* + 2\tilde{t}\_2^\* & -6i\tilde{t}\_1^\* - 3\tilde{t}\_2^\* \\ -2\tilde{t}\_1^\* + 2i\tilde{t}\_2^\* & 3\tilde{t}\_1^\* - 3i\tilde{t}\_2^\* \end{bmatrix}. \end{split}$$

Here an asterisk denotes only the conjugate, as the quantities involved are scalars.

$$\begin{split} \text{tr}[A^{-1}\tilde{T}B^{-1}\tilde{T}^\*] &= [(-3-2i)\tilde{t}\_1][4i\tilde{t}\_1^\* + 2\tilde{t}\_2^\*] + [(-3-2i)\tilde{t}\_2][-2\tilde{t}\_1^\* + 2i\tilde{t}\_2^\*] \\ &\quad + [(-2+6i)\tilde{t}\_1][-6i\tilde{t}\_1^\* - 3\tilde{t}\_2^\*] + [(-2+6i)\tilde{t}\_2][3\tilde{t}\_1^\* - 3i\tilde{t}\_2^\*] \\ &= 22\left[2\tilde{t}\_1\tilde{t}\_1^\* - i\tilde{t}\_1\tilde{t}\_2^\* + i\tilde{t}\_2\tilde{t}\_1^\* + \tilde{t}\_2\tilde{t}\_2^\*\right] \\ &= 22\left[\tilde{t}\_1, \tilde{t}\_2\right] \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix} \begin{bmatrix} \tilde{t}\_1^\* \\ \tilde{t}\_2^\* \end{bmatrix} = 22\,\tilde{T}B^{-1}\tilde{T}^\*, \ \tilde{T} = [\tilde{t}\_1, \tilde{t}\_2]. \end{split}$$

This shows that $\tilde{U} \sim \tilde{N}\_2(O,\,22B^{-1})$. Now, consider

$$
\tilde{Y} = \tilde{X}L\_2 = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} \\ \tilde{x}\_{21} & \tilde{x}\_{22} \end{bmatrix} \begin{bmatrix} -i \\ -2i \end{bmatrix} = \begin{bmatrix} -i\tilde{x}\_{11} - 2i\tilde{x}\_{12} \\ -i\tilde{x}\_{21} - 2i\tilde{x}\_{22} \end{bmatrix}.
$$

Then, on comparing the mgf of $\tilde{Y}$ with that of $\tilde{X}$ whose general parameter matrix is $\tilde{T} = (\tilde{t}\_{ij})$, we have

$$
\tilde{t}\_{11} = i\tilde{t}\_1,\ \tilde{t}\_{12} = 2i\tilde{t}\_1,\ \tilde{t}\_{21} = i\tilde{t}\_2,\ \tilde{t}\_{22} = 2i\tilde{t}\_2.
$$

On substituting these values in the mgf of $\tilde{X}$, we have

$$\begin{aligned} A^{-1}\tilde{T} &= \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} i\tilde{t}\_1 & 2i\tilde{t}\_1 \\ i\tilde{t}\_2 & 2i\tilde{t}\_2 \end{bmatrix} = \begin{bmatrix} i\tilde{t}\_1 - \tilde{t}\_2 & 2i\tilde{t}\_1 - 2\tilde{t}\_2 \\ \tilde{t}\_1 + 2i\tilde{t}\_2 & 2\tilde{t}\_1 + 4i\tilde{t}\_2 \end{bmatrix} \\ B^{-1}\tilde{T}^\* &= \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix} \begin{bmatrix} -i\tilde{t}\_1^\* & -i\tilde{t}\_2^\* \\ -2i\tilde{t}\_1^\* & -2i\tilde{t}\_2^\* \end{bmatrix} = \begin{bmatrix} (-2 - 2i)\tilde{t}\_1^\* & (-2 - 2i)\tilde{t}\_2^\* \\ (1 - 2i)\tilde{t}\_1^\* & (1 - 2i)\tilde{t}\_2^\* \end{bmatrix}, \end{aligned}$$

so that

$$\begin{aligned} \text{tr}[A^{-1}\tilde{T}B^{-1}\tilde{T}^\*] &= [i\tilde{t}\_1 - \tilde{t}\_2][(-2-2i)\tilde{t}\_1^\*] + [2i\tilde{t}\_1 - 2\tilde{t}\_2][(1-2i)\tilde{t}\_1^\*] \\ &\quad + [\tilde{t}\_1 + 2i\tilde{t}\_2][(-2-2i)\tilde{t}\_2^\*] + [2\tilde{t}\_1 + 4i\tilde{t}\_2][(1-2i)\tilde{t}\_2^\*] \\ &= 6\left[\tilde{t}\_1\tilde{t}\_1^\* - i\tilde{t}\_1\tilde{t}\_2^\* + i\tilde{t}\_2\tilde{t}\_1^\* + 2\tilde{t}\_2\tilde{t}\_2^\*\right] \\ &= 6\left[\tilde{t}\_1^\*, \tilde{t}\_2^\*\right] \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} \tilde{t}\_1 \\ \tilde{t}\_2 \end{bmatrix} = 6\,\tilde{T}^\*A^{-1}\tilde{T}; \end{aligned}$$

refer to Note 4.3a.1 regarding the representation of the quadratic forms in the two cases above. This shows that $\tilde{Y} \sim \tilde{N}\_2(O,\,6A^{-1})$. This completes the computations.
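The two trace identities just derived can also be confirmed numerically for an arbitrary parameter pair; a sketch assuming NumPy, with the parameter matrices built from the two substitutions above:

```python
import numpy as np

# Matrices of Example 4.3a.2 and a random parameter pair (t1, t2).
A = np.array([[2, -1j], [1j, 1]])
B = np.array([[1, 1j], [-1j, 2]])
Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)

rng = np.random.default_rng(0)
t1, t2 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
t = np.array([t1, t2])

# Substitution (ii): parameter matrix for the mgf of U = L1* X.
T_U = np.array([[-2j * t1, -2j * t2], [3j * t1, 3j * t2]])
lhs_U = np.trace(Ainv @ T_U @ Binv @ T_U.conj().T)
rhs_U = 22 * (t @ Binv @ t.conj())          # 22 T B^{-1} T*

# Second substitution: parameter matrix for the mgf of Y = X L2.
T_Y = np.array([[1j * t1, 2j * t1], [1j * t2, 2j * t2]])
lhs_Y = np.trace(Ainv @ T_Y @ Binv @ T_Y.conj().T)
rhs_Y = 6 * (t.conj() @ Ainv @ t)           # 6 T* A^{-1} T
```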

**Theorem 4.3a.3.** *Let $\tilde{C}\_1$ be a constant $p \times 1$ vector, $\tilde{X}$ be a $p \times q$ matrix whose elements are complex scalar variables and let $A = A^{\*} > O$ be $p \times p$ and $B = B^{\*} > O$ be $q \times q$ constant Hermitian positive definite matrices, where an asterisk denotes the conjugate transpose. If, for an arbitrary $p \times 1$ constant vector $\tilde{C}\_1$, $\tilde{C}\_1^{\*}\tilde{X}$ is a $q$-variate complex Gaussian vector as specified in Theorem 4.3a.1, then $\tilde{X}$ has the $p \times q$ complex matrix-variate Gaussian density given in (4.2a.9).*

As well, a result parallel to this one follows from Theorem 4.3a.2:

**Theorem 4.3a.4.** *Let $\tilde{C}\_2$ be a $q \times 1$ constant vector, $\tilde{X}$ be a $p \times q$ matrix whose elements are complex scalar variables and let $A > O$ be $p \times p$ and $B > O$ be $q \times q$ Hermitian positive definite constant matrices. If, for an arbitrary constant vector $\tilde{C}\_2$, $\tilde{X}\tilde{C}\_2$ is a $p$-variate complex Gaussian vector as specified in Theorem 4.3a.2, then $\tilde{X}$ is a $p \times q$ complex matrix-variate Gaussian matrix which is distributed as in (4.2a.9).*

**Theorem 4.3a.5.** *Let $\tilde{C}^{\*}$ be an $r \times p,\ r \le p,$ complex constant matrix of full rank $r$ and $\tilde{G}$ be a $q \times s,\ s \le q,$ constant complex matrix of full rank $s$. Let $\tilde{U} = \tilde{C}^{\*}\tilde{X}$ and $\tilde{W} = \tilde{X}\tilde{G}$ where $\tilde{X}$ has the density given in (4.2a.9). Then $\tilde{U}$ has an $r \times q$ complex matrix-variate density with $\tilde{M}$ replaced by $\tilde{C}^{\*}\tilde{M}$, $A^{-1}$ replaced by $\tilde{C}^{\*}A^{-1}\tilde{C}$ and $B^{-1}$ remaining the same, and $\tilde{W}$ has a $p \times s$ complex matrix-variate distribution with $B^{-1}$ replaced by $\tilde{G}^{\*}B^{-1}\tilde{G}$, $\tilde{M}$ replaced by $\tilde{M}\tilde{G}$ and $A^{-1}$ remaining the same.*

**Theorem 4.3a.6.** *Let $\tilde{C}^{\*}$, $\tilde{G}$ and $\tilde{X}$ be as defined in Theorem 4.3a.5. Consider the $r \times s$ complex matrix $\tilde{Z} = \tilde{C}^{\*}\tilde{X}\tilde{G}$. Then, when $\tilde{X}$ has the distribution specified by (4.2a.9), $\tilde{Z}$ has an $r \times s$ complex matrix-variate density with $\tilde{M}$ replaced by $\tilde{C}^{\*}\tilde{M}\tilde{G}$, $A^{-1}$ replaced by $\tilde{C}^{\*}A^{-1}\tilde{C}$ and $B^{-1}$ replaced by $\tilde{G}^{\*}B^{-1}\tilde{G}$.*

**Example 4.3a.3.** Consider a 2 × 3 matrix $\tilde{X}$ having a complex matrix-variate Gaussian distribution with the parameter matrices $\tilde{M} = O$, $A$ and $B$ where

$$A = \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix}, \ B = \begin{bmatrix} 3 & i & 0 \\ -i & 1 & i \\ 0 & -i & 2 \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{21} & \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix}.$$

Consider the linear forms

$$\begin{split} C^\* \tilde{X} &= \begin{bmatrix} i\tilde{x}\_{11} - i\tilde{x}\_{21} & i\tilde{x}\_{12} - i\tilde{x}\_{22} & i\tilde{x}\_{13} - i\tilde{x}\_{23} \\ \tilde{x}\_{11} + 2\tilde{x}\_{21} & \tilde{x}\_{12} + 2\tilde{x}\_{22} & \tilde{x}\_{13} + 2\tilde{x}\_{23} \end{bmatrix} \\ \tilde{X}G &= \begin{bmatrix} \tilde{x}\_{11} + i\tilde{x}\_{12} + 2\tilde{x}\_{13} & \tilde{x}\_{12} & i\tilde{x}\_{11} - i\tilde{x}\_{12} + i\tilde{x}\_{13} \\ \tilde{x}\_{21} + i\tilde{x}\_{22} + 2\tilde{x}\_{23} & \tilde{x}\_{22} & i\tilde{x}\_{21} - i\tilde{x}\_{22} + i\tilde{x}\_{23} \end{bmatrix}. \end{split}$$

(1): Compute the distribution of $\tilde{Z} = C^{\*}\tilde{X}G$; (2): Compute the distribution of $\tilde{Z} = C^{\*}\tilde{X}G$ if $A$ remains the same and $G$ is equal to

$$
\begin{bmatrix}
\sqrt{3} & 0 & 0 \\
-\frac{i}{\sqrt{3}} & \sqrt{\frac{2}{3}} & 0 \\
0 & -i\sqrt{\frac{3}{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix},
$$

and study the properties of this distribution.

**Solution 4.3a.3.** Note that $A = A^{\*}$ and $B = B^{\*}$, hence both $A$ and $B$ are Hermitian. Moreover, $|A| = 1$ and $|B| = 1$, and since all the leading minors of $A$ and $B$ are positive, $A$ and $B$ are both Hermitian positive definite. Then, the inverses of $A$ and $B$, obtained from the cofactors of their elements, are

$$A^{-1} = \begin{bmatrix} 1 & -i \\ i & 2 \end{bmatrix}, \ [\text{Cof}(B)]' = \begin{bmatrix} 1 & -2i & -1 \\ 2i & 6 & -3i \\ -1 & 3i & 2 \end{bmatrix} = B^{-1}.$$

The linear forms provided above in connection with part (1) of this exercise can be respectively expressed in terms of the following matrices:

$$C^\* = \begin{bmatrix} i & -i \\ 1 & 2 \end{bmatrix}, \ G = \begin{bmatrix} 1 & 0 & i \\ i & 1 & -i \\ 2 & 0 & i \end{bmatrix}.$$

Let us now compute $C^{\*}A^{-1}C$ and $G^{\*}B^{-1}G$:

$$\begin{split}C^\*A^{-1}C &= \begin{bmatrix} i & -i \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -i \\ i & 2 \end{bmatrix} \begin{bmatrix} -i & 1 \\ i & 2 \end{bmatrix} = 3 \begin{bmatrix} 1 & 1 - i \\ 1 + i & 3 \end{bmatrix} \\ G^\*B^{-1}G &= \begin{bmatrix} 1 & -i & 2 \\ 0 & 1 & 0 \\ -i & i & -i \end{bmatrix} \begin{bmatrix} 1 & -2i & -1 \\ 2i & 6 & -3i \\ -1 & 3i & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & i \\ i & 1 & -i \\ 2 & 0 & i \end{bmatrix} \\ &= \begin{bmatrix} 3 & -2i & -2+i \\ 2i & 6 & 1-6i \\ -2-i & 1+6i & 7 \end{bmatrix}.\end{split}$$

Then in (1), $C^{\*}\tilde{X}G$ is a 2 × 3 complex matrix-variate Gaussian with $A^{-1}$ replaced by $C^{\*}A^{-1}C$ and $B^{-1}$ replaced by $G^{\*}B^{-1}G$, where $C^{\*}A^{-1}C$ and $G^{\*}B^{-1}G$ are given above. For answering (2), let us evaluate $G^{\*}B^{-1}G$:

$$\begin{aligned} \, \, \, G^\* B^{-1} G &= \begin{bmatrix} \sqrt{3} & \frac{i}{\sqrt{3}} & 0\\ 0 & \sqrt{\frac{2}{3}} & i\sqrt{\frac{3}{2}}\\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & -2i & -1\\ 2i & 6 & -3i\\ -1 & 3i & 2 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 & 0\\ -\frac{i}{\sqrt{3}} & \sqrt{\frac{2}{3}} & 0\\ 0 & -i\sqrt{\frac{3}{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} = I\_3. \end{aligned}$$

Observe that this $q \times q$ matrix $G$, which is such that $GG^{\*} = B$, is nonsingular; thus, $G^{\*}B^{-1}G = G^{\*}(GG^{\*})^{-1}G = G^{\*}(G^{\*})^{-1}G^{-1}G = I$. Letting $\tilde{Y} = \tilde{X}G$, we have $\tilde{X} = \tilde{Y}G^{-1}$, and the exponent in the density of $\tilde{X}$ becomes

$$\operatorname{tr}(A\tilde{X}B\tilde{X}^\*) = \operatorname{tr}(A\tilde{Y}G^{-1}B(G^\*)^{-1}\tilde{Y}^\*) = \operatorname{tr}(\tilde{Y}^\*A\tilde{Y}) = \sum\_{j=1}^q \tilde{Y}\_{(j)}^\* A\tilde{Y}\_{(j)}$$

where the $\tilde{Y}\_{(j)}$'s are the columns of $\tilde{Y}$, which are independently distributed complex $p$-variate Gaussian vectors with common covariance matrix $A^{-1}$. This completes the computations.
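The matrix computations of this solution can be replicated numerically; a sketch assuming NumPy, with the matrices of Example 4.3a.3:

```python
import numpy as np

# A, B and the transformation matrices of Example 4.3a.3.
A = np.array([[2, 1j], [-1j, 1]])
B = np.array([[3, 1j, 0], [-1j, 1, 1j], [0, -1j, 2]])
Cstar = np.array([[1j, -1j], [1, 2]])

# Lower-triangular G of part (2); by construction G G* = B.
s = np.sqrt
G = np.array([[s(3), 0, 0],
              [-1j / s(3), s(2 / 3), 0],
              [0, -1j * s(3 / 2), 1 / s(2)]])

CAC = Cstar @ np.linalg.inv(A) @ Cstar.conj().T   # = C* A^{-1} C
GG = G @ G.conj().T                               # = B
GBG = G.conj().T @ np.linalg.inv(B) @ G           # = I_3
```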

The conclusions obtained in the solution to Example 4.3a.3 suggest the corollaries that follow.

**Corollary 4.3a.1.** *Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{U} = C^{\*}\tilde{X}$ where $C$ is a $p \times p$ nonsingular matrix such that $CC^{\*} = A$, so that $C^{\*}A^{-1}C = I$. Then the rows of $\tilde{U}$, namely $\tilde{U}\_1, \ldots, \tilde{U}\_p$, are mutually independently distributed $q$-variate complex Gaussian vectors with common covariance matrix $B^{-1}$.*

**Corollary 4.3a.2.** *Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{Y} = \tilde{X}G$ where $G$ is a $q \times q$ nonsingular matrix such that $GG^{\*} = B$, so that $G^{\*}B^{-1}G = I$. Then the columns of $\tilde{Y}$, namely $\tilde{Y}\_{(1)}, \ldots, \tilde{Y}\_{(q)}$, are independently distributed $p$-variate complex Gaussian vectors with common covariance matrix $A^{-1}$.*

**Corollary 4.3a.3.** *Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{Z} = C^{\*}\tilde{X}G$ where $C$ is a $p \times p$ nonsingular matrix such that $CC^{\*} = A$ and $G$ is a $q \times q$ nonsingular matrix such that $GG^{\*} = B$. Then, the elements $\tilde{z}\_{ij}$'s of $\tilde{Z} = (\tilde{z}\_{ij})$ are mutually independently distributed complex standard Gaussian variables.*
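In practice, matrices $C$ and $G$ satisfying $CC^{\*} = A$ and $GG^{\*} = B$ can be obtained from Cholesky factorizations; the following sketch, assuming NumPy, verifies the standardizing conditions used in the corollaries:

```python
import numpy as np

# Hermitian positive definite A (p x p) and B (q x q) from Example 4.3a.3.
A = np.array([[2, 1j], [-1j, 1]])
B = np.array([[3, 1j, 0], [-1j, 1, 1j], [0, -1j, 2]])

# np.linalg.cholesky returns a lower-triangular C with C C* = A.
C = np.linalg.cholesky(A)
G = np.linalg.cholesky(B)

# Standardizing conditions of Corollaries 4.3a.1-4.3a.3:
I_p = C.conj().T @ np.linalg.inv(A) @ C   # = I_2
I_q = G.conj().T @ np.linalg.inv(B) @ G   # = I_3
```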

#### **4.3.3. Partitioning of the parameter matrix**

Suppose that in the $p \times q$ real matrix-variate case, we partition $T$ as $T = \begin{pmatrix} T\_1 \\ T\_2 \end{pmatrix}$ where $T\_1$ is $p\_1 \times q$ and $T\_2$ is $p\_2 \times q$, so that $p\_1 + p\_2 = p$. Let $T\_2 = O$ (a null matrix). Then,

$$TB^{-1}T' = \begin{pmatrix} T\_1 \\ O \end{pmatrix} B^{-1} \begin{pmatrix} T\_1' & O \end{pmatrix} = \begin{bmatrix} T\_1 B^{-1} T\_1' & O\_1 \\ O\_2 & O\_3 \end{bmatrix}$$

where $T\_1B^{-1}T\_1'$ is a $p\_1 \times p\_1$ matrix, $O\_1$ is a $p\_1 \times p\_2$ null matrix, $O\_2$ is a $p\_2 \times p\_1$ null matrix and $O\_3$ is a $p\_2 \times p\_2$ null matrix. Let us similarly partition $A^{-1}$ into sub-matrices:

$$A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix},$$

where $A^{11}$ is $p\_1 \times p\_1$ and $A^{22}$ is $p\_2 \times p\_2$. Then,

$$\text{tr}(A^{-1}TB^{-1}T') = \text{tr}\begin{bmatrix} A^{11}T\_1B^{-1}T\_1' & O\\ O & O \end{bmatrix} = \text{tr}(A^{11}T\_1B^{-1}T\_1').$$

If *A* is partitioned as

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix},$$

where *A*<sup>11</sup> is *p*<sup>1</sup> × *p*<sup>1</sup> and *A*<sup>22</sup> is *p*<sup>2</sup> × *p*2*,* then, as established in Sect. 1.3, we have

$$A^{\rm l1} = (A\_{\rm l1} - A\_{\rm l2}A\_{22}^{-1}A\_{21})^{-1}.$$

Therefore, under this special case of *T* , the mgf is given by

$$E[\mathbf{e}^{\text{tr}(T\_1X\_1)}] = \mathbf{e}^{\frac{1}{2}\text{tr}((A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})^{-1}T\_1B^{-1}T\_1')},\tag{4.3.9}$$

which is also the mgf of the $p\_1 \times q$ sub-matrix $X\_1$ of $X$. Note that the mgf's in (4.3.9) and (4.3.1) share an identical structure. Hence, owing to the uniqueness of the mgf, $X\_1$ has a real $p\_1 \times q$ matrix-variate Gaussian density wherein the parameter $B$ remains unchanged and $A$ is replaced by $A\_{11} - A\_{12}A\_{22}^{-1}A\_{21}$, the $A\_{ij}$'s denoting the sub-matrices of $A$ as described earlier.
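The block-inversion identity $A^{11} = (A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})^{-1}$ underlying (4.3.9) is easy to confirm numerically; a sketch with a randomly generated positive definite $A$, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
p, p1 = 5, 2
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)        # a random symmetric positive definite A

A11, A12 = A[:p1, :p1], A[:p1, p1:]
A21, A22 = A[p1:, :p1], A[p1:, p1:]

# Leading p1 x p1 block of A^{-1} versus the inverse Schur complement.
block = np.linalg.inv(A)[:p1, :p1]
schur_inv = np.linalg.inv(A11 - A12 @ np.linalg.inv(A22) @ A21)
```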

#### **4.3.4. Distributions of quadratic and bilinear forms**

Consider the real $p \times q$ Gaussian matrix $U$ defined in (4.2.17) whose mean value matrix is $E[X] = M = O$ and let $U = XB^{\frac{1}{2}}$. Then,

$$U'AU = \begin{bmatrix} U\_1'AU\_1 & U\_1'AU\_2 & \dots & U\_1'AU\_q \\ U\_2'AU\_1 & U\_2'AU\_2 & \dots & U\_2'AU\_q \\ \vdots & \vdots & \ddots & \vdots \\ U\_q'AU\_1 & U\_q'AU\_2 & \dots & U\_q'AU\_q \end{bmatrix} \tag{4.3.10}$$

where the $p \times 1$ column vectors of $U$, namely $U\_1, \ldots, U\_q$, are independently distributed as $N\_p(O, A^{-1})$ vectors, that is, the $U\_j$'s are independently distributed real $p$-variate Gaussian (normal) vectors whose covariance matrix is $A^{-1} = E[U\_jU\_j']$, with density

$$f\_{U\_j}(U\_j) = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \mathbf{e}^{-\frac{1}{2}(U\_j'AU\_j)}, \ A > O. \tag{4.3.11}$$

What then is the distribution of $U\_j'AU\_j$ for any particular $j$, and what are the distributions of $U\_i'AU\_j,\ i \ne j,\ i, j = 1, \ldots, q$? Let $z\_j = U\_j'AU\_j$ and $z\_{ij} = U\_i'AU\_j,\ i \ne j$. Letting $t$ be a scalar parameter, consider the mgf of $z\_j$:

$$\begin{aligned} M\_{z\_j}(t) &= E[\mathbf{e}^{tz\_j}] = \int\_{U\_j} \mathbf{e}^{tU\_j'AU\_j} f\_{U\_j}(U\_j) \mathbf{d}U\_j \\ &= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \int\_{U\_j} \mathbf{e}^{-\frac{1}{2}(1-2t)U\_j'AU\_j} \mathbf{d}U\_j \\ &= (1-2t)^{-\frac{p}{2}} \quad \text{for } 1-2t > 0, \end{aligned}$$

which is the mgf of a real gamma random variable with parameters $\alpha = \frac{p}{2},\ \beta = 2$, or a real chi-square random variable with $p$ degrees of freedom for $p = 1, 2, \ldots$. That is,

$$
U\_j'AU\_j \sim \chi\_p^2 \quad \text{(a real chi-square random variable having } p \text{ degrees of freedom).}\tag{4.3.12}
$$

In the complex case, observe that $\tilde{U}\_j^{\*}A\tilde{U}\_j$ is real when $A = A^{\*} > O$ and hence the parameter in the mgf is real. On making the transformation $A^{\frac{1}{2}}\tilde{U}\_j = \tilde{V}\_j$, $|\det(A)|$ is canceled. Then, the exponent can be expressed in terms of

$$-(1-t)\tilde{V}\_j^\*\tilde{V}\_j = -(1-t)\sum\_{k=1}^p |\tilde{v}\_k|^2 = -(1-t)\sum\_{k=1}^p (v\_{k1}^2 + v\_{k2}^2),$$

where $\tilde{v}\_k = v\_{k1} + iv\_{k2},\ i = \sqrt{-1}$. The integral gives $(1-t)^{-p}$ for $1 - t > 0$. Hence, $\tilde{V}\_j^{\*}\tilde{V}\_j = \tilde{U}\_j^{\*}A\tilde{U}\_j$ has a real gamma distribution with the parameters $\alpha = p,\ \beta = 1$, that is, a chi-square distribution with $p$ degrees of freedom in the complex domain. Thus, $2\tilde{U}\_j^{\*}A\tilde{U}\_j$ is a real chi-square random variable with $2p$ degrees of freedom, that is,

$$
2\tilde{V}\_j^\*\tilde{V}\_j = 2\tilde{U}\_j^\* A \tilde{U}\_j \sim \chi\_{2p}^2. \tag{4.3a.9}
$$
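Both chi-square results lend themselves to a quick Monte Carlo check. In the sketch below (assuming NumPy), the complex vectors are simulated by giving the real and imaginary parts covariance $A^{-1}/2$ each, so that $E[\tilde{U}\tilde{U}^{\*}] = A^{-1}$; the sample means are compared with the theoretical values $p$ and $2p$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 200000
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)            # symmetric positive definite A
Ainv = np.linalg.inv(A)

# Real case: U_j ~ N_p(O, A^{-1}) gives U_j' A U_j ~ chi-square, p d.f.
U = rng.multivariate_normal(np.zeros(p), Ainv, size=n)
z = np.einsum('ij,jk,ik->i', U, A, U)

# Complex case: E[U U*] = A^{-1} gives 2 U* A U ~ chi-square, 2p d.f.
Uc = (rng.multivariate_normal(np.zeros(p), Ainv / 2, size=n)
      + 1j * rng.multivariate_normal(np.zeros(p), Ainv / 2, size=n))
zc = 2 * np.einsum('ij,jk,ik->i', Uc.conj(), A, Uc).real

print(z.mean(), zc.mean())    # close to p = 3 and 2p = 6, respectively
```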

What then is the distribution of $U\_i'AU\_j,\ i \ne j$? Let us evaluate the mgf of $z\_{ij} = U\_i'AU\_j$. As $z\_{ij}$ is a function of $U\_i$ and $U\_j$, we can integrate over the joint density of $U\_i$ and $U\_j$, where $U\_i$ and $U\_j$ are independently distributed $p$-variate real Gaussian random vectors:

$$\begin{split} M\_{z\_{ij}}(t) &= E[\mathbf{e}^{tz\_{ij}}] = \int\_{U\_i} \int\_{U\_j} \mathbf{e}^{tU\_i'AU\_j} f\_{U\_i}(U\_i) f\_{U\_j}(U\_j)\, \mathbf{d}U\_i \wedge \mathbf{d}U\_j \\ &= \frac{|A|}{(2\pi)^p} \int \int \mathbf{e}^{tU\_i'AU\_j - \frac{1}{2}U\_j'AU\_j - \frac{1}{2}U\_i'AU\_i}\, \mathbf{d}U\_i \wedge \mathbf{d}U\_j. \end{split}$$

Let us first integrate out *Uj* . The relevant terms in the exponent are

$$-\frac{1}{2}(U\_j'A\,U\_j) + \frac{1}{2}(2t)(U\_i'A\,U\_j) = -\frac{1}{2}(U\_j - C)'A\,(U\_j - C) + \frac{1}{2}t^2 U\_i'A\,U\_i, \quad C = t\,U\_i.$$

But the integral over *Uj* which is the integral over *Uj* − *C* will result in the following representation:

$$\begin{split} M\_{z\_{ij}}(t) &= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \int\_{U\_i} \mathbf{e}^{\frac{t^2}{2}U\_i'AU\_i - \frac{1}{2}U\_i'AU\_i}\, \mathbf{d}U\_i \\ &= (1 - t^2)^{-\frac{p}{2}} \quad \text{for } 1 - t^2 > 0. \end{split} \tag{4.3.13}$$

What is the density corresponding to the mgf $(1-t^2)^{-\frac{p}{2}}$? This is the mgf of a real scalar random variable $u$ of the form $u = x - y$ where $x$ and $y$ are independently distributed real gamma random variables with parameters $\alpha = \frac{p}{2},\ \beta = 1$; equivalently, $u = (x\_1 - y\_1)/2$ with $x\_1$ and $y\_1$ independent chi-squares having $p$ degrees of freedom each. For $p = 2$, $x$ and $y$ are exponential variables, so that $u$ has a double exponential distribution or a real Laplace distribution. In the general case, the density of $u$ can also be worked out when $x$ and $y$ are independently distributed real gamma random variables with different parameters, gammas with equal parameters constituting a special case. For the exact distribution of covariance structures such as the $z\_{ij}$'s, see Mathai and Sebastian (2022).
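This representation can be checked by simulation; the following sketch (assuming NumPy) compares a Monte Carlo estimate of $E[\mathbf{e}^{tz\_{ij}}]$ with $(1-t^2)^{-p/2}$ and with the mgf of $(x - y)/2$ for $x, y$ independent chi-squares with $p$ degrees of freedom each:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, t = 2, 400000, 0.3
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)            # symmetric positive definite A
Ainv = np.linalg.inv(A)

# Independent U_i, U_j ~ N_p(O, A^{-1}) and the bilinear forms U_i' A U_j.
Ui = rng.multivariate_normal(np.zeros(p), Ainv, size=n)
Uj = rng.multivariate_normal(np.zeros(p), Ainv, size=n)
z = np.einsum('ij,jk,ik->i', Ui, A, Uj)

mgf_exact = (1 - t**2) ** (-p / 2)     # (4.3.13)
mgf_z = np.exp(t * z).mean()

# Same mgf for (x - y)/2 with x, y independent chi-square(p) variables.
w = (rng.chisquare(p, n) - rng.chisquare(p, n)) / 2
mgf_w = np.exp(t * w).mean()
```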

#### **Exercises 4.3**

**4.3.1.** In the moment generating function (mgf) (4.3.3), partition the *p* × *q* parameter matrix *T* into column sub-matrices such that *T* = *(T*1*, T*2*)* where *T*<sup>1</sup> is *p* × *q*<sup>1</sup> and *T*<sup>2</sup> is *p* × *q*<sup>2</sup> with *q*<sup>1</sup> + *q*<sup>2</sup> = *q*. Take *T*<sup>2</sup> = *O* (the null matrix). Simplify and show that if *X* is similarly partitioned as *X* = *(Y*1*, Y*2*),* then *Y*<sup>1</sup> has a real *p* × *q*<sup>1</sup> matrix-variate Gaussian density. As well, show that *Y*<sup>2</sup> has a real *p* × *q*<sup>2</sup> matrix-variate Gaussian density.

**4.3.2.** Referring to Exercise 4.3.1, write down the densities of $Y\_1$ and $Y\_2$.

**4.3.3.** If $T$ is the parameter matrix in (4.3.3), what type of partitioning of $T$ is required so that the densities of (1): the first row of $X$, and (2): the first column of $X$ can be determined? Write down these densities explicitly.

**4.3.4.** Repeat Exercises 4.3.1–4.3.3 by taking the mgf in (4.3*a*.3) for the corresponding complex case.

**4.3.5.** Write down the mgf explicitly for *p* = 2 and *q* = 2 corresponding to (4.3.3) and (4.3*a*.3), assuming general *A>O* and *B>O*.

**4.3.6.** Partition the mgf in the complex $p \times q$ matrix-variate Gaussian case, corresponding to the partitioning in Sect. 4.3.3, and write down the complex matrix-variate density corresponding to $\tilde{T}\_1$ in the complex case.

**4.3.7.** In the real *p* × *q* matrix-variate Gaussian case, partition the mgf parameter matrix into *T* = *(T(*1*), T(*2*))* where *T(*1*)* is *p* × *q*<sup>1</sup> and *T(*2*)* is *p* × *q*<sup>2</sup> with *q*<sup>1</sup> + *q*<sup>2</sup> = *q*. Obtain the density corresponding to *T(*1*)* by letting *T(*2*)* = *O*.

**4.3.8.** Repeat Exercise 4.3.7 for the complex *p* × *q* matrix-variate Gaussian case.

**4.3.9.** Consider $v = \tilde{U}\_j^{\*}A\tilde{U}\_j$. Provide the details of the steps for obtaining (4.3a.9).

**4.3.10.** Derive the mgf of $\tilde{U}\_i^{\*}A\tilde{U}\_j,\ i \ne j,$ in the complex $p \times q$ matrix-variate Gaussian case, corresponding to (4.3.13).

#### **4.4. Marginal Densities in the Real Matrix-variate Gaussian Case**

On partitioning the real $p \times q$ Gaussian matrix into $X\_1$ of order $p\_1 \times q$ and $X\_2$ of order $p\_2 \times q$ so that $p\_1 + p\_2 = p$, it was determined by applying the mgf technique that $X\_1$ has a $p\_1 \times q$ matrix-variate Gaussian distribution with the parameter matrix $B$ remaining unchanged while $A$ is replaced by $A\_{11} - A\_{12}A\_{22}^{-1}A\_{21}$, where the $A\_{ij}$'s are the sub-matrices of $A$. This density is then the marginal density of the sub-matrix $X\_1$ with respect to the joint density of $X$. Let us see whether the same result can be obtained by direct integration, namely by integrating out $X\_2$. We first consider the real case. Note that

$$\begin{aligned} \operatorname{tr}(AXBX') &= \operatorname{tr}\left[A\begin{pmatrix}X\_1\\X\_2\end{pmatrix}B(X\_1'\ X\_2')\right] \\ &= \operatorname{tr}\left[A\begin{pmatrix}X\_1BX\_1' & X\_1BX\_2'\\X\_2BX\_1' & X\_2BX\_2'\end{pmatrix}\right]. \end{aligned}$$

Now, letting *A* be similarly partitioned, we have

$$\begin{split} \operatorname{tr}(AXBX') &= \operatorname{tr}\left[ \begin{pmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{pmatrix} \begin{pmatrix} X\_1 \mathcal{B}X\_1' & X\_1 \mathcal{B}X\_2' \\ X\_2 \mathcal{B}X\_1' & X\_2 \mathcal{B}X\_2' \end{pmatrix} \right] \\ &= \operatorname{tr}(A\_{11}X\_1 \mathcal{B}X\_1') + \operatorname{tr}(A\_{12}X\_2 \mathcal{B}X\_1') \\ &+ \operatorname{tr}(A\_{21}X\_1 \mathcal{B}X\_2') + \operatorname{tr}(A\_{22}X\_2 \mathcal{B}X\_2'), \end{split}$$

as the remaining terms do not appear in the trace. However, $(A\_{12}X\_2BX\_1')' = X\_1BX\_2'A\_{21}$, and since $\operatorname{tr}(PQ) = \operatorname{tr}(QP)$ and $\operatorname{tr}(S) = \operatorname{tr}(S')$ whenever $S$, $PQ$ and $QP$ are square matrices, we have

$$\text{tr}(AXBX') = \text{tr}(A\_{11}X\_1BX\_1') + 2\text{tr}(A\_{21}X\_1BX\_2') + \text{tr}(A\_{22}X\_2BX\_2').$$
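This expansion of the trace can be verified numerically for arbitrary conformable matrices; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, p1 = 4, 3, 2
MA, MB = rng.standard_normal((p, p)), rng.standard_normal((q, q))
A = MA @ MA.T + p * np.eye(p)      # symmetric positive definite A
B = MB @ MB.T + q * np.eye(q)      # symmetric positive definite B
X = rng.standard_normal((p, q))

X1, X2 = X[:p1, :], X[p1:, :]
A11, A21, A22 = A[:p1, :p1], A[p1:, :p1], A[p1:, p1:]

lhs = np.trace(A @ X @ B @ X.T)
rhs = (np.trace(A11 @ X1 @ B @ X1.T)
       + 2 * np.trace(A21 @ X1 @ B @ X2.T)
       + np.trace(A22 @ X2 @ B @ X2.T))
```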

We may now complete the quadratic form in $\operatorname{tr}(A\_{22}X\_2BX\_2') + 2\operatorname{tr}(A\_{21}X\_1BX\_2')$ by taking the matrix $C = A\_{22}^{-1}A\_{21}X\_1$ and replacing $X\_2$ by $X\_2 + C$. Note that when $A > O$, $A\_{11} > O$ and $A\_{22} > O$. Thus,

$$\begin{split} \operatorname{tr}(AXBX') &= \operatorname{tr}(A\_{22}(X\_2+C)B(X\_2+C)') + \operatorname{tr}(A\_{11}X\_1BX\_1') - \operatorname{tr}(A\_{12}A\_{22}^{-1}A\_{21}X\_1BX\_1') \\ &= \operatorname{tr}(A\_{22}(X\_2+C)B(X\_2+C)') + \operatorname{tr}((A\_{11}-A\_{12}A\_{22}^{-1}A\_{21})X\_1BX\_1'). \end{split}$$

On applying a result on partitioned matrices from Sect. 1.3, we have

$$|A| = |A\_{22}| \, |A\_{11} - A\_{12} A\_{22}^{-1} A\_{21}|,$$

and clearly, $(2\pi)^{\frac{pq}{2}} = (2\pi)^{\frac{p\_1q}{2}}(2\pi)^{\frac{p\_2q}{2}}$. When integrating out $X\_2$, the factors $|A\_{22}|^{\frac{q}{2}}$, $(2\pi)^{\frac{p\_2q}{2}}$ and $|B|^{\frac{p\_2}{2}}$ cancel. Hence, the marginal density of $X\_1$, the $p\_1 \times q$ sub-matrix of $X$, denoted by $f\_{p\_1,q}(X\_1)$, is given by

$$f\_{p\_{1},q}(X\_{1}) = \frac{|\mathcal{B}|^{\frac{p\_{1}}{2}}|A\_{11} - A\_{12}A\_{22}^{-1}A\_{21}|^{\frac{q}{2}}}{(2\pi)^{\frac{p\_{1}q}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}((A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})X\_{1}BX\_{1}')}.\tag{4.4.1}$$
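The determinant factorization used in this derivation can be confirmed numerically; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
p, p1 = 5, 2
M = rng.standard_normal((p, p))
A = M @ M.T + p * np.eye(p)        # symmetric positive definite A

A11, A12 = A[:p1, :p1], A[:p1, p1:]
A21, A22 = A[p1:, :p1], A[p1:, p1:]
schur = A11 - A12 @ np.linalg.inv(A22) @ A21

# |A| = |A22| |A11 - A12 A22^{-1} A21|, as used in obtaining (4.4.1).
detA = np.linalg.det(A)
det_prod = np.linalg.det(A22) * np.linalg.det(schur)
```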

When $p\_1 = 1$ and $p\_2 = 0$, that is, $p = 1$, we have the usual multivariate Gaussian density. When $p = 1$, the $1 \times 1$ matrix $A$ will be taken as 1 without any loss of generality. Then, from (4.4.1), the multivariate ($q$-variate) Gaussian density corresponding to (4.2.3) is given by

$$f\_{1,q}(X\_1) = \frac{(1/2)^{\frac{q}{2}}|B|^{\frac{1}{2}}}{\pi^{\frac{q}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(X\_1 B X\_1')} = \frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{q}{2}}} \mathbf{e}^{-\frac{1}{2}X\_1 B X\_1'}$$

since the $1 \times 1$ quadratic form $X\_1BX\_1'$ is equal to its trace. It is usually expressed in terms of $B = V^{-1},\ V > O$. When $q = 1$, $X$ reduces to a $p \times 1$ vector, say $Y$. Thus, for a $p \times 1$ column vector $Y$ with a location parameter $\mu$, the density, denoted by $f\_{p,1}(Y)$, is the following:

$$f\_{p,1}(Y) = \frac{1}{|V|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}} \mathbf{e}^{-\frac{1}{2}(Y-\mu)'V^{-1}(Y-\mu)},\tag{4.4.2}$$

where $Y' = (y\_1, \ldots, y\_p)$, $\mu' = (\mu\_1, \ldots, \mu\_p)$, $-\infty < y\_j < \infty$, $-\infty < \mu\_j < \infty$, $j = 1, \ldots, p$, $V > O$. Observe that when $Y$ is $p \times 1$, $\operatorname{tr}[(Y-\mu)'V^{-1}(Y-\mu)] = (Y-\mu)'V^{-1}(Y-\mu)$. By symmetry, we can write down the density of the sub-matrix $X\_2$ of $X$ from the density given in (4.4.1). Denoting the density of $X\_2$ by $f\_{p\_2,q}(X\_2)$, we have

$$f\_{p\_2,q}(X\_2) = \frac{|B|^{\frac{p\_2}{2}}|A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|^{\frac{q}{2}}}{(2\pi)^{\frac{p\_2q}{2}}}e^{-\frac{1}{2}\text{tr}((A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})X\_2BX\_2')}.\tag{4.4.3}$$

Note that $A\_{22} - A\_{21}A\_{11}^{-1}A\_{12} > O$ since $A > O$, our initial assumptions being that $A > O$ and $B > O$.
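As a sanity check on the normalizing constant in (4.4.2), one can integrate the bivariate ($p = 2$) density numerically on a grid; a sketch with an arbitrarily chosen $V > O$, assuming NumPy:

```python
import numpy as np

# An arbitrary V > O for the bivariate case of (4.4.2).
V = np.array([[2.0, 0.5], [0.5, 1.0]])
Vinv = np.linalg.inv(V)
const = 1.0 / (np.sqrt(np.linalg.det(V)) * (2 * np.pi))

# Riemann sum of the density over a wide grid in D = Y - mu; the
# location parameter mu drops out after this substitution.
g = np.linspace(-8, 8, 801)
h = g[1] - g[0]
D1, D2 = np.meshgrid(g, g, indexing='ij')
quad = Vinv[0, 0] * D1**2 + 2 * Vinv[0, 1] * D1 * D2 + Vinv[1, 1] * D2**2
total = const * np.exp(-quad / 2).sum() * h * h

print(total)   # close to 1
```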

**Theorem 4.4.1.** *Let the $p \times q$ real matrix $X$ have a real matrix-variate Gaussian density with the parameter matrices $A > O$ and $B > O$ where $A$ is $p \times p$ and $B$ is $q \times q$. Let $X$ be partitioned into sub-matrices as $X = \begin{pmatrix} X\_1 \\ X\_2 \end{pmatrix}$ where $X\_1$ is $p\_1 \times q$ and $X\_2$ is $p\_2 \times q$, with $p\_1 + p\_2 = p$. Let $A$ be partitioned into sub-matrices as $A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}$ where $A\_{11}$ is $p\_1 \times p\_1$. Then $X\_1$ has a $p\_1 \times q$ real matrix-variate Gaussian density with the parameter matrices $A\_{11} - A\_{12}A\_{22}^{-1}A\_{21} > O$ and $B > O$, as given in (4.4.1), and $X\_2$ has a $p\_2 \times q$ real matrix-variate Gaussian density with the parameter matrices $A\_{22} - A\_{21}A\_{11}^{-1}A\_{12} > O$ and $B > O$, as given in (4.4.3).*

Observe that the *p*<sup>1</sup> rows taken in *X*<sup>1</sup> need not be the first *p*<sup>1</sup> rows. They can be any set of *p*<sup>1</sup> rows. In that instance, it suffices to make the corresponding permutations in the rows and columns of *A* and *B* so that the new set of *p*<sup>1</sup> rows can be taken as the first *p*<sup>1</sup> rows, and similarly for *X*2.

Can a similar result be obtained in connection with a matrix-variate Gaussian distribution if we take a set of column vectors and form a sub-matrix of *X*? Let us partition the *p* × *q* matrix *X* into sub-matrices of columns as *X* = *(Y*<sup>1</sup> *Y*2*)* where *Y*<sup>1</sup> is *p* × *q*<sup>1</sup> and *Y*<sup>2</sup> is *p*×*q*<sup>2</sup> such that *q*1+*q*<sup>2</sup> = *q*. The variables *Y*1*, Y*<sup>2</sup> are used in order to avoid any confusion with *X*1*, X*<sup>2</sup> utilized in the discussions so far. Let us partition *B* as follows:

$$B = \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix}, \ B\_{11} \text{ being } q\_1 \times q\_1, \ B\_{22} \text{ being } q\_2 \times q\_2.$$

Then,

$$\begin{split} \operatorname{tr}(AXBX') &= \operatorname{tr}[A(Y\_1\,Y\_2) \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix} \begin{pmatrix} Y'\_1 \\ Y'\_2 \end{pmatrix}] \\ &= \operatorname{tr}(AY\_1B\_{11}Y'\_1) + \operatorname{tr}(AY\_2B\_{21}Y'\_1) + \operatorname{tr}(AY\_1B\_{12}Y'\_2) + \operatorname{tr}(AY\_2B\_{22}Y'\_2) \\ &= \operatorname{tr}(AY\_1B\_{11}Y'\_1) + 2\operatorname{tr}(AY\_1B\_{12}Y'\_2) + \operatorname{tr}(AY\_2B\_{22}Y'\_2). \end{split}$$

As in the previous case of row sub-matrices, we complete the quadratic form:

$$\begin{split} \operatorname{tr}(AXBX') &= \operatorname{tr}(AY\_1B\_{11}Y\_1') - \operatorname{tr}(AY\_1B\_{12}B\_{22}^{-1}B\_{21}Y\_1') + \operatorname{tr}(A(Y\_2+C)B\_{22}(Y\_2+C)') \\ &= \operatorname{tr}(AY\_1(B\_{11}-B\_{12}B\_{22}^{-1}B\_{21})Y\_1') + \operatorname{tr}(A(Y\_2+C)B\_{22}(Y\_2+C)'), \end{split}$$

where $C=Y\_1B\_{12}B\_{22}^{-1}$.

Now, integrating out $Y\_2$ yields the result, observing that $A>O$, $B>O$, $B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}>O$ and $|B|=|B\_{22}|\,|B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}|$. A similar result follows for the marginal density of $Y\_2$. These results are stated in the next theorem.
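The determinant factorization used here, $|B|=|B\_{22}|\,|B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}|$, can be illustrated numerically. The following Python sketch (not part of the text; it assumes NumPy and an arbitrary choice of block sizes) checks the identity for a randomly generated positive definite $B$:

```python
import numpy as np

rng = np.random.default_rng(0)
q1, q2 = 2, 3
q = q1 + q2

# Build a random symmetric positive definite q x q matrix B
G = rng.standard_normal((q, q))
B = G @ G.T + q * np.eye(q)

B11, B12 = B[:q1, :q1], B[:q1, q1:]
B21, B22 = B[q1:, :q1], B[q1:, q1:]

# Schur complement of B22 in B
S = B11 - B12 @ np.linalg.inv(B22) @ B21

# |B| = |B22| * |B11 - B12 B22^{-1} B21|
lhs = np.linalg.det(B)
rhs = np.linalg.det(B22) * np.linalg.det(S)
assert np.isclose(lhs, rhs)
```

The Schur complement $S$ is itself positive definite, which is what makes (4.4.4) a valid density.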

**Theorem 4.4.2.** *Let the $p\times q$ real matrix $X$ have a real matrix-variate Gaussian density with the parameter matrices $M=O$, $A>O$ and $B>O$ where $A$ is $p\times p$ and $B$ is $q\times q$. Let $X$ be partitioned into column sub-matrices as $X=(Y\_1\ Y\_2)$ where $Y\_1$ is $p\times q\_1$ and $Y\_2$ is $p\times q\_2$ with $q\_1+q\_2=q$. Then $Y\_1$ has a $p\times q\_1$ real matrix-variate Gaussian density with the parameter matrices $A>O$ and $B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}>O$, denoted by $f\_{p,q\_1}(Y\_1)$, and $Y\_2$ has a $p\times q\_2$ real matrix-variate Gaussian density denoted by $f\_{p,q\_2}(Y\_2)$, where*

$$f\_{p,q\_1}(Y\_1) = \frac{|A|^{\frac{q\_1}{2}}|B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_1}{2}}} \text{e}^{-\frac{1}{2}\text{tr}[AY\_1(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})Y\_1']} \tag{4.4.4}$$

$$f\_{p,q\_2}(Y\_2) = \frac{|A|^{\frac{q\_2}{2}}|B\_{22} - B\_{21}B\_{11}^{-1}B\_{12}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_2}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[AY\_2(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})Y\_2']}.\tag{4.4.5}$$

When $q\_2=0$ and $q\_1=q=1$ in (4.4.4), the $1\times1$ matrix $B$ is taken to be 1. Then $Y\_1$ in (4.4.4) is $p\times1$, that is, a column vector of $p$ real scalar variables; let it still be denoted by $Y\_1$. The corresponding density, which is a real $p$-variate Gaussian (normal) density, available from (4.4.4) or from the basic matrix-variate density, is the following:

$$\begin{split} f\_{p,1}(Y\_1) &= \frac{|A|^{\frac{1}{2}} (1/2)^{\frac{p}{2}}}{\pi^{\frac{p}{2}}} \mathrm{e}^{-\frac{1}{2} \mathrm{tr} (AY\_1Y\_1')} \\ &= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \mathrm{e}^{-\frac{1}{2} \mathrm{tr} (Y\_1'AY\_1)} = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}} \mathrm{e}^{-\frac{1}{2} Y\_1'AY\_1}, \end{split} \tag{4.4.6}$$

observing that $\operatorname{tr}(Y\_1'AY\_1)=Y\_1'AY\_1$ since $Y\_1$ is $p\times1$, so that $Y\_1'AY\_1$ is $1\times1$. In the usual representation of a multivariate Gaussian density, $A$ is replaced by $A=V^{-1}$, $V$ being the positive definite covariance matrix.
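The reduction of (4.4.6) to the familiar $N\_p(0,V)$ form with $V=A^{-1}$ can be confirmed numerically. This sketch (an illustration only, assuming NumPy and SciPy are available) evaluates the density in both parametrizations at an arbitrary point:

```python
import numpy as np
from scipy.stats import multivariate_normal

# p-variate normal, zero mean, with A = V^{-1} (V the covariance matrix)
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])          # positive definite
p = A.shape[0]
y = np.array([0.3, -0.7])           # an arbitrary evaluation point

# Density in the book's parametrization (4.4.6)
f = np.sqrt(np.linalg.det(A)) / (2 * np.pi) ** (p / 2) \
    * np.exp(-0.5 * y @ A @ y)

# Same density from the usual N_p(0, V) form with V = A^{-1}
g = multivariate_normal(mean=np.zeros(p), cov=np.linalg.inv(A)).pdf(y)
assert np.isclose(f, g)
```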

**Example 4.4.1.** Let the $2\times3$ matrix $X=(x\_{ij})$ have a real matrix-variate Gaussian distribution with the parameter matrices $M=O$, $A>O$, $B>O$ where

$$X = \begin{bmatrix} x\_{11} & x\_{12} & x\_{13} \\ x\_{21} & x\_{22} & x\_{23} \end{bmatrix}, \ A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \ B = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}.$$

Let us partition *X, A* and *B* as follows:

$$X = \begin{bmatrix} X\_1 \\ X\_2 \end{bmatrix} = [Y\_1, Y\_2], \ A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}, \ B = \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix}$$

where $A\_{11}=(2)$, $A\_{12}=(1)$, $A\_{21}=(1)$, $A\_{22}=(1)$, $X\_1=[x\_{11},\ x\_{12},\ x\_{13}]$, $X\_2=[x\_{21},\ x\_{22},\ x\_{23}]$,

$$Y\_1 = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{21} & x\_{22} \end{bmatrix},\ Y\_2 = \begin{bmatrix} x\_{13} \\ x\_{23} \end{bmatrix},\ B\_{11} = \begin{bmatrix} 3 & -1 \\ -1 & 1 \end{bmatrix},\ B\_{12} = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$

$B\_{21}=[0,\ 1]$, $B\_{22}=(2)$. Compute the densities of $X\_1, X\_2, Y\_1$ and $Y\_2$.

**Solution 4.4.1.** We need the following quantities: $A\_{11}-A\_{12}A\_{22}^{-1}A\_{21}=2-1=1$, $A\_{22}-A\_{21}A\_{11}^{-1}A\_{12}=1-\frac{1}{2}=\frac{1}{2}$, $|B|=1$,

$$\begin{aligned} B\_{22} - B\_{21}B\_{11}^{-1}B\_{12} &= 2 - [0,1] \left(\frac{1}{2}\right) \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 2 - \frac{3}{2} = \frac{1}{2} \\\ B\_{11} - B\_{12}B\_{22}^{-1}B\_{21} &= \begin{bmatrix} 3 & -1 \\ -1 & 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 1 \end{bmatrix} \left(\frac{1}{2}\right)[0,1] = \begin{bmatrix} 3 & -1 \\ -1 & \frac{1}{2} \end{bmatrix}.\end{aligned}$$
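These Schur complements can be verified with a few lines of code. The following sketch (an illustration, assuming NumPy is available) reproduces the values obtained above for the matrices of Example 4.4.1:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[ 3.0, -1.0, 0.0],
              [-1.0,  1.0, 1.0],
              [ 0.0,  1.0, 2.0]])

B11, B12 = B[:2, :2], B[:2, 2:]
B21, B22 = B[2:, :2], B[2:, 2:]

S22 = B22 - B21 @ np.linalg.inv(B11) @ B12   # 1x1 Schur complement, should be 1/2
S11 = B11 - B12 @ np.linalg.inv(B22) @ B21   # 2x2 Schur complement

assert np.isclose(S22[0, 0], 0.5)
assert np.allclose(S11, [[3.0, -1.0], [-1.0, 0.5]])
assert np.isclose(np.linalg.det(B), 1.0)     # |B| = 1 as stated in the solution
```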

Let us compute the constant parts or normalizing constants in the various densities. With our usual notations, the normalizing constants in *fp*1*,q (X*1*)* and *fp*2*,q (X*2*)* are

$$\frac{|B|^{\frac{p\_1}{2}}|A\_{11} - A\_{12}A\_{22}^{-1}A\_{21}|^{\frac{q}{2}}}{(2\pi)^{\frac{p\_1q}{2}}} = \frac{|B|^{\frac{1}{2}}(1)^{\frac{3}{2}}}{(2\pi)^{\frac{3}{2}}} = \frac{1}{(2\pi)^{\frac{3}{2}}},$$

$$\frac{|B|^{\frac{p\_2}{2}}|A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|^{\frac{q}{2}}}{(2\pi)^{\frac{p\_2q}{2}}} = \frac{|B|^{\frac{1}{2}}(\frac{1}{2})^{\frac{3}{2}}}{(2\pi)^{\frac{3}{2}}} = \frac{1}{2^{\frac{3}{2}}(2\pi)^{\frac{3}{2}}}.$$

Hence, the corresponding densities of *X*<sup>1</sup> and *X*<sup>2</sup> are the following:

$$\begin{aligned} f\_{1,3}(X\_1) &= \frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{3}{2}}} \mathrm{e}^{-\frac{1}{2}X\_1 B X\_1'}, \ -\infty < x\_{1j} < \infty, \ j = 1, 2, 3, \\ f\_{1,3}(X\_2) &= \frac{|B|^{\frac{1}{2}}}{2^{\frac{3}{2}}(2\pi)^{\frac{3}{2}}} \mathrm{e}^{-\frac{1}{4}X\_2 B X\_2'}, \ -\infty < x\_{2j} < \infty, \ j = 1, 2, 3. \end{aligned}$$

Let us now evaluate the normalizing constants in the densities *fp,q*<sup>1</sup> *(Y*1*), fp,q*<sup>2</sup> *(Y*2*)*:

$$\frac{|A|^{\frac{q\_1}{2}}|B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_1}{2}}} = \frac{|A|\,(\frac{1}{2})}{4\pi^2} = \frac{1}{8\pi^2},$$

$$\frac{|A|^{\frac{q\_2}{2}}|B\_{22} - B\_{21}B\_{11}^{-1}B\_{12}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_2}{2}}} = \frac{|A|^{\frac{1}{2}}(\frac{1}{2})}{2\pi} = \frac{1}{4\pi},$$

since $|A|=1$, $|B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}|=\frac{1}{2}$ and $|B\_{22}-B\_{21}B\_{11}^{-1}B\_{12}|=\frac{1}{2}$.

Thus, the density of *Y*<sup>1</sup> is

$$\begin{split} f\_{2,2}(Y\_1) &= \frac{1}{8\pi^2} \mathbf{e}^{-\frac{1}{2} \text{tr}\{AY\_1(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})Y\_1'\}} \\ &= \frac{1}{8\pi^2} \mathbf{e}^{-\frac{1}{2}\mathcal{Q}}, \ -\infty < \mathbf{x}\_{ij} < \infty, \ i, j = 1, 2, \end{split}$$

where

$$\begin{split} \mathcal{Q} &= \text{tr}\left\{ \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{11} & \mathbf{x}\_{12} \\ \mathbf{x}\_{21} & \mathbf{x}\_{22} \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{11} & \mathbf{x}\_{21} \\ \mathbf{x}\_{12} & \mathbf{x}\_{22} \end{bmatrix} \right\} \\ &= 6\mathbf{x}\_{11}^2 + \mathbf{x}\_{12}^2 + 3\mathbf{x}\_{21}^2 + \frac{1}{2}\mathbf{x}\_{22}^2 - 4\mathbf{x}\_{11}\mathbf{x}\_{12} - 2\mathbf{x}\_{11}\mathbf{x}\_{22} \\ &\quad + 6\mathbf{x}\_{11}\mathbf{x}\_{21} + \mathbf{x}\_{12}\mathbf{x}\_{22} - 2\mathbf{x}\_{12}\mathbf{x}\_{21} - 2\mathbf{x}\_{22}\mathbf{x}\_{21}, \end{split}$$

the density of *Y*<sup>2</sup> being

$$\begin{split} f\_{2,1}(Y\_2) &= \frac{1}{4\pi} \mathbf{e}^{-\frac{1}{2} \text{tr}\{AY\_2(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})Y\_2'\}} \\ &= \frac{1}{4\pi} \mathbf{e}^{-\frac{1}{4}[2\mathbf{x}\_{13}^2 + \mathbf{x}\_{23}^2 + 2\mathbf{x}\_{13}\mathbf{x}\_{23}]}, \ -\infty < \mathbf{x}\_{i3} < \infty, \ i = 1, 2. \end{split}$$

This completes the computations.
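As a numerical cross-check on Example 4.4.1 (not part of the text; it assumes NumPy), the quadratic form $\mathcal{Q}=\operatorname{tr}(AY\_1SY\_1')$ can be evaluated through the Kronecker-product identity $\operatorname{tr}(AYSY')=\operatorname{vec}(Y)'(S\otimes A)\operatorname{vec}(Y)$ for symmetric $S$, and the exponent of $f\_{2,1}(Y\_2)$ can be checked at an arbitrary point:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
S = np.array([[ 3.0, -1.0],
              [-1.0,  0.5]])       # B11 - B12 B22^{-1} B21 from the example

Y1 = rng.standard_normal((2, 2))   # a generic point (x11, x12; x21, x22)
Q = np.trace(A @ Y1 @ S @ Y1.T)

# tr(A Y S Y') = vec(Y)' (S kron A) vec(Y), with column-major vec
v = Y1.flatten(order='F')
assert np.isclose(Q, v @ np.kron(S, A) @ v)

# Exponent of f_{2,1}(Y2): -1/2 tr(A Y2 (1/2) Y2') = -1/4 (2 x13^2 + x23^2 + 2 x13 x23)
x13, x23 = 0.4, -1.1
Y2 = np.array([[x13], [x23]])
expo = -0.5 * np.trace(A @ Y2 @ np.array([[0.5]]) @ Y2.T)
assert np.isclose(expo, -0.25 * (2*x13**2 + x23**2 + 2*x13*x23))
```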

#### **4.4a. Marginal Densities in the Complex Matrix-variate Gaussian Case**

The derivations of the results are parallel to those provided in the real case. Accordingly, we will state the corresponding results.

**Theorem 4.4a.1.** *Let the $p\times q$ matrix $\tilde{X}$ have a complex matrix-variate Gaussian density with the parameter matrices $M=O$, $A>O$, $B>O$ where $A$ is $p\times p$ and $B$ is $q\times q$. Consider a row partitioning of $\tilde{X}$ into sub-matrices $\tilde{X}\_1$ and $\tilde{X}\_2$ where $\tilde{X}\_1$ is $p\_1\times q$ and $\tilde{X}\_2$ is $p\_2\times q$, with $p\_1+p\_2=p$. Then, $\tilde{X}\_1$ and $\tilde{X}\_2$ have $p\_1\times q$ and $p\_2\times q$ complex matrix-variate Gaussian densities with parameter matrices $A\_{11}-A\_{12}A\_{22}^{-1}A\_{21}$ and $B$, and $A\_{22}-A\_{21}A\_{11}^{-1}A\_{12}$ and $B$, respectively, denoted by $\tilde{f}\_{p\_1,q}(\tilde{X}\_1)$ and $\tilde{f}\_{p\_2,q}(\tilde{X}\_2)$. The density of $\tilde{X}\_1$ is given by*

$$\tilde{f}\_{p\_1,q}(\tilde{X}\_1) = \frac{|\det(B)|^{p\_1}|\det(A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})|^q}{\pi^{p\_1q}} \mathbf{e}^{-\text{tr}((A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})\tilde{X}\_1 B\tilde{X}\_1^\*)},\tag{4.4a.1}$$

*the corresponding vector case being available from (4.4a.1) by taking $p\_1=1$ and $p\_2=0$, so that $p=1$; in this case, the density is*

$$\tilde{f}\_{1,q}(\tilde{X}\_1) = \frac{|\det(\mathcal{B})|}{\pi^q} \mathbf{e}^{-(\tilde{X}\_1 - \mu)\mathcal{B}(\tilde{X}\_1 - \mu)^\*} \tag{4.4a.2}$$

*where $\tilde{X}\_1$ and $\mu$ are $1\times q$, $\mu$ being a location parameter vector. The density of $\tilde{X}\_2$ is the following:*

$$\tilde{f}\_{p\_2,q}(\tilde{X}\_2) = \frac{|\det(B)|^{p\_2}|\det(A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})|^q}{\pi^{p\_2q}} \mathbf{e}^{-\text{tr}((A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})\tilde{X}\_2 B\tilde{X}\_2^\*)}.\tag{4.4a.3}$$

**Theorem 4.4a.2.** *Let $\tilde{X}$, $A$ and $B$ be as defined in Theorem 4.2a.1 and let $\tilde{X}$ be partitioned into column sub-matrices $\tilde{X}=(\tilde{Y}\_1\ \tilde{Y}\_2)$ where $\tilde{Y}\_1$ is $p\times q\_1$ and $\tilde{Y}\_2$ is $p\times q\_2$, so that $q\_1+q\_2=q$. Then $\tilde{Y}\_1$ and $\tilde{Y}\_2$ have $p\times q\_1$ and $p\times q\_2$ complex matrix-variate Gaussian densities given by*


$$\tilde{f}\_{p,q\_1}(\tilde{Y}\_1) = \frac{|\det(A)|^{q\_1} |\det(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})|^p}{\pi^{pq\_1}} \mathbf{e}^{-\text{tr}(A\tilde{Y}\_1(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})\tilde{Y}\_1^\*)} \tag{4.4a.4}$$

$$\tilde{f}\_{p,q\_2}(\tilde{Y}\_2) = \frac{|\det(A)|^{q\_2} |\det(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})|^p}{\pi^{pq\_2}} \mathbf{e}^{-\text{tr}(A\tilde{Y}\_2(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})\tilde{Y}\_2^\*)}. \tag{4.4a.5}$$

When $q=1$, we have the usual complex multivariate case; the density is then a $p$-variate complex Gaussian density. It is available from (4.4a.4) by taking $q\_1=1$, $q\_2=0$ and $q=1$. Now, $\tilde{Y}\_1$ is a $p\times1$ column vector. Let $\mu$ be a $p\times1$ location parameter vector. Then the density is

$$\tilde{f}\_{p,1}(\tilde{Y}\_1) = \frac{|\det(A)|}{\pi^p} \mathbf{e}^{-(\tilde{Y}\_1 - \mu)^\* A(\tilde{Y}\_1 - \mu)} \tag{4.4a.6}$$

where $A>O$ (Hermitian positive definite), $\tilde{Y}\_1-\mu$ is $p\times1$ and its $1\times p$ conjugate transpose is $(\tilde{Y}\_1-\mu)^{\*}$.

**Example 4.4a.1.** Consider a 2×3 complex matrix-variate Gaussian distribution with the parameters *M* = *O, A > O, B > O* where

$$\tilde{X} = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{21} & \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix}, \ A = \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix}, \ B = \begin{bmatrix} 2 & -i & i \\ i & 2 & -i \\ -i & i & 2 \end{bmatrix}.$$

Consider the partitioning

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}, \ B = \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{bmatrix} = [\tilde{Y}\_1, \tilde{Y}\_2]$$

where

$$\tilde{Y}\_1 = \begin{bmatrix} \tilde{x}\_{11} \\ \tilde{x}\_{21} \end{bmatrix}, \quad \tilde{Y}\_2 = \begin{bmatrix} \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix}, \quad B\_{22} = \begin{bmatrix} 2 & -i \\ i & 2 \end{bmatrix}, \quad B\_{21} = \begin{bmatrix} i \\ -i \end{bmatrix},$$

$\tilde{X}\_1 = [\tilde{x}\_{11},\ \tilde{x}\_{12},\ \tilde{x}\_{13}]$, $\tilde{X}\_2 = [\tilde{x}\_{21},\ \tilde{x}\_{22},\ \tilde{x}\_{23}]$, $A\_{11}=(2)$, $A\_{12}=(i)$, $A\_{21}=(-i)$, $A\_{22}=(2)$; $B\_{11}=(2)$, $B\_{12}=[-i,\ i]$. Compute the densities of $\tilde{X}\_1, \tilde{X}\_2, \tilde{Y}\_1, \tilde{Y}\_2$.

**Solution 4.4a.1.** It is easy to ascertain that *A* = *A*<sup>∗</sup> and *B* = *B*∗; hence both matrices are Hermitian. As well, all the leading minors of *A* and *B* are positive so that *A>O* and *B>O*. We need the following numerical results: |*A*| = 3*,* |*B*| = 2,

$$\begin{aligned} A\_{11} - A\_{12}A\_{22}^{-1}A\_{21} &= 2 - (i)(1/2)(-i) = 2 - \frac{1}{2} = \frac{3}{2}, \\ A\_{22} - A\_{21}A\_{11}^{-1}A\_{12} &= 2 - (-i)(1/2)(i) = 2 - \frac{1}{2} = \frac{3}{2} \end{aligned}$$


$$\begin{aligned} B\_{11} - B\_{12}B\_{22}^{-1}B\_{21} &= 2 - [-i, i] \left(\frac{1}{3}\right) \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} i \\ -i \end{bmatrix} = \frac{2}{3} \\ B\_{22} - B\_{21}B\_{11}^{-1}B\_{12} &= \begin{bmatrix} 2 & -i \\ i & 2 \end{bmatrix} - \begin{bmatrix} i \\ -i \end{bmatrix} \left(\frac{1}{2}\right) [-i, i] = \begin{bmatrix} 2 & -i \\ i & 2 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \\ &= \frac{1}{2} \begin{bmatrix} 3 & 1 - 2i \\ 1 + 2i & 3 \end{bmatrix}. \end{aligned}$$
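These complex Schur complements can be double-checked numerically; the sketch below (illustrative only, assuming NumPy, which handles complex matrices directly) reproduces the values just computed:

```python
import numpy as np

A = np.array([[2, 1j],
              [-1j, 2]])
B = np.array([[ 2, -1j,  1j],
              [ 1j,  2, -1j],
              [-1j,  1j,  2]])

B11, B12 = B[:1, :1], B[:1, 1:]
B21, B22 = B[1:, :1], B[1:, 1:]

S11 = B11 - B12 @ np.linalg.inv(B22) @ B21    # should equal 2/3
S22 = B22 - B21 @ np.linalg.inv(B11) @ B12    # should equal (1/2)[[3, 1-2i], [1+2i, 3]]

assert np.isclose(S11[0, 0], 2/3)
assert np.allclose(S22, 0.5 * np.array([[3, 1-2j], [1+2j, 3]]))
# |A| = 3 and |B| = 2, as stated in the solution
assert np.isclose(np.linalg.det(A).real, 3)
assert np.isclose(np.linalg.det(B).real, 2)
```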

With these preliminary calculations, we can obtain the required densities with our usual notations:

$$\begin{split} \tilde{f}\_{p\_{1},q}(\tilde{X}\_{1}) &= \frac{|\det(\mathcal{B})|^{p\_{1}} |\det(A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})|^{q}}{\pi^{p\_{1}q}} \\ &\times \mathbf{e}^{-\text{tr}[(A\_{11} - A\_{12}A\_{22}^{-1}A\_{21})\tilde{X}\_{1}B\tilde{X}\_{1}^{\*}]}, \text{ that is,} \\ \tilde{f}\_{1,3}(\tilde{X}\_{1}) &= \frac{2(3/2)^{3}}{\pi^{3}} \mathbf{e}^{-\frac{3}{2}\tilde{X}\_{1}B\tilde{X}\_{1}^{\*}} \end{split}$$

where

$$\begin{split} \mathcal{Q}\_{1} &= \tilde{X}\_{1} \mathcal{B} \tilde{X}\_{1}^{\*} = [\tilde{x}\_{11}, \tilde{x}\_{12}, \tilde{x}\_{13}] \begin{bmatrix} 2 & -i & i \\ i & 2 & -i \\ -i & i & 2 \end{bmatrix} \begin{bmatrix} \tilde{x}\_{11}^{\*} \\ \tilde{x}\_{12}^{\*} \\ \tilde{x}\_{13}^{\*} \end{bmatrix} \\ &= 2 \tilde{x}\_{11} \tilde{x}\_{11}^{\*} + 2 \tilde{x}\_{12} \tilde{x}\_{12}^{\*} + 2 \tilde{x}\_{13} \tilde{x}\_{13}^{\*} - i \tilde{x}\_{11} \tilde{x}\_{12}^{\*} + i \tilde{x}\_{11} \tilde{x}\_{13}^{\*} \\ &+ i \tilde{x}\_{12} \tilde{x}\_{11}^{\*} - i \tilde{x}\_{12} \tilde{x}\_{13}^{\*} - i \tilde{x}\_{13} \tilde{x}\_{11}^{\*} + i \tilde{x}\_{13} \tilde{x}\_{12}^{\*}; \end{split}$$

$$\begin{split} \tilde{f}\_{p\_2,q}(\tilde{X}\_2) &= \frac{|\det(B)|^{p\_2} |\det(A\_{22} - A\_{21} A\_{11}^{-1} A\_{12})|^q}{\pi^{p\_2 q}} \\ &\times \mathbf{e}^{-\text{tr}[(A\_{22} - A\_{21} A\_{11}^{-1} A\_{12}) \tilde{X}\_2 B \tilde{X}\_2^\*]}, \text{ that is,} \\ \tilde{f}\_{1,3}(\tilde{X}\_2) &= \frac{2(3/2)^3}{\pi^3} \mathbf{e}^{-\frac{3}{2} \tilde{X}\_2 B \tilde{X}\_2^\*} \end{split}$$

where $\mathcal{Q}\_2 = \tilde{X}\_2B\tilde{X}\_2^{\*}$, $\mathcal{Q}\_2$ being obtained by replacing $\tilde{X}\_1$ in $\mathcal{Q}\_1$ by $\tilde{X}\_2$;

$$\begin{split} \tilde{f}\_{p,q\_{1}}(\tilde{Y}\_{1}) &= \frac{|\det(A)|^{q\_{1}}|\det(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})|^{p}}{\pi^{p q\_{1}}} \\ &\times \mathbf{e}^{-\text{tr}[A\tilde{Y}\_{1}(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21})\tilde{Y}\_{1}^{\*}]}, \text{ that is,} \\ \tilde{f}\_{2,1}(\tilde{Y}\_{1}) &= \frac{3(2/3)^{2}}{\pi^{2}}\mathbf{e}^{-\frac{2}{3}\mathcal{Q}\_{3}} \end{split}$$

where

$$\begin{aligned} \mathcal{Q}\_3 &= \tilde{Y}\_1^\* A \tilde{Y}\_1 = [\tilde{x}\_{11}^\*, \tilde{x}\_{21}^\*] \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} \tilde{x}\_{11} \\ \tilde{x}\_{21} \end{bmatrix} \\ &= 2 \tilde{x}\_{11} \tilde{x}\_{11}^\* + 2 \tilde{x}\_{21} \tilde{x}\_{21}^\* + i \tilde{x}\_{21} \tilde{x}\_{11}^\* - i \tilde{x}\_{21}^\* \tilde{x}\_{11}; \end{aligned}$$

$$\begin{split} \tilde{f}\_{p,q\_2}(\tilde{Y}\_2) &= \frac{|\det(A)|^{q\_2} |\det(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})|^p}{\pi^{pq\_2}} \\ &\times \mathbf{e}^{-\text{tr}[A\tilde{Y}\_2(B\_{22} - B\_{21}B\_{11}^{-1}B\_{12})\tilde{Y}\_2^\*]}, \text{ that is,} \\ \tilde{f}\_{2,2}(\tilde{Y}\_2) &= \frac{3^2}{\pi^4} \mathbf{e}^{-\frac{1}{2}\mathcal{Q}} \end{split}$$

where

$$\begin{aligned} \mathcal{Q} &= \text{tr}\left\{ \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix} \begin{bmatrix} \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix} \begin{bmatrix} 3 & 1-2i \\ 1+2i & 3 \end{bmatrix} \begin{bmatrix} \tilde{x}\_{12}^{\*} & \tilde{x}\_{22}^{\*} \\ \tilde{x}\_{13}^{\*} & \tilde{x}\_{23}^{\*} \end{bmatrix} \right\} \\ &= 6\tilde{x}\_{12}\tilde{x}\_{12}^{\*} + 6\tilde{x}\_{13}\tilde{x}\_{13}^{\*} + 6\tilde{x}\_{22}\tilde{x}\_{22}^{\*} + 6\tilde{x}\_{23}\tilde{x}\_{23}^{\*} \\ &\quad + [2(1-2i)(\tilde{x}\_{12}\tilde{x}\_{13}^{\*} + \tilde{x}\_{22}\tilde{x}\_{23}^{\*}) + 2(1+2i)(\tilde{x}\_{23}\tilde{x}\_{22}^{\*} + \tilde{x}\_{13}\tilde{x}\_{12}^{\*})] \\ &\quad + [i(1-2i)(\tilde{x}\_{22}\tilde{x}\_{13}^{\*} - \tilde{x}\_{12}\tilde{x}\_{23}^{\*}) - i(1+2i)(\tilde{x}\_{13}\tilde{x}\_{22}^{\*} - \tilde{x}\_{23}\tilde{x}\_{12}^{\*})] \\ &\quad + 3i[\tilde{x}\_{22}\tilde{x}\_{12}^{\*} + \tilde{x}\_{23}\tilde{x}\_{13}^{\*} - \tilde{x}\_{12}\tilde{x}\_{22}^{\*} - \tilde{x}\_{13}\tilde{x}\_{23}^{\*}]. \end{aligned}$$

This completes the computations.
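Since $A$ and $B\_{22}-B\_{21}B\_{11}^{-1}B\_{12}$ are Hermitian positive definite, the exponent $\mathcal{Q}$ is real and positive for any $\tilde{Y}\_2$, so the kernel $\mathrm{e}^{-\mathcal{Q}/2}$ is a real-valued density factor. A quick numerical confirmation (illustrative only, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[2, 1j], [-1j, 2]])       # Hermitian, |A| = 3 > 0
T = np.array([[3, 1-2j], [1+2j, 3]])    # 2 * (B22 - B21 B11^{-1} B12), Hermitian

# A random complex 2x2 point Y2 = (x12, x13; x22, x23)
Y2 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

Q = np.trace(A @ Y2 @ T @ Y2.conj().T)

# For Hermitian positive definite A and T, Q is real and positive,
# so e^{-Q/2} is a valid real density kernel.
assert np.isclose(Q.imag, 0)
assert Q.real > 0
```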

#### **Exercises 4.4**

**4.4.1.** Write down explicitly the density of a $p\times q$ matrix-variate Gaussian for $p=3$, $q=3$. Then, by integrating out the other variables, obtain the density for the case (1): $p=2$, $q=2$; (2): $p=2$, $q=1$; (3): $p=1$, $q=2$; (4): $p=1$, $q=1$. Take the location matrix $M=O$, and take $A$ and $B$ to be general positive definite parameter matrices.

**4.4.2.** Repeat Exercise 4.4.1 for the complex case.

**4.4.3.** Write down the densities obtained in Exercises 4.4.1 and 4.4.2. Then evaluate the marginal densities for *p* = 2*, q* = 2 in both the real and complex domains by partitioning matrices and integrating out by using matrix methods.

**4.4.4.** Let $A>O$ be a real $2\times2$ matrix whose first row is $(1,\ 1)$, and let $B>O$ be a real $3\times3$ matrix whose first row is $(1,\ -1,\ 1)$. Complete $A$ and $B$ with numbers of your own choosing so that $A>O$, $B>O$. Consider a real $2\times3$ matrix-variate Gaussian density with these $A$ and $B$ as the parameter matrices. Take your own non-null location matrix. Write down the matrix-variate Gaussian density explicitly. Then, by integrating out the other variables, either directly or by matrix methods, obtain (1): the $1\times3$ matrix-variate Gaussian density; (2): the $2\times2$ matrix-variate Gaussian density from your $2\times3$ matrix-variate Gaussian density.

**4.4.5.** Repeat Exercise 4.4.4 for the complex case if the first row of *A* is *(*1*,* 1+*i)* and the first row of *B* is *(*2*,* 1 + *i,* 1 − *i)* where *A* = *A*<sup>∗</sup> *> O* and *B* = *B*<sup>∗</sup> *> O*.

#### **4.5. Conditional Densities in the Real Matrix-variate Gaussian Case**

Consider a real $p\times q$ matrix-variate Gaussian density with the parameters $M=O$, $A>O$, $B>O$. Let us partition the $p\times q$ real Gaussian matrix $X$ into row sub-matrices as $X=\binom{X\_1}{X\_2}$ where $X\_1$ is $p\_1\times q$ and $X\_2$ is $p\_2\times q$ with $p\_1+p\_2=p$. We have already established that the marginal density of $X\_2$ is

$$f\_{p\_2,q}(X\_2) = \frac{|A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|^{\frac{q}{2}}|B|^{\frac{p\_2}{2}}}{(2\pi)^{\frac{p\_2q}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[(A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})X\_2BX\_2']}.$$

Thus, the conditional density of *X*<sup>1</sup> given *X*<sup>2</sup> is obtained as

$$\begin{split} f\_{p\_1,q}(X\_1|X\_2) = \frac{f\_{p,q}(X)}{f\_{p\_2,q}(X\_2)} = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{|A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}|^{\frac{q}{2}}|B|^{\frac{p\_2}{2}}} \frac{(2\pi)^{\frac{p\_2q}{2}}}{(2\pi)^{\frac{pq}{2}}} \\ & \qquad \times e^{-\frac{1}{2}\{\text{tr}(AXBX') - \text{tr}[(A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})X\_2BX\_2']\}}. \end{split}$$

Note that

$$\begin{aligned} AXBX' &= A \begin{pmatrix} X\_1 \\ X\_2 \end{pmatrix} B(X\_1' \ X\_2') = A \begin{bmatrix} X\_1 BX\_1' & X\_1 BX\_2' \\ X\_2 BX\_1' & X\_2 BX\_2' \end{bmatrix} \\ &= \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix} \begin{bmatrix} X\_1 BX\_1' & X\_1 BX\_2' \\ X\_2 BX\_1' & X\_2 BX\_2' \end{bmatrix} = \begin{bmatrix} \alpha & \* \\ \* & \beta \end{bmatrix} \end{aligned}$$

where $\alpha = A\_{11}X\_1BX\_1' + A\_{12}X\_2BX\_1'$, $\beta = A\_{21}X\_1BX\_2' + A\_{22}X\_2BX\_2'$, and the asterisks designate blocks that are not utilized in the determination of the trace. Then

$$\text{tr}(AXBX') = \text{tr}(A\_{11}X\_1BX\_1' + A\_{12}X\_2BX\_1') + \text{tr}(A\_{21}X\_1BX\_2' + A\_{22}X\_2BX\_2').$$

Thus the exponent in the conditional density simplifies to

$$\begin{aligned} \operatorname{tr}(A\_{11}X\_1BX\_1') + 2\operatorname{tr}(A\_{12}X\_2BX\_1') + \operatorname{tr}(A\_{22}X\_2BX\_2') - \operatorname{tr}[(A\_{22} - A\_{21}A\_{11}^{-1}A\_{12})X\_2BX\_2'] \\ = \operatorname{tr}(A\_{11}X\_1BX\_1') + 2\operatorname{tr}(A\_{12}X\_2BX\_1') + \operatorname{tr}[A\_{21}A\_{11}^{-1}A\_{12}X\_2BX\_2'] \\ = \operatorname{tr}[A\_{11}(X\_1 + C)B(X\_1 + C)'], \; C = A\_{11}^{-1}A\_{12}X\_2. \end{aligned}$$
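The completion of the quadratic form above can be confirmed numerically. The following sketch (illustrative, assuming NumPy; block sizes are an arbitrary choice) checks the identity for random matrices of conformable dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p2, q = 2, 2, 3
p = p1 + p2

# Random positive definite A (p x p) and B (q x q)
G = rng.standard_normal((p, p)); A = G @ G.T + p * np.eye(p)
H = rng.standard_normal((q, q)); B = H @ H.T + q * np.eye(q)

A11, A12 = A[:p1, :p1], A[:p1, p1:]
A21 = A[p1:, :p1]
X1 = rng.standard_normal((p1, q))
X2 = rng.standard_normal((p2, q))

lhs = (np.trace(A11 @ X1 @ B @ X1.T)
       + 2 * np.trace(A12 @ X2 @ B @ X1.T)
       + np.trace(A21 @ np.linalg.inv(A11) @ A12 @ X2 @ B @ X2.T))

# Completing the square: tr[A11 (X1 + C) B (X1 + C)'] with C = A11^{-1} A12 X2
C = np.linalg.inv(A11) @ A12 @ X2
rhs = np.trace(A11 @ (X1 + C) @ B @ (X1 + C).T)
assert np.isclose(lhs, rhs)
```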

We note that $E(X\_1|X\_2) = -C = -A\_{11}^{-1}A\_{12}X\_2$, the regression of $X\_1$ on $X\_2$, the constant part of the conditional density being $|A\_{11}|^{\frac{q}{2}}|B|^{\frac{p\_1}{2}}/(2\pi)^{\frac{p\_1q}{2}}$. Hence the following result:

**Theorem 4.5.1.** *If the $p\times q$ matrix $X$ has a real matrix-variate Gaussian density with the parameter matrices $M=O$, $A>O$ and $B>O$ where $A$ is $p\times p$ and $B$ is $q\times q$, and if $X$ is partitioned into row sub-matrices $X=\binom{X\_1}{X\_2}$ where $X\_1$ is $p\_1\times q$ and $X\_2$ is $p\_2\times q$, so that $p\_1+p\_2=p$, then the conditional density of $X\_1$ given $X\_2$, denoted by $f\_{p\_1,q}(X\_1|X\_2)$, is given by*

$$f\_{p\_1,q}(X\_1|X\_2) = \frac{|A\_{11}|^{\frac{q}{2}}|B|^{\frac{p\_1}{2}}}{(2\pi)^{\frac{p\_1q}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[A\_{11}(X\_1+C)B(X\_1+C)']} \tag{4.5.1}$$

*where $C=A\_{11}^{-1}A\_{12}X\_2$ if the location parameter is a null matrix; otherwise $C=-M\_1+A\_{11}^{-1}A\_{12}(X\_2-M\_2)$, with $M$ partitioned into row sub-matrices $M\_1$ and $M\_2$, $M\_1$ being $p\_1\times q$ and $M\_2$, $p\_2\times q$.*

**Corollary 4.5.1.** *Let X, X*1*, X*2*, M, M*<sup>1</sup> *and M*<sup>2</sup> *be as defined in Theorem 4.5.1; then, in the real Gaussian case, the conditional expectation of X*<sup>1</sup> *given X*2*, denoted by E(X*1|*X*2*), is*

$$E(X\_1|X\_2) = M\_1 - A\_{11}^{-1} A\_{12} (X\_2 - M\_2). \tag{4.5.2}$$

We may adopt the following general notation to represent a real matrix-variate Gaussian (or normal) density:

$$X \sim N\_{p,q}(M, A, B), \ A > O, \ B > O,\tag{4.5.3}$$

which signifies that the *p* × *q* matrix *X* has a real matrix-variate Gaussian distribution with location parameter matrix *M* and parameter matrices *A>O* and *B>O* where *A* is *p* × *p* and *B* is *q* × *q*. Accordingly, the usual *q*-variate multivariate normal density will be denoted as follows:

$$X\_1 \sim N\_{1,q}(\mu, B), \ B > O \Rightarrow X\_1' \sim N\_q(\mu', B^{-1}), \ B > O,\tag{4.5.4}$$

where $\mu$ is the location parameter vector, which is the first row of $M$, and $X\_1$ is a $1\times q$ row vector consisting of the first row of the matrix $X$. Note that $B^{-1}=\operatorname{Cov}(X\_1')$ and the covariance matrix usually appears as the second parameter in the standard notation $N\_p(\cdot,\cdot)$. In this case, the $1\times1$ matrix $A$ will be taken as 1 to be consistent with the usual notation in the real multivariate normal case. The corresponding column case will be denoted as follows:

$$Y\_1 \sim N\_{p,1}(\mu\_{(1)}, A), \ A > O \Rightarrow Y\_1 \sim N\_p(\mu\_{(1)}, A^{-1}), \ A > O, \ A^{-1} = \text{Cov}(Y\_1) \tag{4.5.5}$$

where *Y*<sup>1</sup> is a *p* × 1 vector consisting of the first column of *X* and *μ(*1*)* is the first column of *M*. With this partitioning of *X*, we have the following result:

**Theorem 4.5.2.** *Let the real matrices X, M, A and B be as defined in Theorem 4.5.1 and X be partitioned into column sub-matrices as X* = *(Y*<sup>1</sup> *Y*2*) where Y*<sup>1</sup> *is p* × *q*<sup>1</sup> *and Y*<sup>2</sup> *is p* × *q*<sup>2</sup> *with q*<sup>1</sup> + *q*<sup>2</sup> = *q. Let the density of X, the marginal densities of Y*<sup>1</sup> *and Y*<sup>2</sup> *and the conditional density of Y*<sup>1</sup> *given Y*2*, be respectively denoted by fp,q (X), fp,q*<sup>1</sup> *(Y*1*), fp,q*<sup>2</sup> *(Y*2*) and fp,q*<sup>1</sup> *(Y*1|*Y*2*). Then, the conditional density of Y*<sup>1</sup> *given Y*<sup>2</sup> *is*

$$f\_{p,q\_1}(Y\_1|Y\_2) = \frac{|A|^{\frac{q\_1}{2}}|B\_{11}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_1}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[A(Y\_1 - M\_{(1)} + C\_1)B\_{11}(Y\_1 - M\_{(1)} + C\_1)']} \tag{4.5.6}$$

*where $A>O$, $B\_{11}>O$ and $C\_1=(Y\_2-M\_{(2)})B\_{21}B\_{11}^{-1}$, so that the conditional expectation of $Y\_1$ given $Y\_2$, or the regression of $Y\_1$ on $Y\_2$, is obtained as*

$$E(Y\_1|Y\_2) = M\_{(1)} - (Y\_2 - M\_{(2)})B\_{21}B\_{11}^{-1},\ M = (M\_{(1)}\ M\_{(2)}),\tag{4.5.7}$$

*where $M\_{(1)}$ is $p\times q\_1$ and $M\_{(2)}$ is $p\times q\_2$ with $q\_1+q\_2=q$. As well, the conditional density of $Y\_2$ given $Y\_1$ is the following:*

$$f\_{p,q\_2}(Y\_2|Y\_1) = \frac{|A|^{\frac{q\_2}{2}} |B\_{22}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_2}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}[A(Y\_2 - M\_{(2)} + C\_2)B\_{22}(Y\_2 - M\_{(2)} + C\_2)']} \tag{4.5.8}$$

*where*

$$M\_{(2)} - C\_2 = M\_{(2)} - (Y\_1 - M\_{(1)})B\_{12}B\_{22}^{-1} = E\{Y\_2|Y\_1\}.\tag{4.5.9}$$

**Example 4.5.1.** Consider a 2 × 3 real matrix *X* = *(xij )* having a real matrix-variate Gaussian distribution with the parameters *M,A > O* and *B>O* where

$$M = \begin{bmatrix} 1 & -1 & 1 \\ 2 & 0 & -1 \end{bmatrix},\ A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix},\ B = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 3 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Let $X$ be partitioned as $X=\binom{X\_1}{X\_2}=[Y\_1,\ Y\_2]$ where $X\_1=[x\_{11},\ x\_{12},\ x\_{13}]$, $X\_2=[x\_{21},\ x\_{22},\ x\_{23}]$, $Y\_1=\begin{bmatrix}x\_{11}&x\_{12}\\x\_{21}&x\_{22}\end{bmatrix}$ and $Y\_2=\begin{bmatrix}x\_{13}\\x\_{23}\end{bmatrix}$. Determine the conditional densities of $X\_1$ given $X\_2$, $X\_2$ given $X\_1$, $Y\_1$ given $Y\_2$, $Y\_2$ given $Y\_1$, and the conditional expectations $E[X\_1|X\_2]$, $E[X\_2|X\_1]$, $E[Y\_1|Y\_2]$ and $E[Y\_2|Y\_1]$.

**Solution 4.5.1.** Given the specified partitions of *X*, *A* and *B* are partitioned accordingly as follows:

$$\begin{aligned} A &= \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}, \ B = \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix}, \ B\_{11} = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}, \ B\_{12} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \\ B\_{21} &= [1,\ 0], \ B\_{22} = (1), \ A\_{11} = (2), \ A\_{12} = (1), \ A\_{21} = (1), \ A\_{22} = (3). \end{aligned}$$

The following numerical results are needed:

$$\begin{aligned} A\_{11} - A\_{12} A\_{22}^{-1} A\_{21} &= 2 - (1)(1/3)(1) = \frac{5}{3} \\ A\_{22} - A\_{21} A\_{11}^{-1} A\_{12} &= 3 - (1)(1/2)(1) = \frac{5}{2} \\ B\_{11} - B\_{12} B\_{22}^{-1} B\_{21} &= \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \end{bmatrix} (1) \begin{bmatrix} 1, 0 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -1 & 3 \end{bmatrix} \\ B\_{22} - B\_{21} B\_{11}^{-1} B\_{12} &= 1 - [1, 0](1/5) \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{2}{5}; \end{aligned}$$

$|A\_{11}|=2$, $|A\_{22}|=3$, $|A|=5$, $|A\_{11}-A\_{12}A\_{22}^{-1}A\_{21}|=\frac{5}{3}$, $|A\_{22}-A\_{21}A\_{11}^{-1}A\_{12}|=\frac{5}{2}$, $|B\_{11}|=5$, $|B\_{22}|=1$, $|B\_{11}-B\_{12}B\_{22}^{-1}B\_{21}|=2$, $|B\_{22}-B\_{21}B\_{11}^{-1}B\_{12}|=\frac{2}{5}$, $|B|=2$;

$$\begin{aligned} A\_{11}^{-1}A\_{12} &= \frac{1}{2}, \ A\_{22}^{-1}A\_{21} = \frac{1}{3}, \ B\_{22}^{-1}B\_{21} = [1,0] \\ B\_{11}^{-1}B\_{12} &= \frac{1}{5} \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \ M\_1 = [1,-1,1], \ M\_2 = [2,0,-1] \\ M\_{(1)} = \begin{bmatrix} 1 & -1 \\ 2 & 0 \end{bmatrix}, \ M\_{(2)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \end{aligned}$$
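The preliminary quantities of Solution 4.5.1 can be verified with a short computation (illustrative only, assuming NumPy):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[ 2.0, -1.0, 1.0],
              [-1.0,  3.0, 0.0],
              [ 1.0,  0.0, 1.0]])

B11, B12 = B[:2, :2], B[:2, 2:]
B21, B22 = B[2:, :2], B[2:, 2:]

assert np.isclose(np.linalg.det(A), 5)
assert np.isclose(np.linalg.det(B), 2)
assert np.isclose(2 - 1 * (1/3) * 1, 5/3)      # A11 - A12 A22^{-1} A21
assert np.isclose(3 - 1 * (1/2) * 1, 5/2)      # A22 - A21 A11^{-1} A12
assert np.allclose(B11 - B12 @ np.linalg.inv(B22) @ B21,
                   [[1.0, -1.0], [-1.0, 3.0]])
assert np.isclose((B22 - B21 @ np.linalg.inv(B11) @ B12)[0, 0], 2/5)
```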

All the conditional expectations can now be determined. They are

$$E[X\_1|X\_2] = M\_1 - A\_{11}^{-1} A\_{12} (X\_2 - M\_2) = [1, -1, 1] - \frac{1}{2} [x\_{21} - 2, \ x\_{22}, \ x\_{23} + 1]$$

$$= [1 - \frac{1}{2} (x\_{21} - 2), \ -1 - \frac{1}{2} x\_{22}, \ 1 - \frac{1}{2} (x\_{23} + 1)] \tag{i}$$

$$E[X\_2|X\_1] = M\_2 - A\_{22}^{-1} A\_{21} (X\_1 - M\_1) = [2, 0, -1] - \frac{1}{3} [\mathbf{x}\_{11} - 1, \mathbf{x}\_{12} + 1, \mathbf{x}\_{13} - 1]$$

$$= [2 - \frac{1}{3}(\mathbf{x}\_{11} - 1), -\frac{1}{3}(\mathbf{x}\_{12} + 1), -1 - \frac{1}{3}(\mathbf{x}\_{13} - 1)];\tag{ii}$$

$$\begin{aligned} E[Y\_1|Y\_2] &= M\_{(1)} - (Y\_2 - M\_{(2)})B\_{21}B\_{11}^{-1} = \begin{bmatrix} 1 & -1 \\ 2 & 0 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} \mathbf{x}\_{13} - 1 \\ \mathbf{x}\_{23} + 1 \end{bmatrix} [1, 0] \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} \\ &= \begin{bmatrix} 1 - \frac{3}{5}(\mathbf{x}\_{13} - 1) & -1 - \frac{1}{5}(\mathbf{x}\_{13} - 1) \\ 2 - \frac{3}{5}(\mathbf{x}\_{23} + 1) & -\frac{1}{5}(\mathbf{x}\_{23} + 1) \end{bmatrix} \end{aligned} \tag{iii}$$

$$\begin{aligned} E[Y\_2|Y\_1] &= M\_{(2)} - (Y\_1 - M\_{(1)})B\_{12}B\_{22}^{-1} \\ &= \begin{bmatrix} 1 \\ -1 \end{bmatrix} - \begin{bmatrix} x\_{11} - 1 & x\_{12} + 1 \\ x\_{21} - 2 & x\_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} (1) = \begin{bmatrix} 2 - x\_{11} \\ 1 - x\_{21} \end{bmatrix}. \end{aligned} \tag{iv}$$

The conditional densities can now be obtained. That of *X*<sup>1</sup> given *X*<sup>2</sup> is

$$f\_{p\_1,q}(X\_1|X\_2) = \frac{|A\_{11}|^{\frac{q}{2}}|B|^{\frac{p\_1}{2}}}{(2\pi)^{\frac{p\_1q}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[A\_{11}(X\_1 - M\_1 + C)B(X\_1 - M\_1 + C)']}$$

for the matrices *A>O* and *B>O* previously specified; that is,

$$f\_{1,3}(X\_1|X\_2) = \frac{4}{(2\pi)^{\frac{3}{2}}} \mathbf{e}^{-\frac{2}{2}(X\_1 - M\_1 + C)B(X\_1 - M\_1 + C)'} $$

where $M\_1 - C = E[X\_1|X\_2]$ is given in *(i)*. The conditional density of $X\_2|X\_1$ is the following:

$$f\_{p\_2,q}(X\_2|X\_1) = \frac{|A\_{22}|^{\frac{q}{2}}|B|^{\frac{p\_2}{2}}}{(2\pi)^{\frac{p\_2q}{2}}} \operatorname{e}^{-\frac{1}{2}\text{tr}[A\_{22}(X\_2 - M\_2 + C\_1)B(X\_2 - M\_2 + C\_1)']},$$

that is,

$$f\_{1,3}(X\_2|X\_1) = \frac{3^{\frac{3}{2}}\, 2^{\frac{1}{2}}}{(2\pi)^{\frac{3}{2}}} \operatorname{e}^{-\frac{3}{2}(X\_2 - M\_2 + C\_1)B(X\_2 - M\_2 + C\_1)'}$$

where $M\_2 - C\_1 = E[X\_2|X\_1]$ is given in *(ii)*. The conditional density of $Y\_1$ given $Y\_2$ is

$$f\_{p,q\_1}(Y\_1|Y\_2) = \frac{|A|^{\frac{q\_1}{2}}|B\_{11}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_1}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}[A(Y\_1 - M\_{(1)} + C\_2)B\_{11}(Y\_1 - M\_{(1)} + C\_2)']},$$

that is,

$$f\_{2,2}(Y\_1|Y\_2) = \frac{25}{(2\pi)^2} \mathbf{e}^{-\frac{1}{2}\text{tr}[A(Y\_1 - M\_{(1)} + C\_2)B\_{11}(Y\_1 - M\_{(1)} + C\_2)']}$$

where $M\_{(1)} - C\_2 = E[Y\_1|Y\_2]$ is specified in *(iii)*. Finally, the conditional density of $Y\_2|Y\_1$ is the following:

$$f\_{p,q\_2}(Y\_2|Y\_1) = \frac{|A|^{\frac{q\_2}{2}}|B\_{22}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq\_2}{2}}} \operatorname{e}^{-\frac{1}{2}\text{tr}[A(Y\_2 - M\_{(2)} + C\_3)B\_{22}(Y\_2 - M\_{(2)} + C\_3)']},$$

that is,

$$f\_{2,1}(Y\_2|Y\_1) = \frac{\sqrt{5}}{(2\pi)} \operatorname{e}^{-\frac{1}{2}\operatorname{tr}[A(Y\_2 - M\_{(2)} + C\_3)B\_{22}(Y\_2 - M\_{(2)} + C\_3)']}$$

where $M\_{(2)} - C\_3 = E[Y\_2|Y\_1]$ is given in *(iv)*. This completes the computations.
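The four normalizing constants appearing above all come from the same pattern, $|A\_{\cdot}|^{\frac{\cdot}{2}}|B\_{\cdot}|^{\frac{\cdot}{2}}$ with the appropriate blocks and dimensions. A small sketch (not part of the text; variable names are illustrative, and NumPy is assumed) confirms them with $p = 2$, $q = 3$ and the determinants computed earlier:

```python
import numpy as np

# Determinants from the example: |A11|, |A22|, |A|, |B11|, |B22|, |B|.
dA11, dA22, dA, dB11, dB22, dB = 2., 3., 5., 5., 1., 2.
p, q, p1, p2, q1, q2 = 2, 3, 1, 1, 2, 1

print(dA11**(q/2) * dB**(p1/2))   # 4.0          (density of X1|X2)
print(dA22**(q/2) * dB**(p2/2))   # 3^{3/2} 2^{1/2} ~ 7.348  (X2|X1)
print(dA**(q1/2) * dB11**(p/2))   # 25.0         (density of Y1|Y2)
print(dA**(q2/2) * dB22**(p/2))   # sqrt(5) ~ 2.236  (Y2|Y1)
```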

#### **4.5a. Conditional Densities in the Matrix-variate Complex Gaussian Case**

The corresponding distributions in the complex case closely parallel those obtained for the real case. A tilde will be utilized to distinguish them from the real distributions. Thus,

$$
\tilde{X} \sim \tilde{N}\_{p,q}(\tilde{M}, A, B), \ A = A^\* > O, \ B = B^\* > O
$$

will denote a complex $p \times q$ matrix $\tilde{X}$ having a $p \times q$ matrix-variate complex Gaussian density. For the $1 \times q$ case, that is, the $q$-variate multivariate normal distribution in the complex case, which is obtained from the marginal distribution of the first row of $\tilde{X}$, we have

$$
\tilde{X}\_1 \sim \tilde{N}\_{1,q}(\mu, B), \ B > O, \ \tilde{X}\_1 \sim \tilde{N}\_q(\mu, B^{-1}), \ B^{-1} = \text{Cov}(\tilde{X}\_1),
$$

where $\tilde{X}\_1$ is a $1 \times q$ vector having a $q$-variate complex normal density with $E(\tilde{X}\_1) = \mu$. The case $q = 1$ corresponds to a column of $\tilde{X}$, which constitutes a $p \times 1$ column vector in the complex domain. Letting it be denoted by $\tilde{Y}\_1$, we have

$$
\tilde{Y}\_1 \sim \tilde{N}\_{p,1}(\mu\_{(1)}, A), \ A > O, \ \text{that is,} \ \tilde{Y}\_1 \sim \tilde{N}\_p(\mu\_{(1)}, A^{-1}), \ A^{-1} = \text{Cov}(\tilde{Y}\_1),
$$

where $\mu\_{(1)}$ is the first column of $M$.

**Theorem 4.5a.1.** *Let $\tilde{X}$ be a $p \times q$ matrix in the complex domain having a $p \times q$ matrix-variate complex Gaussian density denoted by $\tilde{f}\_{p,q}(\tilde{X})$. Let $\tilde{X} = \begin{pmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{pmatrix}$ be a row partitioning of $\tilde{X}$ into sub-matrices where $\tilde{X}\_1$ is $p\_1 \times q$ and $\tilde{X}\_2$ is $p\_2 \times q$, with $p\_1 + p\_2 = p$. Then the conditional density of $\tilde{X}\_1$ given $\tilde{X}\_2$, denoted by $\tilde{f}\_{p\_1,q}(\tilde{X}\_1|\tilde{X}\_2)$, is given by*

$$\tilde{f}\_{p\_1,q}(\tilde{X}\_1|\tilde{X}\_2) = \frac{|\det(A\_{11})|^q |\det(B)|^{p\_1}}{\pi^{p\_1 q}} \mathbf{e}^{-\text{tr}[A\_{11}(\tilde{X}\_1 - M\_1 + \tilde{C})B(\tilde{X}\_1 - M\_1 + \tilde{C})^\*]} \tag{4.5a.1}$$

*where $\tilde{C} = A\_{11}^{-1} A\_{12}(\tilde{X}\_2 - M\_2)$, $E[\tilde{X}] = M = \begin{pmatrix} M\_1 \\ M\_2 \end{pmatrix}$, and the regression of $\tilde{X}\_1$ on $\tilde{X}\_2$ is as follows:*

$$E(\tilde{X}\_1|\tilde{X}\_2) = \begin{cases} M\_1 - A\_{11}^{-1} A\_{12} (\tilde{X}\_2 - M\_2) \text{ if } M = \begin{pmatrix} M\_1 \\ M\_2 \end{pmatrix} \\ -A\_{11}^{-1} A\_{12} \tilde{X}\_2 \text{ if } M = O. \end{cases} \tag{4.5a.2}$$

*Analogously, the conditional density of X*˜ <sup>2</sup> *given X*˜ <sup>1</sup> *is*

$$\tilde{f}\_{p\_2,q}(\tilde{X}\_2|\tilde{X}\_1) = \frac{|\det(A\_{22})|^q |\det(B)|^{p\_2}}{\pi^{p\_2q}} \mathbf{e}^{-\text{tr}[A\_{22}(\tilde{X}\_2 - M\_2 + C\_1)B(\tilde{X}\_2 - M\_2 + C\_1)^\*]} \tag{4.5a.3}$$

*where $C\_1 = A\_{22}^{-1} A\_{21}(\tilde{X}\_1 - M\_1)$, so that the conditional expectation of $\tilde{X}\_2$ given $\tilde{X}\_1$ or the regression of $\tilde{X}\_2$ on $\tilde{X}\_1$ is given by*

$$E[\tilde{X}\_2|\tilde{X}\_1] = M\_2 - A\_{22}^{-1} A\_{21} (\tilde{X}\_1 - M\_1). \tag{4.5a.4}$$

**Theorem 4.5a.2.** *Let $\tilde{X}$ be as defined in Theorem 4.5a.1. Let $\tilde{X}$ be partitioned into column sub-matrices, that is, $\tilde{X} = (\tilde{Y}\_1 \ \tilde{Y}\_2)$ where $\tilde{Y}\_1$ is $p \times q\_1$ and $\tilde{Y}\_2$ is $p \times q\_2$, with $q\_1 + q\_2 = q$. Then the conditional density of $\tilde{Y}\_1$ given $\tilde{Y}\_2$, denoted by $\tilde{f}\_{p,q\_1}(\tilde{Y}\_1|\tilde{Y}\_2)$, is given by*

$$\tilde{f}\_{p,q\_1}(\tilde{Y}\_1|\tilde{Y}\_2) = \frac{|\det(A)|^{q\_1}|\det(B\_{11})|^p}{\pi^{pq\_1}}\mathbf{e}^{-\text{tr}[A(\tilde{Y}\_1 - M\_{(1)} + \tilde{C}\_{(1)})B\_{11}(\tilde{Y}\_1 - M\_{(1)} + \tilde{C}\_{(1)})^\*]} \tag{4.5a.5}$$

*where $\tilde{C}\_{(1)} = (\tilde{Y}\_2 - M\_{(2)})B\_{21}B\_{11}^{-1}$, and the regression of $\tilde{Y}\_1$ on $\tilde{Y}\_2$ or the conditional expectation of $\tilde{Y}\_1$ given $\tilde{Y}\_2$ is given by*

$$E(\tilde{Y}\_1|\tilde{Y}\_2) = M\_{(1)} - (\tilde{Y}\_2 - M\_{(2)}) B\_{21} B\_{11}^{-1} \tag{4.5a.6}$$

*with $E[\tilde{X}] = M = [M\_{(1)} \ M\_{(2)}] = E[\tilde{Y}\_1 \ \tilde{Y}\_2]$. As well, the conditional density of $\tilde{Y}\_2$ given $\tilde{Y}\_1$ is the following:*

$$\tilde{f}\_{p,q\_2}(\tilde{Y}\_2|\tilde{Y}\_1) = \frac{|\det(A)|^{q\_2}|\det(B\_{22})|^p}{\pi^{pq\_2}} \mathbf{e}^{-\text{tr}[A(\tilde{Y}\_2 - M\_{(2)} + C\_{(2)})B\_{22}(\tilde{Y}\_2 - M\_{(2)} + C\_{(2)})^\*]} \tag{4.5a.7}$$

*where $C\_{(2)} = (\tilde{Y}\_1 - M\_{(1)})B\_{12}B\_{22}^{-1}$ and the conditional expectation of $\tilde{Y}\_2$ given $\tilde{Y}\_1$ is then*

$$E[\tilde{Y}\_2|\tilde{Y}\_1] = M\_{(2)} - (\tilde{Y}\_1 - M\_{(1)})B\_{12}B\_{22}^{-1}.\tag{4.5a.8}$$

**Example 4.5a.1.** Consider a 2×3 matrix-variate complex Gaussian distribution with the parameters

$$A = \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix}, \ B = \begin{bmatrix} 3 & -i & 0 \\ i & 2 & i \\ 0 & -i & 1 \end{bmatrix}, \ M = E[\tilde{X}] = \begin{bmatrix} 1+i & i & -i \\ i & 2+i & 1-i \end{bmatrix}.$$

Consider the partitioning $\tilde{X} = \begin{pmatrix} \tilde{X}\_1 \\ \tilde{X}\_2 \end{pmatrix} = [\tilde{Y}\_1 \ \tilde{Y}\_2]$ where $\tilde{X}\_1 = [\tilde{x}\_{11}, \tilde{x}\_{12}, \tilde{x}\_{13}]$, $\tilde{X}\_2 = [\tilde{x}\_{21}, \tilde{x}\_{22}, \tilde{x}\_{23}]$, $\tilde{Y}\_1 = \begin{bmatrix} \tilde{x}\_{11} \\ \tilde{x}\_{21} \end{bmatrix}$ and $\tilde{Y}\_2 = \begin{bmatrix} \tilde{x}\_{12} & \tilde{x}\_{13} \\ \tilde{x}\_{22} & \tilde{x}\_{23} \end{bmatrix}$. Determine the conditional densities of $\tilde{X}\_1|\tilde{X}\_2$, $\tilde{X}\_2|\tilde{X}\_1$, $\tilde{Y}\_1|\tilde{Y}\_2$ and $\tilde{Y}\_2|\tilde{Y}\_1$ and the corresponding conditional expectations.

**Solution 4.5a.1.** As per the partitioning of $\tilde{X}$, we have the following partitions of $A$, $B$ and $M$:

$$A = \begin{bmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{bmatrix}, B = \begin{bmatrix} B\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{bmatrix}, B\_{22} = \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix}, B\_{22}^{-1} = \begin{bmatrix} 1 & -i \\ i & 2 \end{bmatrix}, B\_{21} = \begin{bmatrix} i \\ 0 \end{bmatrix},$$

$$A\_{11} = (2), A\_{12} = (i), A\_{21} = (-i), A\_{22} = (1), B\_{12} = [-i, 0], A\_{11}^{-1} = \frac{1}{2}, A\_{22}^{-1} = 1, B\_{11}^{-1} = \frac{1}{3}.$$

$$\begin{aligned} &A\_{11} - A\_{12}A\_{22}^{-1}A\_{21} = 2 - (i)(1)(-i) = 1, \ |A\_{11} - A\_{12}A\_{22}^{-1}A\_{21}| = 1, \\ &A\_{22} - A\_{21}A\_{11}^{-1}A\_{12} = \frac{1}{2}, \ |A\_{22} - A\_{21}A\_{11}^{-1}A\_{12}| = \frac{1}{2}, \ |A| = 1, \ |B| = 2, \\ &B\_{11} - B\_{12}B\_{22}^{-1}B\_{21} = 3 - [-i \ 0] \begin{bmatrix} 1 & -i \\ i & 2 \end{bmatrix} \begin{bmatrix} i \\ 0 \end{bmatrix} = 2, \ |B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}| = 2, \\ &B\_{22} - B\_{21}B\_{11}^{-1}B\_{12} = \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix} - \begin{bmatrix} i \\ 0 \end{bmatrix} (1/3)[-i \ 0] = \begin{bmatrix} \frac{5}{3} & i \\ -i & 1 \end{bmatrix}, \ |B\_{22} - B\_{21}B\_{11}^{-1}B\_{12}| = \frac{2}{3}, \\ &M\_{1} = [1+i, \ i, \ -i], \ M\_{2} = [i, \ 2+i, \ 1-i], \ M\_{(1)} = \begin{bmatrix} 1+i \\ i \end{bmatrix}, \ M\_{(2)} = \begin{bmatrix} i & -i \\ 2+i & 1-i \end{bmatrix}. \end{aligned}$$
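These complex block computations can be verified with complex arrays; the sketch below (not part of the text; NumPy assumed) uses `.conj().T`-free block slicing and `numpy.linalg` for the determinants:

```python
import numpy as np

# A and B of Example 4.5a.1, with 1j playing the role of i.
A = np.array([[2, 1j], [-1j, 1]])
B = np.array([[3, -1j, 0], [1j, 2, 1j], [0, -1j, 1]])

# Partition B with B11 = 3 (1x1), B12 = [-i, 0], B21 = [i; 0], B22 = [[2,i],[-i,1]].
B11, B12, B21, B22 = B[:1, :1], B[:1, 1:], B[1:, :1], B[1:, 1:]

print(np.linalg.det(A).real)                                # ~ 1
print(np.linalg.det(B).real)                                # ~ 2
print(B11 - B12 @ np.linalg.inv(B22) @ B21)                 # ~ [[2+0j]]
print(np.linalg.det(B22 - B21 @ np.linalg.inv(B11) @ B12))  # ~ 2/3
```

Since $A$ and $B$ are Hermitian, the determinants are real up to rounding, which is why only the real parts are printed.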

All the conditional expectations can now be determined. They are

$$E[\tilde{X}\_1|\tilde{X}\_2] = M\_1 - A\_{11}^{-1} A\_{12} (\tilde{X}\_2 - M\_2) = [1+i, \ i, \ -i] - \frac{i}{2} (\tilde{X}\_2 - M\_2) \tag{i}$$

$$E[\tilde{X}\_2|\tilde{X}\_1] = M\_2 - A\_{22}^{-1} A\_{21} (\tilde{X}\_1 - M\_1) = [i, \ 2 + i, \ 1 - i] + i(\tilde{X}\_1 - M\_1) \tag{ii}$$

$$E[\tilde{Y}\_1|\tilde{Y}\_2] = M\_{(1)} - (\tilde{Y}\_2 - M\_{(2)})B\_{21}B\_{11}^{-1} = \frac{1}{3} \begin{bmatrix} 2 + 3i - i\tilde{x}\_{12} \\ -1 + 5i - i\tilde{x}\_{22} \end{bmatrix} \tag{iii}$$

$$\begin{aligned} E[\tilde{Y}\_2|\tilde{Y}\_1] &= M\_{(2)} - (\tilde{Y}\_1 - M\_{(1)}) B\_{12} B\_{22}^{-1} \\ &= \begin{bmatrix} i & -i \\ 2+i & 1-i \end{bmatrix} - \begin{bmatrix} \tilde{x}\_{11} - (1+i) \\ \tilde{x}\_{21} - i \end{bmatrix} \begin{bmatrix} -i & 0 \end{bmatrix} \begin{bmatrix} 1 & -i \\ i & 2 \end{bmatrix} \\ &= \begin{bmatrix} i\tilde{x}\_{11} + 1 & \tilde{x}\_{11} - 1 - 2i \\ i\tilde{x}\_{21} + i + 3 & \tilde{x}\_{21} + 1 - 2i \end{bmatrix}. \end{aligned} \tag{iv}$$
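The closed-form entries in *(iv)* can be checked by evaluating the matrix product at arbitrary complex values; this sketch (not part of the text; NumPy assumed) compares the two expressions entrywise:

```python
import numpy as np

# M_(1), M_(2), B12 and B22^{-1} from Example 4.5a.1.
M_1 = np.array([[1 + 1j], [1j]])
M_2 = np.array([[1j, -1j], [2 + 1j, 1 - 1j]])
B12 = np.array([[-1j, 0]])
B22_inv = np.array([[1, -1j], [1j, 2]])      # inverse of [[2, i], [-i, 1]]

x11, x21 = 0.4 - 0.9j, 1.2 + 0.3j            # arbitrary sample values
Y1 = np.array([[x11], [x21]])
E_Y2 = M_2 - (Y1 - M_1) @ B12 @ B22_inv

closed_form = np.array([[1j * x11 + 1, x11 - 1 - 2j],
                        [1j * x21 + 1j + 3, x21 + 1 - 2j]])
print(np.allclose(E_Y2, closed_form))        # True
```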

Now, on substituting the above quantities in equations (4.5*a*.1), (4.5*a*.3), (4.5*a*.5) and (4.5*a*.7), the following densities are obtained:

$$\tilde{f}\_{1,3}(\tilde{X}\_1|\tilde{X}\_2) = \frac{2^4}{\pi^3} \mathbf{e}^{-2(\tilde{X}\_1 - E\_1)B(\tilde{X}\_1 - E\_1)^{\*}}$$

where $E\_1 = E[\tilde{X}\_1|\tilde{X}\_2]$ is given in *(i)*;

$$\tilde{f}\_{1,3}(\tilde{X}\_2|\tilde{X}\_1) = \frac{2}{\pi^3} \mathbf{e}^{-(\tilde{X}\_2 - E\_2)B(\tilde{X}\_2 - E\_2)^{\*}}$$

where $E\_2 = E[\tilde{X}\_2|\tilde{X}\_1]$ is given in *(ii)*;

$$\tilde{f}\_{2,1}(\tilde{Y}\_1|\tilde{Y}\_2) = \frac{3^2}{\pi^2} \mathbf{e}^{-3\text{tr}[A(\tilde{Y}\_1 - E\_3)(\tilde{Y}\_1 - E\_3)^\*]}$$

where $E\_3 = E[\tilde{Y}\_1|\tilde{Y}\_2]$ is given in *(iii)*;

$$\tilde{f}\_{2,2}(\tilde{Y}\_2|\tilde{Y}\_1) = \frac{1}{\pi^4} \operatorname{e}^{-\operatorname{tr}[A(\tilde{Y}\_2 - E\_4)B\_{22}(\tilde{Y}\_2 - E\_4)^\*]}$$

where $E\_4 = E[\tilde{Y}\_2|\tilde{Y}\_1]$ is given in *(iv)*. The exponent in the density of $\tilde{Y}\_1|\tilde{Y}\_2$ can be simplified as follows:

$$\begin{aligned} -\text{tr}[A(\tilde{Y}\_{1}-M\_{(1)})B\_{11}(\tilde{Y}\_{1}-M\_{(1)})^{\*}] &= -3(\tilde{Y}\_{1}-M\_{(1)})^{\*}A(\tilde{Y}\_{1}-M\_{(1)}) \\ &= -3[(\tilde{x}\_{11}-(1+i))^{\*}\ (\tilde{x}\_{21}-i)^{\*}] \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix} \begin{bmatrix} \tilde{x}\_{11}-(1+i) \\ \tilde{x}\_{21}-i \end{bmatrix} \\ &= -6\Big[(x\_{111}^{2}+x\_{112}^{2}) + \frac{1}{2}(x\_{211}^{2}+x\_{212}^{2}) + (x\_{112}x\_{211}-x\_{111}x\_{212}) \\ &\qquad\quad - x\_{111} - 2x\_{112} - x\_{211} + \frac{3}{2}\Big] \end{aligned}$$

by writing $\tilde{x}\_{k1} = x\_{k11} + i x\_{k12}, \ k = 1, 2, \ i = \sqrt{-1}$. This completes the computations.
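The real quadratic expression above can be cross-checked against the Hermitian form it came from. The sketch below (not part of the text; NumPy assumed) evaluates both sides at random real coordinates $x\_{k11}, x\_{k12}$:

```python
import numpy as np

# Left side: -tr[A(Y1 - M_(1)) B11 (Y1 - M_(1))*] with B11 = 3;
# right side: the expanded real quadratic in x_k11, x_k12.
A = np.array([[2, 1j], [-1j, 1]])
M1 = np.array([[1 + 1j], [1j]])

rng = np.random.default_rng(0)
x111, x112, x211, x212 = rng.standard_normal(4)
Y1 = np.array([[x111 + 1j * x112], [x211 + 1j * x212]])

lhs = -3 * ((Y1 - M1).conj().T @ A @ (Y1 - M1))[0, 0].real
rhs = -6 * ((x111**2 + x112**2) + 0.5 * (x211**2 + x212**2)
            + (x112 * x211 - x111 * x212)
            - x111 - 2 * x112 - x211 + 1.5)
print(np.isclose(lhs, rhs))    # True
```

A sanity check on the constant term: the form must vanish at $\tilde{Y}\_1 = M\_{(1)}$, i.e. at $x\_{111} = x\_{112} = x\_{212} = 1$, $x\_{211} = 0$, and indeed the bracketed expression is zero there.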

#### **4.5.1. Re-examination of the case** *q* = 1

When *q* = 1, we have a *p* × 1 vector-variate or the usual *p*-variate Gaussian density of the form in (4.5.5). Let us consider the real case first. Let the *p* × 1 vector be denoted by *Y*<sup>1</sup> with

$$\begin{aligned} Y\_1 &= \begin{bmatrix} y\_1 \\ \vdots \\ y\_p \end{bmatrix} = \begin{bmatrix} Y\_{(1)} \\ Y\_{(2)} \end{bmatrix}, \ Y\_{(1)} = \begin{bmatrix} y\_1 \\ \vdots \\ y\_{p\_1} \end{bmatrix}, \ Y\_{(2)} = \begin{bmatrix} y\_{p\_1 + 1} \\ \vdots \\ y\_p \end{bmatrix}; \\\ M\_{(1)} &= \begin{bmatrix} M\_{(1)}^{(p\_1)} \\ M\_{(2)}^{(p\_2)} \end{bmatrix}, \ M\_{(1)}^{(p\_1)} = \begin{bmatrix} m\_1 \\ \vdots \\ m\_{p\_1} \end{bmatrix}, \ M\_{(2)}^{(p\_2)} = \begin{bmatrix} m\_{p\_1 + 1} \\ \vdots \\ m\_p \end{bmatrix}, \ E[Y\_1] = M\_{(1)}, p\_1 + p\_2 = p. \end{aligned}$$

Then, from (4.5.2) wherein *q* = 1, we have

$$E[Y\_{(1)}|Y\_{(2)}] = M\_{(1)}^{(p\_1)} - A\_{11}^{-1} A\_{12} (Y\_{(2)} - M\_{(2)}^{(p\_2)}),\tag{4.5.10}$$

with $A = \Sigma^{-1}$, $\Sigma$ being the covariance matrix of $Y\_1$, that is, $\text{Cov}(Y\_1) = E[(Y\_1 - E(Y\_1))(Y\_1 - E(Y\_1))']$. Let

$$A^{-1} = \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \text{ where } \Sigma\_{11} \text{ is } p\_1 \times p\_1 \text{ and } \Sigma\_{22} \text{ is } p\_2 \times p\_2.$$

From the partitioning of matrices presented in Sect. 1.3, we have

$$-A\_{11}^{-1}A\_{12} = A^{12}(A^{22})^{-1} = \Sigma\_{12}\Sigma\_{22}^{-1}.\tag{4.5.11}$$

Accordingly, we may rewrite (4.5.10) in terms of the sub-matrices of the covariance matrix as

$$E[Y\_{(\rm l)}|Y\_{(2)}] = M\_{(\rm l)}^{(p\_1)} + \Sigma\_{\rm l2} \Sigma\_{22}^{-1} (Y\_{(2)} - M\_{(2)}^{(p\_2)}).\tag{4.5.12}$$

If $p\_1 = 1$, then $Y\_{(2)}$ will contain $p - 1$ elements, denoted by $Y\_{(2)}' = (y\_2, \ldots, y\_p)$. Letting $E[y\_1] = m\_1$, we have

$$E[\mathbf{y}\_1|Y\_{(2)}] = m\_1 + \Sigma\_{12} \Sigma\_{22}^{-1} (Y\_{(2)} - M\_{(2)}^{(p\_2)}), \ p\_2 = p - 1. \tag{4.5.13}$$

The conditional expectation (4.5.13) is *the best predictor of $y\_1$ at the preassigned values of $y\_2, \ldots, y\_p$*, where $m\_1 = E[y\_1]$. It will now be shown that $\Sigma\_{12}\Sigma\_{22}^{-1}$ can be expressed in terms of variances and correlations. Let $\sigma\_j^2 = \sigma\_{jj} = \text{Var}(y\_j)$ where $\text{Var}(\cdot)$ denotes the variance of $(\cdot)$, and let $\sigma\_{ij} = \text{Cov}(y\_i, y\_j)$ denote the covariance between $y\_i$ and $y\_j$. Letting $\rho\_{ij}$ be the correlation between $y\_i$ and $y\_j$, we have

$$\begin{split} \Sigma\_{12} &= [\text{Cov}(\mathbf{y}\_1, \mathbf{y}\_2), \dots, \text{Cov}(\mathbf{y}\_1, \mathbf{y}\_p)], \\ &= [\sigma\_1 \sigma\_2 \rho\_{12}, \dots, \sigma\_1 \sigma\_p \rho\_{1p}]. \end{split}$$

Then

$$
\Sigma = \begin{bmatrix}
\sigma\_1 \sigma\_1 & \sigma\_1 \sigma\_2 \rho\_{12} & \cdots & \sigma\_1 \sigma\_p \rho\_{1p} \\
\sigma\_2 \sigma\_1 \rho\_{21} & \sigma\_2 \sigma\_2 & \cdots & \sigma\_2 \sigma\_p \rho\_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma\_p \sigma\_1 \rho\_{p1} & \sigma\_p \sigma\_2 \rho\_{p2} & \cdots & \sigma\_p \sigma\_p
\end{bmatrix}, \ \rho\_{ij} = \rho\_{ji}, \ \rho\_{jj} = 1,
$$

for all $j$. Let $D = \text{diag}(\sigma\_1, \ldots, \sigma\_p)$ be a diagonal matrix whose diagonal elements are $\sigma\_1, \ldots, \sigma\_p$, the standard deviations of $y\_1, \ldots, y\_p$, respectively. Letting $R = (\rho\_{ij})$ denote the correlation matrix wherein $\rho\_{ij}$ is the correlation between $y\_i$ and $y\_j$, we can express $\Sigma$ as $DRD$, that is,

$$\Sigma = \begin{bmatrix} \sigma\_{11} & \sigma\_{12} & \cdots & \sigma\_{1p} \\ \sigma\_{21} & \sigma\_{22} & \cdots & \sigma\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma\_{p1} & \sigma\_{p2} & \cdots & \sigma\_{pp} \end{bmatrix} = \begin{bmatrix} \sigma\_1 & 0 & \cdots & 0 \\ 0 & \sigma\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma\_p \end{bmatrix} \begin{bmatrix} 1 & \rho\_{12} & \cdots & \rho\_{1p} \\ \rho\_{21} & 1 & \cdots & \rho\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho\_{p1} & \rho\_{p2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sigma\_1 & 0 & \cdots & 0 \\ 0 & \sigma\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma\_p \end{bmatrix}$$

so that

$$
\Sigma^{-1} = D^{-1} R^{-1} D^{-1}, \ p = 2, 3, \dots \tag{4.5.14}
$$
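The factorization $\Sigma = DRD$ and its consequence (4.5.14) are easy to confirm numerically; the following sketch (not part of the text; the covariance matrix is an arbitrary positive definite example, and NumPy is assumed) does so for $p = 3$:

```python
import numpy as np

# An arbitrary positive definite covariance matrix.
Sigma = np.array([[4.0, 1.2, 0.6],
                  [1.2, 9.0, 2.1],
                  [0.6, 2.1, 1.0]])
D = np.diag(np.sqrt(np.diag(Sigma)))               # standard deviations sigma_j
R = np.linalg.inv(D) @ Sigma @ np.linalg.inv(D)    # correlation matrix (rho_jj = 1)

print(np.allclose(Sigma, D @ R @ D))                                        # True
print(np.allclose(np.linalg.inv(Sigma),
                  np.linalg.inv(D) @ np.linalg.inv(R) @ np.linalg.inv(D)))  # True
```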

We can then re-express (4.5.13) in terms of variances and correlations since

$$
\Sigma\_{12} \Sigma\_{22}^{-1} = \sigma\_1 R\_{12} D\_{(2)} D\_{(2)}^{-1} R\_{22}^{-1} D\_{(2)}^{-1} = \sigma\_1 R\_{12} R\_{22}^{-1} D\_{(2)}^{-1}
$$

where *D(*2*)* = diag*(σ*2*,...,σp)* and *R* is partitioned accordingly. Thus,

$$E\left[\mathbf{y}\_{1}|Y\_{(2)}\right] = m\_{1} + \sigma\_{1}R\_{12}R\_{22}^{-1}D\_{(2)}^{-1}(Y\_{(2)} - M\_{(2)}^{(p\_{2})}).\tag{4.5.15}$$

An interesting particular case occurs when *p* = 2, as there are then only two real scalar variables *y*<sup>1</sup> and *y*2, and

$$E\{\mathbf{y}\_1|\mathbf{y}\_2\} = m\_1 + \frac{\sigma\_1}{\sigma\_2} \rho\_{12} (\mathbf{y}\_2 - m\_2),\tag{4.5.16}$$

which is the *regression of y*<sup>1</sup> *on y*<sup>2</sup> *or the best predictor of y*<sup>1</sup> *at a given value of y*2.

#### **4.6. Sampling from a Real Matrix-variate Gaussian Density**

Let the $p \times q$ matrix $X\_\alpha = (x\_{ij\alpha})$ have a $p \times q$ real matrix-variate Gaussian density with parameter matrices $M$, $A > O$ and $B > O$. When $n$ independently and identically distributed (iid) matrix random variables that are distributed as $X\_\alpha$ are available, we say that we have *a simple random sample of size $n$ from $X\_\alpha$ or from the population distributed as $X\_\alpha$*. We will consider simple random samples from a $p \times q$ matrix-variate Gaussian population in the real and complex domains. Since the procedures parallel those utilized in the vector-variable case, we first recall the particulars of that case; some of the following materials are re-examinations of those already presented in Chap. 3. For $q = 1$, we have a $p$-vector which will be denoted by $Y\_1$. In our previous notations, $Y\_1$ is the same $Y\_1$ obtained for $q\_1 = 1$, $q\_2 = 0$ and $q = 1$. Consider a sample of size $n$ from a population distributed as $Y\_1$ and let the $p \times n$ sample matrix be denoted by $\mathbf{Y}$. Then,

$$\mathbf{Y} = [Y\_1, \dots, Y\_n] = \begin{bmatrix} \mathbf{y}\_{11} & \mathbf{y}\_{12} & \cdots & \mathbf{y}\_{1n} \\ \mathbf{y}\_{21} & \mathbf{y}\_{22} & \cdots & \mathbf{y}\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{y}\_{p1} & \mathbf{y}\_{p2} & \cdots & \mathbf{y}\_{pn} \end{bmatrix}, \ Y\_1 = \begin{bmatrix} \mathbf{y}\_{11} \\ \vdots \\ \mathbf{y}\_{p1} \end{bmatrix}.$$

In this case, the columns of **Y**, that is, *Yj* , *j* = 1*, . . . , n,* are iid variables, distributed as *Y*1. Let an *n* × 1 column vector whose components are all equal to 1 be denoted by *J* and consider

$$\bar{Y} = \frac{1}{n} \mathbf{Y} J = \frac{1}{n} \begin{bmatrix} y\_{11} & \cdots & y\_{1n} \\ y\_{21} & \cdots & y\_{2n} \\ \vdots & \ddots & \vdots \\ y\_{p1} & \cdots & y\_{pn} \end{bmatrix} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} \bar{y}\_{1} \\ \bar{y}\_{2} \\ \vdots \\ \bar{y}\_{p} \end{bmatrix}$$

where $\bar{y}\_j = \frac{1}{n}\sum\_{k=1}^{n} y\_{jk}$ denotes the average of the sample values distributed as $y\_j$. Let

$$S = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})' \ \text{ where the bold-faced } \ \bar{\mathbf{Y}} = \begin{bmatrix} \bar{y}\_1 & \cdots & \bar{y}\_1 \\ \bar{y}\_2 & \cdots & \bar{y}\_2 \\ \vdots & \ddots & \vdots \\ \bar{y}\_p & \cdots & \bar{y}\_p \end{bmatrix} = [\bar{Y}, \dots, \bar{Y}].$$

Then,

$$S = (\mathbf{s}\_{ij}), \ s\_{ij} = \sum\_{k=1}^{n} (\mathbf{y}\_{ik} - \mathbf{\bar{y}}\_{i})(\mathbf{y}\_{jk} - \mathbf{\bar{y}}\_{j}) \text{ for all } i \text{ and } j. \tag{4.6.1}$$

This matrix $S$ is known as the *sample sum of products matrix or corrected sample sum of products matrix*. Here "corrected" indicates that the deviations are taken from the respective averages $\bar{y}\_1, \ldots, \bar{y}\_p$. Note that $\frac{1}{n}s\_{ij}$ is equal to the sample covariance between $y\_i$ and $y\_j$, and when $i = j$, it is the sample variance of $y\_i$. Observing that

$$J = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \Rightarrow JJ' = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} \ \text{ and } \ J'J = n,$$

we have

$$\mathbf{Y}(\frac{1}{n}JJ') = \bar{Y} \Rightarrow \mathbf{Y} - \bar{\mathbf{Y}} = \mathbf{Y}[I - \frac{1}{n}JJ'].$$

Hence

$$S = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})' = \mathbf{Y}[I - \frac{1}{n}JJ'][I - \frac{1}{n}JJ']'\mathbf{Y}'.$$

However,

$$\begin{aligned} [I - \frac{1}{n}JJ'][I - \frac{1}{n}JJ']' &= I - \frac{1}{n}JJ' - \frac{1}{n}JJ' + \frac{1}{n^2}JJ'JJ' \\ &= I - \frac{1}{n}JJ' \text{ since } J'J = n. \end{aligned}$$

Thus,

$$S = \mathbf{Y}[I - \frac{1}{n}JJ']\mathbf{Y}'.\tag{4.6.2}$$
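Identity (4.6.2) can be confirmed on simulated data; the sketch below (not part of the text; NumPy assumed) computes $S$ both from the deviation matrix and from the centering projector:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 8
Y = rng.standard_normal((p, n))     # sample matrix; columns are Y_1, ..., Y_n

J = np.ones((n, 1))
Ybar_mat = (Y @ J / n) @ J.T        # bold Ybar: every column equals the mean vector
S_dev = (Y - Ybar_mat) @ (Y - Ybar_mat).T
S_proj = Y @ (np.eye(n) - J @ J.T / n) @ Y.T

print(np.allclose(S_dev, S_proj))   # True
```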

Letting $C\_1 = I - \frac{1}{n}JJ'$, we note that $C\_1^2 = C\_1$ and that the rank of $C\_1$ is $n - 1$. Accordingly, $C\_1$ is an idempotent matrix having $n - 1$ eigenvalues equal to 1, the remaining one being equal to zero. Now, letting $C\_2 = \frac{1}{n}JJ'$, it is easy to verify that $C\_2^2 = C\_2$ and that the rank of $C\_2$ is one; thus, $C\_2$ is idempotent with $n - 1$ eigenvalues equal to zero, the remaining one being equal to 1. Further, since $C\_1C\_2 = O$, that is, $C\_1$ and $C\_2$ are orthogonal to each other, $\mathbf{Y} - \bar{\mathbf{Y}} = \mathbf{Y}C\_1$ and $\bar{\mathbf{Y}} = \mathbf{Y}C\_2$ are independently distributed. Consequently, $S = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})'$ and $\bar{Y}$ are independently distributed as well. This will be stated as the next result.

**Theorem 4.6.1, 4.6a.1.** *Let $Y\_1, \ldots, Y\_n$ be a simple random sample of size $n$ from a $p$-variate real Gaussian population having a $N\_p(\mu, \Sigma)$, $\Sigma > O$, distribution. Let $\bar{Y}$ be the sample average and $S$ be the sample sum of products matrix; then, $\bar{Y}$ and $S$ are statistically independently distributed. In the complex domain, let the $\tilde{Y}\_j$'s be iid $\tilde{N}\_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^{\*} > O$, and $\bar{\tilde{Y}}$ and $\tilde{S}$ denote the sample average and sample sum of products matrix; then, $\bar{\tilde{Y}}$ and $\tilde{S}$ are independently distributed.*

#### **4.6.1. The distribution of the sample sum of products matrix, real case**

Reprising the notations of Sect. 4.6, let the $p \times n$ matrix $\mathbf{Y}$ denote a sample matrix whose columns $Y\_1, \ldots, Y\_n$ are iid $N\_p(\mu, \Sigma)$, $\Sigma > O$, Gaussian vectors. Let the sample mean be $\bar{Y} = \frac{1}{n}(Y\_1 + \cdots + Y\_n) = \frac{1}{n}\mathbf{Y}J$ where $J' = (1, \ldots, 1)$. Let the bold-faced matrix $\bar{\mathbf{Y}} = [\bar{Y}, \ldots, \bar{Y}]$, so that $\mathbf{Y} - \bar{\mathbf{Y}} = \mathbf{Y}C\_1$ where $C\_1 = I\_n - \frac{1}{n}JJ'$. Note that $C\_1 = I\_n - C\_2 = C\_1^2$ and $C\_2 = \frac{1}{n}JJ' = C\_2^2$, that is, $C\_1$ and $C\_2$ are idempotent matrices whose respective ranks are $n - 1$ and 1. Since $C\_1 = C\_1'$, there exists an $n \times n$ orthonormal matrix $P$, $PP' = I\_n$, $P'P = I\_n$, such that $P'C\_1P = D$ where

$$D = \begin{bmatrix} I\_{n-1} & O \\ O & 0 \end{bmatrix} = P'C\_1P.$$

Let $\mathbf{Y} = ZP'$ where $Z$ is $p \times n$. Then, $\mathbf{Y} = ZP' \Rightarrow \mathbf{Y}C\_1 = ZP'C\_1 = ZP'(PDP') = ZDP'$, so that

$$S = (\mathbf{Y}C\_1)(\mathbf{Y}C\_1)' = ZDP'PDZ' = Z \begin{bmatrix} I\_{n-1} & O \\ O & 0 \end{bmatrix} Z' = (Z\_{n-1}, O)(Z\_{n-1}, O)' = Z\_{n-1}Z\_{n-1}' \tag{4.6.3}$$

where $Z\_{n-1}$ is a $p \times (n-1)$ matrix obtained by deleting the last column of the $p \times n$ matrix $Z$. Thus, $S = Z\_{n-1}Z\_{n-1}'$ where $Z\_{n-1}$ contains $p(n-1)$ distinct real variables. Accordingly, Theorems 4.2.1, 4.2.2, 4.2.3, and the analogous results in the complex domain, are applicable to $Z\_{n-1}$ as well as to the corresponding quantity $\tilde{Z}\_{n-1}$ in the complex case. Observe that when $Y\_1 \sim N\_p(\mu, \Sigma)$, $\mathbf{Y} - \bar{\mathbf{Y}}$ has expected value $\mathbf{M} - \mathbf{M} = O$, $\mathbf{M} = (\mu, \ldots, \mu)$. Hence, $\mathbf{Y} - \bar{\mathbf{Y}} = (\mathbf{Y} - \mathbf{M}) - (\bar{\mathbf{Y}} - \mathbf{M})$ and therefore, without any loss of generality, we can assume $Y\_1$ to be coming from a $N\_p(O, \Sigma)$, $\Sigma > O$, vector random variable whenever $\mathbf{Y} - \bar{\mathbf{Y}}$ is involved.
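The reduction in (4.6.3) can be reproduced numerically by diagonalizing the centering projector; this sketch (not part of the text; NumPy assumed) builds $P$ from the eigenvectors of $C\_1$ and checks $S = Z\_{n-1}Z\_{n-1}'$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 6
Y = rng.standard_normal((p, n))
J = np.ones((n, 1))
C1 = np.eye(n) - J @ J.T / n

# eigh returns orthonormal eigenvectors; reorder so the n-1 unit eigenvalues
# come first and the single zero eigenvalue last, matching D = diag(I_{n-1}, 0).
eigvals, P = np.linalg.eigh(C1)
order = np.argsort(-eigvals)
P = P[:, order]

Z = Y @ P                                     # so that Y = Z P'
S_direct = Y @ C1 @ Y.T                       # = (Y C1)(Y C1)' since C1 is idempotent
S_Z = Z[:, :n-1] @ Z[:, :n-1].T               # Z_{n-1}: drop the last column of Z
print(np.allclose(S_direct, S_Z))             # True
```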

**Theorem 4.6.2.** *Let $\mathbf{Y}$, $\bar{Y}$, $\bar{\mathbf{Y}}$, $J$, $C\_1$ and $C\_2$ be as defined in this section. Then, the $p \times n$ matrix $\mathbf{Y} - \bar{\mathbf{Y}}$ satisfies $(\mathbf{Y} - \bar{\mathbf{Y}})J = O$, which implies that there exist linear relationships among the columns of $\mathbf{Y} - \bar{\mathbf{Y}}$. However, all the elements of $Z\_{n-1}$ as defined in (4.6.3) are distinct real variables. Thus, Theorems 4.2.1, 4.2.2 and 4.2.3 are applicable to $Z\_{n-1}$.*

Note that the corresponding result for the complex Gaussian case also holds.

#### **4.6.2. Linear functions of sample vectors**

Let $Y\_j \overset{iid}{\sim} N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, or equivalently, let the $Y\_j$'s constitute a simple random sample of size $n$ from this $p$-variate real Gaussian population. Then, the density of the $p \times n$ sample matrix $\mathbf{Y}$, denoted by $L(\mathbf{Y})$, is the following:

$$L(\mathbf{Y}) = \frac{1}{(2\pi)^{\frac{np}{2}} |\boldsymbol{\Sigma}|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}[\boldsymbol{\Sigma}^{-1}(\mathbf{Y}-\mathbf{M})(\mathbf{Y}-\mathbf{M})']},$$

where $\mathbf{M} = (\mu, \ldots, \mu)$ is the $p \times n$ matrix whose columns are all equal to the $p \times 1$ parameter vector $\mu$. Consider a linear function of the sample values $Y\_1, \ldots, Y\_n$, say $U = \mathbf{Y}A$ where $A$ is an $n \times q$ constant matrix of rank $q$, $q \le p \le n$, so that $U$ is $p \times q$. Let us consider the mgf of $U$. Since $U$ is $p \times q$, we employ a $q \times p$ parameter matrix $T$ so that $\text{tr}(TU)$ will contain all the elements of $U$ multiplied by the corresponding parameters. The mgf of $U$ is then

$$\begin{aligned} M\_U(T) &= E[\mathbf{e}^{\text{tr}(TU)}] = E[\mathbf{e}^{\text{tr}(T\mathbf{Y}A)}] = E[\mathbf{e}^{\text{tr}(AT\mathbf{Y})}] \\ &= \mathbf{e}^{\text{tr}(AT\mathbf{M})} E[\mathbf{e}^{\text{tr}(AT(\mathbf{Y}-\mathbf{M}))}] \end{aligned}$$

where $\mathbf{M} = (\mu, \ldots, \mu)$. Letting $W = \Sigma^{-\frac{1}{2}}(\mathbf{Y} - \mathbf{M})$, $\text{d}\mathbf{Y} = |\Sigma|^{\frac{n}{2}}\text{d}W$ and

$$\begin{split} M\_{U}(T) &= \operatorname{\mathbf{e}}^{\operatorname{tr}(AT\mathbf{M})} |\boldsymbol{\Sigma}|^{\frac{n}{2}} E[\mathbf{e}^{\operatorname{tr}(AT\boldsymbol{\Sigma}^{\frac{1}{2}}W)}] \\ &= \frac{\operatorname{\mathbf{e}}^{\operatorname{tr}(AT\mathbf{M})}}{(2\pi)^{\frac{np}{2}}} \int\_{W} \operatorname{\mathbf{e}}^{\operatorname{tr}(AT\boldsymbol{\Sigma}^{\frac{1}{2}}W) - \frac{1}{2}\operatorname{tr}(WW')} \operatorname{\mathbf{d}}W. \end{split}$$

Now, expanding

$$\text{tr}[(W-C)(W-C)'] = \text{tr}(WW') - 2\text{tr}(WC') + \text{tr}(CC')$$

and comparing the resulting expression with the exponent in the integrand, which, excluding $-\frac{1}{2}$, is $\text{tr}(WW') - 2\text{tr}(AT\Sigma^{\frac{1}{2}}W)$, we may let $C' = AT\Sigma^{\frac{1}{2}}$ so that $\text{tr}(CC') = \text{tr}(AT\Sigma T'A') = \text{tr}(T\Sigma T'A'A)$. Since $\text{tr}(AT\mathbf{M}) = \text{tr}(T\mathbf{M}A)$ and

$$\frac{1}{(2\pi)^{\frac{np}{2}}} \int\_W \mathbf{e}^{-\frac{1}{2}\text{tr}[(W-C)(W-C)']} \mathrm{d}W = 1,$$

we have

$$M\_U(T) = M\_{\mathbf{Y}A}(T) = \mathbf{e}^{\text{tr}(T\mathbf{M}A) + \frac{1}{2}\text{tr}(T\Sigma T'A'A)}$$

where $\mathbf{M}A = E[\mathbf{Y}A]$, $\Sigma > O$, $A'A > O$, $A$ being a full rank matrix, and $T\Sigma T'A'A$ is a $q \times q$ positive definite matrix. Hence, the $p \times q$ matrix $U = \mathbf{Y}A$ has a matrix-variate real Gaussian density with the parameters $\mathbf{M}A = E[\mathbf{Y}A]$ and $A'A > O$, $\Sigma > O$. Thus, the following result:

**Theorem 4.6.3, 4.6a.2.** *Let $Y\_j \overset{iid}{\sim} N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, or equivalently, let the $Y\_j$'s constitute a simple random sample of size $n$ from this $p$-variate real Gaussian population. Consider a set of linear functions of $Y\_1, \ldots, Y\_n$, $U = \mathbf{Y}A$, where $\mathbf{Y} = (Y\_1, \ldots, Y\_n)$ is a $p \times n$ sample matrix and $A$ is an $n \times q$ constant matrix of rank $q$, $q \le p \le n$. Then, $U$ has a nonsingular $p \times q$ matrix-variate real Gaussian distribution with the parameters $\mathbf{M}A = E[\mathbf{Y}A]$, $A'A > O$, and $\Sigma > O$. Analogously, in the complex domain, $\tilde{U} = \tilde{\mathbf{Y}}A$ has a $p \times q$ matrix-variate complex Gaussian distribution with the corresponding parameters $E[\tilde{\mathbf{Y}}A]$, $A^\*A > O$, and $\tilde{\Sigma} > O$, $A^\*$ denoting the conjugate transpose of $A$. In the usual format of a $p \times q$ matrix-variate $N\_{p,q}(M, A, B)$ real Gaussian density, $M$ is replaced by $\mathbf{M}A$; $A$, by $A'A$; and $B$, by $\Sigma$, in the real case, with corresponding changes for the complex case.*

A certain particular case turns out to be of interest. Observe that $\mathbf{M}A = \mu(J'A)$, $J' = (1, \ldots, 1)$, and that when $q = 1$, we are considering only one linear combination of $Y\_1, \ldots, Y\_n$ in the form $U\_1 = a\_1Y\_1 + \cdots + a\_nY\_n$, where $a\_1, \ldots, a\_n$ are real scalar constants. Then $J'A = \sum\_{j=1}^{n} a\_j$, $A'A = \sum\_{j=1}^{n} a\_j^2$, and the $p \times 1$ vector $U\_1$ has a $p$-variate real nonsingular Gaussian distribution with the parameters $(\sum\_{j=1}^{n} a\_j)\mu$ and $(\sum\_{j=1}^{n} a\_j^2)\Sigma$. This result was stated in Theorem 3.5.4.

**Corollary 4.6.1, 4.6a.1.** *Let A as defined in Theorem 4.6.3 be n* × 1*, in which case A is a column vector whose components are $a_1,\ldots,a_n$, and the resulting single linear function of $Y_1,\ldots,Y_n$ is $U_1 = a_1Y_1 + \cdots + a_nY_n$. Let the population be p-variate real Gaussian with the parameters μ and $\Sigma > O$. Then $U_1$ has a p-variate nonsingular real normal distribution with the parameters $(\sum_{j=1}^{n} a_j)\mu$ and $(\sum_{j=1}^{n} a_j^2)\Sigma$. Analogously, in the complex Gaussian population case, $\tilde{U}_1 = a_1\tilde{Y}_1 + \cdots + a_n\tilde{Y}_n$ is distributed as a complex Gaussian with mean value $(\sum_{j=1}^{n} a_j)\tilde{\mu}$ and covariance matrix $(\sum_{j=1}^{n} a_j^{*} a_j)\tilde{\Sigma}$. Taking $a_1 = \cdots = a_n = \frac{1}{n}$, $U_1 = \frac{1}{n}(Y_1 + \cdots + Y_n) = \bar{Y}$, the sample average, which has a p-variate real Gaussian density with the parameters μ and $\frac{1}{n}\Sigma$. Correspondingly, in the complex Gaussian case, the sample average $\bar{\tilde{Y}}$ is a p-variate complex Gaussian vector with the parameters $\tilde{\mu}$ and $\frac{1}{n}\tilde{\Sigma}$, $\tilde{\Sigma} = \tilde{\Sigma}^{*} > O$.*
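As a quick numerical sanity check of the last statement, the following sketch (our own, using numpy; the dimension, sample size and parameter values are illustrative, not from the text) simulates repeated samples and compares the empirical covariance of the sample average with $\frac{1}{n}\Sigma$:

```python
import numpy as np

# Illustrative check: with a_1 = ... = a_n = 1/n, the sample average
# Ybar of n iid N_p(mu, Sigma) vectors is N_p(mu, Sigma/n).
rng = np.random.default_rng(0)
p, n, reps = 3, 10, 20000
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Each replicate draws Y_1,...,Y_n iid N_p(mu, Sigma) and records Ybar
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
ybar = samples.mean(axis=1)             # reps x p matrix of sample averages

emp_mean = ybar.mean(axis=0)            # should be close to mu
emp_cov = np.cov(ybar, rowvar=False)    # should be close to Sigma / n

print(np.max(np.abs(emp_mean - mu)))    # small
print(np.max(np.abs(emp_cov - Sigma / n)))  # small
```

The empirical covariance contracts by the factor $\frac{1}{n}$ exactly as the corollary predicts.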

#### **4.6.3. The general real matrix-variate case**

In order to avoid a multiplicity of symbols, we will denote the *p* × *q* real matrix-variate random variable by *Xα* = *(xijα)* and the corresponding complex matrix by $\tilde{X}_\alpha = (\tilde{x}_{ij\alpha})$. Consider a simple random sample of size *n* from the population represented by the real *p* × *q* matrix *X*1, with *Xα* = *(xijα)* denoting the *α*-th sample value, so that the *Xα*'s, *α* = 1*, . . . , n,* are iid as *X*1. Let the *p* × *nq* sample matrix be denoted by the bold-faced **X** = [*X*1*, X*2*,...,Xn*] where each *Xj* is *p* × *q*. Let the sample average be denoted by $\bar{X} = (\bar{x}_{ij})$, $\bar{x}_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} x_{ij\alpha}$. Let **Xd** be the sample deviation matrix, which is the *p* × *nq* matrix

$$\mathbf{X_d} = [X_1 - \bar{X}, X_2 - \bar{X}, \dots, X_n - \bar{X}], \ X_\alpha - \bar{X} = (x_{ij\alpha} - \bar{x}_{ij}), \tag{4.6.4}$$

wherein the corresponding sample average is subtracted from each element. For example,

$$\begin{aligned} X_\alpha - \bar{X} &= \begin{bmatrix} x_{11\alpha} - \bar{x}_{11} & x_{12\alpha} - \bar{x}_{12} & \cdots & x_{1q\alpha} - \bar{x}_{1q} \\ x_{21\alpha} - \bar{x}_{21} & x_{22\alpha} - \bar{x}_{22} & \cdots & x_{2q\alpha} - \bar{x}_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1\alpha} - \bar{x}_{p1} & x_{p2\alpha} - \bar{x}_{p2} & \cdots & x_{pq\alpha} - \bar{x}_{pq} \end{bmatrix} \\ &= [\, C_{1\alpha} \ \ C_{2\alpha} \ \ \cdots \ \ C_{q\alpha} \,] \end{aligned} \tag{i}$$

where *Cjα* is the *j* -th column in the *α*-th sample deviation matrix *Xα* −*X*¯ . In this notation, the *p* × *qn* sample deviation matrix can be expressed as follows:

$$\mathbf{X_d} = [C_{11}, C_{21}, \dots, C_{q1}, C_{12}, C_{22}, \dots, C_{q2}, \dots, C_{1n}, C_{2n}, \dots, C_{qn}] \tag{ii}$$

where, for example, *Cγ α* denotes the *γ* -th column in the *α*-th *p* × *q* matrix, *Xα* − *X*¯ , that is,

$$C_{\gamma\alpha} = \begin{bmatrix} x_{1\gamma\alpha} - \bar{x}_{1\gamma} \\ x_{2\gamma\alpha} - \bar{x}_{2\gamma} \\ \vdots \\ x_{p\gamma\alpha} - \bar{x}_{p\gamma} \end{bmatrix}. \tag{iii}$$

Then, the sample sum of products matrix, denoted by *S*, is given by

$$\begin{aligned} S = \mathbf{X_d}\mathbf{X_d}' &= C_{11}C_{11}' + C_{21}C_{21}' + \cdots + C_{q1}C_{q1}' \\ &\quad + C_{12}C_{12}' + C_{22}C_{22}' + \cdots + C_{q2}C_{q2}' \\ &\qquad \vdots \\ &\quad + C_{1n}C_{1n}' + C_{2n}C_{2n}' + \cdots + C_{qn}C_{qn}'. \end{aligned} \tag{iv}$$

Let us rearrange these matrices by collecting the terms relevant to each column of **X**, the columns in question being associated with the variables

$$\begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{p1} \end{bmatrix}, \begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{p2} \end{bmatrix}, \dots, \begin{bmatrix} x_{1q} \\ x_{2q} \\ \vdots \\ x_{pq} \end{bmatrix}.$$

Then, the terms relevant to these columns are the following:

$$\begin{aligned} S = \mathbf{X_d}\mathbf{X_d}' &= C_{11}C_{11}' + C_{12}C_{12}' + \cdots + C_{1n}C_{1n}' \\ &\quad + C_{21}C_{21}' + C_{22}C_{22}' + \cdots + C_{2n}C_{2n}' \\ &\qquad \vdots \\ &\quad + C_{q1}C_{q1}' + C_{q2}C_{q2}' + \cdots + C_{qn}C_{qn}' \\ &\equiv S_1 + S_2 + \cdots + S_q \end{aligned} \tag{v}$$

where *S*<sup>1</sup> denotes the *p* × *p* sample sum of products matrix in the first column of **X**, *S*2*,* the *p* × *p* sample sum of products matrix corresponding to the second column of **X**, and so on, *Sq* being equal to the *p* × *p* sample sum of products matrix corresponding to the *q*-th column of **X**.

**Theorem 4.6.4.** *Let Xα* = *(xijα) be a real p* × *q matrix of distinct real scalar variables xijα's. Letting Xα, X,* ¯ **X***,* **Xd***, S, and S*1*,...,Sq be as previously defined, the sample sum of products matrix in the p* × *nq sample matrix* **X***, denoted by S, is given by*

$$S = S\_1 + \cdots + S\_q.\tag{4.6.5}$$

**Example 4.6.1.** Consider a 2×2 real matrix-variate *N*2*,*2*(O, A, B)* distribution with the parameters

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \text{ and } B = \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix}.$$

Let *Xα, α* = 1*,...,* 5*,* be a simple random sample of size 5 from this real Gaussian population. Suppose that the following observations on *Xα, α* = 1*,...,* 5*,* were obtained:

$$\begin{aligned} X\_1 &= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, & X\_2 &= \begin{bmatrix} -1 & 1 \\ -2 & 1 \end{bmatrix}, & X\_3 &= \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}, \\\ X\_4 &= \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix}, & X\_5 &= \begin{bmatrix} -4 & 1 \\ -1 & -2 \end{bmatrix}. \end{aligned}$$

Compute the sample matrix, the sample average, the sample deviation matrix and the sample sum of products matrix.

#### **Solution 4.6.1.** The sample average is available as

$$\begin{aligned} \bar{X} &= \frac{1}{5} [X\_1 + \dots + X\_5] \\ &= \frac{1}{5} \begin{bmatrix} 1 + (-1) + 0 + (-1) + (-4) & 1 + 1 + 1 + 1 + 1 \\ 1 + (-2) + 1 + 1 + (-1) & 2 + 1 + 2 + 2 + (-2) \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix}. \end{aligned}$$

The deviations are then

$$\begin{aligned} X\_{1d} &= X\_1 - \bar{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}, \ X\_{2d} = \begin{bmatrix} 0 & 0 \\ -2 & 0 \end{bmatrix}, \\\ X\_{3d} &= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \ X\_{4d} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \ X\_{5d} = \begin{bmatrix} -3 & 0 \\ -1 & -3 \end{bmatrix}. \end{aligned}$$

Thus, the sample matrix, the sample average matrix and the sample deviation matrix, denoted by bold-faced letters, are the following:

$$\mathbf{X} = [X\_1, X\_2, X\_3, X\_4, X\_5], \; \bar{\mathbf{X}} = [\bar{X}, \dots, \bar{X}] \text{ and } \mathbf{X\_d} = [X\_{1d}, X\_{2d}, X\_{3d}, X\_{4d}, X\_{5d}].$$

The sample sum of products matrix is then

$$S = [\mathbf{X} - \mathbf{\bar{X}}][\mathbf{X} - \mathbf{\bar{X}}]' = [\mathbf{X\_d}][\mathbf{X\_d}]' = S\_1 + S\_2$$

where *S*<sup>1</sup> is obtained from the first columns of each of *Xαd , α* = 1*,...,* 5*,* and *S*<sup>2</sup> is evaluated from the second columns of *Xαd , α* = 1*,...,* 5. That is,

$$\begin{aligned} S_1 &= \begin{bmatrix} 2 \\ 1 \end{bmatrix}[2 \ \ 1] + \begin{bmatrix} 0 \\ -2 \end{bmatrix}[0 \ \ {-2}] + \begin{bmatrix} 1 \\ 1 \end{bmatrix}[1 \ \ 1] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0 \ \ 1] + \begin{bmatrix} -3 \\ -1 \end{bmatrix}[{-3} \ \ {-1}] \\ &= \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 9 & 3 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 6 \\ 6 & 8 \end{bmatrix}; \\ S_2 &= \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0 \ \ 1] + \begin{bmatrix} 0 \\ 0 \end{bmatrix}[0 \ \ 0] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0 \ \ 1] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0 \ \ 1] + \begin{bmatrix} 0 \\ -3 \end{bmatrix}[0 \ \ {-3}] \\ &= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + O + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 9 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 12 \end{bmatrix}; \\ S &= S_1 + S_2 = \begin{bmatrix} 14 & 6 \\ 6 & 20 \end{bmatrix}. \end{aligned}$$

This *S* can be directly verified by evaluating $[\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = [\mathbf{X_d}][\mathbf{X_d}]'$ where

$$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X\_d} = \begin{bmatrix} 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -3 & 0\\ 1 & 1 & -2 & 0 & 1 & 1 & 1 & 1 & -1 & -3 \end{bmatrix}, \ S = \mathbf{X\_d} \mathbf{X\_d'}.$$
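The computations in this example are easy to reproduce numerically. The following sketch (our own, using numpy; the variable names are ours) builds the sample average, the deviation matrix $\mathbf{X_d}$, and checks that $S = S_1 + S_2$:

```python
import numpy as np

# Observed sample values of Example 4.6.1
X = [np.array([[1, 1], [1, 2]]),
     np.array([[-1, 1], [-2, 1]]),
     np.array([[0, 1], [1, 2]]),
     np.array([[-1, 1], [1, 2]]),
     np.array([[-4, 1], [-1, -2]])]

Xbar = sum(X) / 5                        # sample average matrix
Xd = np.hstack([Xa - Xbar for Xa in X])  # 2 x 10 sample deviation matrix

S = Xd @ Xd.T                            # sample sum of products matrix
# Column-wise split: S1 from the first columns, S2 from the second columns
S1 = sum((Xa - Xbar)[:, [0]] @ (Xa - Xbar)[:, [0]].T for Xa in X)
S2 = sum((Xa - Xbar)[:, [1]] @ (Xa - Xbar)[:, [1]].T for Xa in X)

print(Xbar)   # → [[-1, 1], [0, 1]]
print(S1)     # → [[14, 6], [6, 8]]
print(S2)     # → [[0, 0], [0, 12]]
print(S)      # → [[14, 6], [6, 20]]
```

The direct product $\mathbf{X_d}\mathbf{X_d}'$ and the column-wise decomposition $S_1 + S_2$ agree, as Theorem 4.6.4 asserts.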

#### **4.6a. The General Complex Matrix-variate Case**

The preceding analysis has its counterpart for the complex case. Let $\tilde{X}_\alpha = (\tilde{x}_{ij\alpha})$ be a *p* × *q* matrix in the complex domain with the $\tilde{x}_{ij\alpha}$'s being distinct complex scalar variables. Consider a simple random sample of size *n* from this population, designated by $\tilde{X}_1$. Let the *α*-th sample matrix be $\tilde{X}_\alpha$, *α* = 1*, . . . , n,* the $\tilde{X}_\alpha$'s being iid as $\tilde{X}_1$, and the *p* × *nq* sample matrix be denoted by the bold-faced $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$. Let the sample average be denoted by $\bar{\tilde{X}} = (\bar{\tilde{x}}_{ij})$, $\bar{\tilde{x}}_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} \tilde{x}_{ij\alpha}$, and $\tilde{\mathbf{X}}_{\mathbf{d}}$ be the sample deviation matrix:

$$\tilde{\mathbf{X}}_{\mathbf{d}} = [\tilde{X}_1 - \bar{\tilde{X}}, \dots, \tilde{X}_n - \bar{\tilde{X}}].$$

Let $\tilde{S}$ be the sample sum of products matrix, namely, $\tilde{S} = \tilde{\mathbf{X}}_{\mathbf{d}}\tilde{\mathbf{X}}_{\mathbf{d}}^{*}$, where an asterisk denotes the conjugate transpose, and let $\tilde{S}_j$ be the sample sum of products matrix corresponding to the *j*-th column of $\tilde{\mathbf{X}}$. Then we have the following result:

**Theorem 4.6a.3.** *Let $\tilde{\mathbf{X}}$, $\bar{\tilde{X}}$, $\tilde{\mathbf{X}}_{\mathbf{d}}$, $\tilde{S}$ and $\tilde{S}_j$ be as previously defined. Then,*

$$
\tilde{S} = \tilde{S}\_1 + \dots + \tilde{S}\_q = \tilde{\mathbf{X}}\_d \tilde{\mathbf{X}}\_d^\*. \tag{4.6a.1}
$$

**Example 4.6a.1.** Consider a 2 × 2 complex matrix-variate *N*˜2*,*2*(O, A, B)* distribution where

$$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix}.$$

A simple random sample of size 4 from this population is available, that is, $\tilde{X}_\alpha \overset{iid}{\sim} \tilde{N}_{2,2}(O, A, B)$, *α* = 1*,* 2*,* 3*,* 4. The following is one set of observations on these sample values:

$$
\tilde{X}\_1 = \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix}, \tilde{X}\_2 = \begin{bmatrix} 3 & -i \\ i & 1 \end{bmatrix}, \tilde{X}\_3 = \begin{bmatrix} 1 & 1 - i \\ 1 + i & 3 \end{bmatrix}, \tilde{X}\_4 = \begin{bmatrix} 2 & 3 + i \\ 3 - i & 7 \end{bmatrix}.
$$

Determine the observed sample average, the sample matrix, the sample deviation matrix and the sample sum of products matrix.

**Solution 4.6a.1.** The sample average is

$$\begin{aligned} \bar{\tilde{X}} &= \frac{1}{4} [\tilde{X}_1 + \tilde{X}_2 + \tilde{X}_3 + \tilde{X}_4] \\ &= \frac{1}{4} \left\{ \begin{bmatrix} 2 & i \\ -i & 1 \end{bmatrix} + \begin{bmatrix} 3 & -i \\ i & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1-i \\ 1+i & 3 \end{bmatrix} + \begin{bmatrix} 2 & 3+i \\ 3-i & 7 \end{bmatrix} \right\} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \end{aligned}$$

and the deviations are as follows:

$$\begin{aligned} \tilde{X}\_{1d} &= \tilde{X}\_1 - \bar{\tilde{X}} = \begin{bmatrix} 0 & -1+i \\ -1-i & -2 \end{bmatrix}, \quad \tilde{X}\_{2d} = \begin{bmatrix} 1 & -1-i \\ -1+i & -2 \end{bmatrix}, \\\ \tilde{X}\_{3d} &= \begin{bmatrix} -1 & -i \\ i & 0 \end{bmatrix}, \quad \tilde{X}\_{4d} = \begin{bmatrix} 0 & 2+i \\ 2-i & 4 \end{bmatrix}. \end{aligned}$$

The sample deviation matrix is then $\tilde{\mathbf{X}}_{\mathbf{d}} = [\tilde{X}_{1d}, \tilde{X}_{2d}, \tilde{X}_{3d}, \tilde{X}_{4d}]$. If $V_{\alpha 1}$ denotes the first column of $\tilde{X}_{\alpha d}$, then with our usual notation, $\tilde{S}_1 = \sum_{\alpha=1}^{4} V_{\alpha 1}V_{\alpha 1}^{*}$ and similarly, if $V_{\alpha 2}$ is the second column of $\tilde{X}_{\alpha d}$, then $\tilde{S}_2 = \sum_{\alpha=1}^{4} V_{\alpha 2}V_{\alpha 2}^{*}$, the sample sum of products matrix being $\tilde{S} = \tilde{S}_1 + \tilde{S}_2$. Let us evaluate these quantities:

$$\begin{split} \tilde{S}\_{1} &= \begin{bmatrix} 0 \\ -1-i \end{bmatrix} \begin{bmatrix} 0 \ -1+i \end{bmatrix} + \begin{bmatrix} 1 \\ -1+i \end{bmatrix} \begin{bmatrix} 1 \ -1-i \end{bmatrix} + \begin{bmatrix} -1 \\ i \end{bmatrix} \begin{bmatrix} -1 \ -i \end{bmatrix} + \begin{bmatrix} 0 \\ 2-i \end{bmatrix} \begin{bmatrix} 0 \ 2+i \end{bmatrix}, \\ &= \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix} + \begin{bmatrix} 1 & -1-i \\ -1+i & 2 \end{bmatrix} + \begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 10 \end{bmatrix}, \end{split}$$

$$\begin{split} \tilde{S}_2 &= \begin{bmatrix} -1+i \\ -2 \end{bmatrix}[{-1-i} \ \ {-2}] + \begin{bmatrix} -1-i \\ -2 \end{bmatrix}[{-1+i} \ \ {-2}] + \begin{bmatrix} -i \\ 0 \end{bmatrix}[i \ \ 0] + \begin{bmatrix} 2+i \\ 4 \end{bmatrix}[{2-i} \ \ 4] \\ &= \begin{bmatrix} 2 & 2-2i \\ 2+2i & 4 \end{bmatrix} + \begin{bmatrix} 2 & 2+2i \\ 2-2i & 4 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 5 & 8+4i \\ 8-4i & 16 \end{bmatrix} \\ &= \begin{bmatrix} 10 & 12+4i \\ 12-4i & 24 \end{bmatrix} \end{split}$$

and then,

$$
\tilde{S} = \tilde{S}\_1 + \tilde{S}\_2 = \begin{bmatrix} 2 & -1 \\ -1 & 10 \end{bmatrix} + \begin{bmatrix} 10 & 12 + 4i \\ 12 - 4i & 24 \end{bmatrix} = \begin{bmatrix} 12 & 11 + 4i \\ 11 - 4i & 34 \end{bmatrix}.
$$

This can also be verified directly as *S*˜ = [**X**˜ **<sup>d</sup>**][**X**˜ **<sup>d</sup>**] ∗ where the deviation matrix is

$$
\tilde{\mathbf{X}}_{\mathbf{d}} = \begin{bmatrix} 0 & -1+i & 1 & -1-i & -1 & -i & 0 & 2+i \\ -1-i & -2 & -1+i & -2 & i & 0 & 2-i & 4 \end{bmatrix}.
$$

As expected,

$$[\tilde{\mathbf{X}}\_{\mathbf{d}}][\tilde{\mathbf{X}}\_{\mathbf{d}}]^{\*} = \begin{bmatrix} 12 & 11 + 4i \\ 11 - 4i & 34 \end{bmatrix}.$$

This completes the calculations.
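The same verification can be run numerically in the complex domain; in this sketch (our own, using numpy) `.conj().T` plays the role of the conjugate transpose:

```python
import numpy as np

# Observed sample values of Example 4.6a.1 (1j is Python's imaginary unit)
X = [np.array([[2, 1j], [-1j, 1]]),
     np.array([[3, -1j], [1j, 1]]),
     np.array([[1, 1 - 1j], [1 + 1j, 3]]),
     np.array([[2, 3 + 1j], [3 - 1j, 7]])]

Xbar = sum(X) / 4                        # sample average
Xd = np.hstack([Xa - Xbar for Xa in X])  # 2 x 8 complex deviation matrix
S = Xd @ Xd.conj().T                     # sample sum of products matrix

print(Xbar)   # → [[2, 1], [1, 3]]
print(S)      # → [[12, 11+4j], [11-4j, 34]]
```

The result matches the hand computation, and $\tilde{S}$ is Hermitian as it must be.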

#### **Exercises 4.6**

**4.6.1.** Let *A* be a 2 × 2 matrix whose first row is *(*1*,* 1*)* and *B* be a 3 × 3 matrix whose first row is *(*1*,* −1*,* 1*)*. Select your own real numbers to complete the matrices *A* and *B* so that *A > O* and *B > O*. Then consider a 2 × 3 matrix *X* having a real matrix-variate Gaussian density with the location parameter *M* = *O* and the foregoing parameter matrices *A* and *B*. Let the first row of *X* be *X*1 and its second row be *X*2. Determine the marginal densities of *X*1 and *X*2, the conditional density of *X*1 given *X*2, the conditional density of *X*2 given *X*1, the conditional expectation of *X*1 given *X*2 = *(*1*,* 0*,* 1*)* and the conditional expectation of *X*2 given *X*1 = *(*1*,* 2*,* 3*)*.

**4.6.2.** Consider the matrix *X* utilized in Exercise 4.6.1. Let its first two columns be *Y*<sup>1</sup> and its last one be *Y*2. Then, obtain the marginal densities of *Y*<sup>1</sup> and *Y*2, and the conditional densities of *Y*<sup>1</sup> given *Y*<sup>2</sup> and *Y*<sup>2</sup> given *Y*1, and evaluate the conditional expectation of *Y*<sup>1</sup>

given $Y_2' = (1, -1)$ as well as the conditional expectation of $Y_2$ given $Y_1 = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$.

**4.6.3.** Let *A>O* and *B>O* be 2 × 2 and 3 × 3 matrices whose first rows are *(*1*,* 1 − *i)* and *(*2*, i,* 1 + *i)*, respectively. Select your own complex numbers to complete the matrices *A* = *A*<sup>∗</sup> *> O* and *B* = *B*<sup>∗</sup> *> O*. Now, consider a 2 × 3 matrix *X*˜ having a complex matrix-variate Gaussian density with the aforementioned matrices *A* and *B* as parameter matrices. Assume that the location parameter is a null matrix. Letting the row partitioning of *X*˜ , denoted by *X*˜ <sup>1</sup>*, X*˜ <sup>2</sup>*,* be as specified in Exercise 4.6.1, answer all the questions posed in that exercise.

**4.6.4.** Let *A, B* and *X*˜ be as given in Exercise 4.6.3. Consider the column partitioning specified in Exercise 4.6.2. Then answer all the questions posed in Exercise 4.6.2.

**4.6.5.** Repeat Exercise 4.6.4 with the non-null location parameter

$$
\tilde{M} = \begin{bmatrix} 2 & 1-i & i \\ 1+i & 2+i & -3i \end{bmatrix}.
$$

#### **4.7. The Singular Matrix-variate Gaussian Distribution**

Consider the moment generating function specified in (4.3.3) for the real case, namely,

$$M\_X(T) = M\_f(T) = \mathbf{e}^{\text{tr}(TM') + \frac{1}{2}\text{tr}(\Sigma\_1 T \Sigma\_2 T')}\tag{4.7.1}$$

where $\Sigma_1 = A^{-1} > O$ and $\Sigma_2 = B^{-1} > O$. In the complex case, the moment generating function is of the form

$$\tilde{M}\_{\tilde{X}}(\tilde{T}) = \mathbf{e}^{\Re[\text{tr}(\tilde{T}\tilde{M}^\*)] + \frac{1}{4}\text{tr}(\Sigma\_1 \tilde{T} \Sigma\_2 \tilde{T}^\*)}. \tag{4.7a.1}$$

The properties of the singular matrix-variate Gaussian distribution can be studied by making use of (4.7.1) and (4.7*a*.1). Suppose that we restrict *Σ*1 and *Σ*2 to be positive semi-definite matrices, that is, *Σ*1 ≥ *O* and *Σ*2 ≥ *O*. In this case, one can still study many properties of the distributions represented by the mgf's given in (4.7.1) and (4.7*a*.1); however, the corresponding densities will not exist unless the matrices *Σ*1 and *Σ*2 are both strictly positive definite. That is, the *p* × *q* real or complex matrix-variate density does not exist if either *Σ*1 or *Σ*2 is singular. When either or both of *Σ*1 and *Σ*2 are only positive semi-definite, the distributions corresponding to the mgf's specified by (4.7.1) and (4.7*a*.1) are respectively referred to as *real matrix-variate singular Gaussian* and *complex matrix-variate singular Gaussian*.

For instance, let

$$
\Sigma\_1 = \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} \text{ and } \Sigma\_2 = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}
$$

in the mgf of a 2 × 3 real matrix-variate Gaussian distribution. Note that $\Sigma_1 = \Sigma_1'$ and $\Sigma_2 = \Sigma_2'$. The leading minors of $\Sigma_1$ are $|(4)| = 4 > 0$ and $|\Sigma_1| = 0$, while those of $\Sigma_2$ are $|(3)| = 3 > 0$, $\begin{vmatrix} 3 & -1 \\ -1 & 2 \end{vmatrix} = 5 > 0$ and $|\Sigma_2| = 2 > 0$; hence $\Sigma_1$ is positive semi-definite and $\Sigma_2$ is positive definite. Accordingly, the resulting Gaussian distribution does not possess a density. Fortunately, its distributional properties can nevertheless be investigated via its associated moment generating function.
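These definiteness claims are easy to confirm numerically; in the following sketch (our own, using numpy) the eigenvalues of the two symmetric matrices tell the whole story:

```python
import numpy as np

Sigma1 = np.array([[4., 2.], [2., 1.]])
Sigma2 = np.array([[3., -1., 0.], [-1., 2., 1.], [0., 1., 1.]])

# eigvalsh is the appropriate routine for symmetric matrices.
# Sigma1 has a zero eigenvalue (it is singular, hence only positive
# semi-definite), while every eigenvalue of Sigma2 is strictly positive.
print(np.linalg.eigvalsh(Sigma1))   # → [0., 5.] up to rounding
print(np.linalg.eigvalsh(Sigma2))   # all entries strictly positive
print(np.linalg.det(Sigma2))        # close to 2.0, matching the minor
```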



#### **Chapter 5 Matrix-Variate Gamma and Beta Distributions**

#### **5.1. Introduction**

The notations introduced in the preceding chapters will still be followed in this one. Lower-case letters such as *x, y* will be utilized to represent real scalar variables, whether mathematical or random. Capital letters such as *X, Y* will be used to denote vector/matrix random or mathematical variables. A tilde placed on top of a letter will indicate that the variables are in the complex domain. However, the tilde will be omitted in the case of constant matrices such as *A, B*. The determinant of a square matrix *A* will be denoted as |*A*| or det*(A)* and, in the complex domain, the absolute value or modulus of the determinant of *B* will be denoted as |det*(B)*|. Square matrices appearing in this chapter will be assumed to be of dimension *p* × *p* unless otherwise specified.

We will first define the real matrix-variate gamma function, gamma integral and gamma density, wherefrom their counterparts in the complex domain will be developed. A particular case of the real matrix-variate gamma density known as the Wishart density is widely utilized in multivariate statistical analysis. Actually, the formulation of this distribution in 1928 constituted a significant advance in the early days of the discipline. A real matrix-variate gamma function, denoted by *Γp(α)*, will be defined in terms of a matrix-variate integral over a real positive definite matrix *X > O*. This integral representation of *Γp(α)* will be explicitly evaluated with the help of the representation of a real positive definite matrix in terms of a lower triangular matrix having positive diagonal elements, in the form *X* = *T T*′ where *T* = *(tij )* is a lower triangular matrix with positive diagonal elements, that is, *tij* = 0*, i < j,* and *tjj >* 0*, j* = 1*,...,p*. When the diagonal elements are positive, it can be shown that the transformation *X* = *T T*′ is unique. Its associated Jacobian is provided in Theorem 1.6.7. This result is now restated for ready reference: For a *p* × *p* real positive definite matrix *X* = *(xij ) > O*,


$$X = TT' \Rightarrow \mathrm{d}X = 2^p \{ \prod_{j=1}^p t_{jj}^{p+1-j} \} \mathrm{d}T \tag{5.1.1}$$

where *T* = *(tij ), tij* = 0*, i<j* and *tjj >* 0*, j* = 1*,...,p*. Consider the following integral representation of *Γp(α)* where the integral is over a real positive definite matrix *X* and the integrand is a real-valued scalar function of *X*:

$$\Gamma\_p(\alpha) = \int\_{X > O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(X)} \mathbf{d}X. \tag{5.1.2}$$

Under the transformation in (5.1.1),

$$\begin{split} |X|^{\alpha - \frac{p+1}{2}} \mathrm{d}X &= \{ \prod\_{j=1}^{p} (t\_{jj}^{2})^{\alpha - \frac{p+1}{2}} \} 2^{p} \{ \prod\_{j=1}^{p} t\_{jj}^{p+1-j} \} \, \mathrm{d}T \\ &= 2^{p} \{ \prod\_{j=1}^{p} (t\_{jj}^{2})^{\alpha - \frac{j}{2}} \} \, \mathrm{d}T. \end{split}$$

Observe that $\mathrm{tr}(X) = \mathrm{tr}(TT')$ = the sum of the squares of all the elements in *T*, which is $\sum_{j=1}^{p} t_{jj}^2 + \sum_{i>j} t_{ij}^2$. By letting $t_{jj}^2 = y_j \Rightarrow \mathrm{d}t_{jj} = \frac{1}{2}\, y_j^{\frac{1}{2}-1}\,\mathrm{d}y_j$, noting that $t_{jj} > 0$, the integral over $t_{jj}$ gives

$$2\int_0^\infty (t_{jj}^2)^{\alpha - \frac{j}{2}} \mathbf{e}^{-t_{jj}^2} \, \mathrm{d}t_{jj} = \Gamma\left(\alpha - \frac{j-1}{2}\right), \ \Re\left(\alpha - \frac{j-1}{2}\right) > 0, \ j = 1, \ldots, p,$$

the final condition being $\Re(\alpha) > \frac{p-1}{2}$. Thus, we have the gamma product $\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2}) \cdots \Gamma(\alpha - \frac{p-1}{2})$. Now, for $i > j$, the integral over $t_{ij}$ gives

$$\prod\_{i>j} \int\_{-\infty}^{\infty} \mathbf{e}^{-t\_{ij}^2} \mathbf{d}t\_{ij} = \prod\_{i>j} \sqrt{\pi} = \pi^{\frac{p(p-1)}{4}}.$$

Therefore

$$\begin{split} \Gamma_p(\alpha) &= \pi^{\frac{p(p-1)}{4}} \Gamma(\alpha) \Gamma\left(\alpha - \frac{1}{2}\right) \cdots \Gamma\left(\alpha - \frac{p-1}{2}\right), \ \Re(\alpha) > \frac{p-1}{2}, \\ &= \int_{X > O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(X)} \mathrm{d}X, \ \Re(\alpha) > \frac{p-1}{2}. \end{split} \tag{5.1.3}$$

For example,

$$\Gamma_2(\alpha) = \pi^{\frac{(2)(1)}{4}} \Gamma(\alpha) \Gamma\left(\alpha - \frac{1}{2}\right) = \pi^{\frac{1}{2}} \Gamma(\alpha) \Gamma\left(\alpha - \frac{1}{2}\right), \ \Re(\alpha) > \frac{1}{2}.$$

This *Γp(α)* is known by different names in the literature. The first author calls it *the real matrix-variate gamma function* because of its association with a real matrix-variate gamma integral.
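The product form in (5.1.3) is straightforward to implement. The helper below is our own (not from the book) and is checked against two cases that follow directly from the formula:

```python
import math

def gamma_p(alpha, p):
    """Real matrix-variate gamma Gamma_p(alpha) via the product in (5.1.3);
    requires alpha > (p - 1)/2 for real alpha."""
    val = math.pi ** (p * (p - 1) / 4)
    for j in range(1, p + 1):
        val *= math.gamma(alpha - (j - 1) / 2)
    return val

# p = 1 reduces to the ordinary gamma function, and
# Gamma_2(3/2) = pi^(1/2) * Gamma(3/2) * Gamma(1) = pi/2.
print(gamma_p(2.5, 1), math.gamma(2.5))   # equal
print(gamma_p(1.5, 2), math.pi / 2)       # equal
```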

#### **5.1a. The Complex Matrix-variate Gamma**

In the complex case, consider a *p*×*p* Hermitian positive definite matrix *X*˜ = *X*˜ <sup>∗</sup> *> O*, where *X*˜ <sup>∗</sup> denotes the conjugate transpose of *X*˜ . Let *T*˜ = *(t* ˜*ij )* be a lower triangular matrix with the diagonal elements being real and positive. In this case, it can be shown that the transformation *X*˜ = *T*˜ *T*˜ <sup>∗</sup> is one-to-one. Then, as stated in Theorem 1.6a.7, the Jacobian is

$$\mathrm{d}\tilde{X} = 2^p \{ \prod\_{j=1}^p t\_{jj}^{2(p-j)+1} \} \mathrm{d}\,\tilde{T}.\tag{5.1a.1}$$

With the help of (5.1a.1), we can evaluate the following integral over *p* × *p* Hermitian positive definite matrices where the integrand is a real-valued scalar function of *X*˜ . We will denote the integral by *Γ*˜ *p(α)*, that is,

$$\tilde{\Gamma}\_p(\alpha) = \int\_{\tilde{X} > O} |\det(\tilde{X})|^{\alpha - p} \mathbf{e}^{-\text{tr}(\tilde{X})} \mathbf{d}\tilde{X}. \tag{5.1a.2}$$

Let us evaluate the integral in (5.1a.2) by making use of (5.1a.1). Parallel to the real case, we have

$$\begin{aligned} |\det(\tilde{X})|^{\alpha-p} \mathrm{d}\tilde{X} &= \{ \prod_{j=1}^p (t_{jj}^2)^{\alpha-p} \} 2^p \{ \prod_{j=1}^p t_{jj}^{2(p-j)+1} \} \mathrm{d}\tilde{T} \\ &= \{ \prod_{j=1}^p 2(t_{jj}^2)^{\alpha-j+\frac{1}{2}} \} \mathrm{d}\tilde{T}. \end{aligned}$$

As well,

$$\mathbf{e}^{-\text{tr}(\tilde{X})} = \mathbf{e}^{-\sum\_{j=1}^{p} t\_{jj}^{2} - \sum\_{i>j} |\tilde{t}\_{ij}|^{2}}.$$

Since *tjj* is real and positive, the integral over *tjj* gives the following:

$$2\int_0^\infty (t_{jj}^2)^{\alpha - j + \frac{1}{2}} \mathbf{e}^{-t_{jj}^2} \mathrm{d}t_{jj} = \Gamma(\alpha - (j - 1)), \ \Re(\alpha - (j - 1)) > 0, \ j = 1, \dots, p,$$

the final condition being $\Re(\alpha) > p - 1$. Note that the absolute value of $\tilde{t}_{ij}$, namely $|\tilde{t}_{ij}|$, is such that $|\tilde{t}_{ij}|^2 = t_{ij1}^2 + t_{ij2}^2$ where $\tilde{t}_{ij} = t_{ij1} + i\,t_{ij2}$ with $t_{ij1}, t_{ij2}$ real and $i = \sqrt{-1}$. Thus,

$$\prod\_{i>j} \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathbf{e}^{-(t\_{ij1}^2 + t\_{ij2}^2)} \mathbf{d}t\_{ij1} \wedge \ \mathbf{d}t\_{ij2} = \prod\_{i>j} \pi = \pi^{\frac{p(p-1)}{2}}.$$

Then

$$\begin{split} \tilde{\Gamma}_p(\alpha) &= \int_{\tilde{X} > O} |\det(\tilde{X})|^{\alpha-p} \mathbf{e}^{-\operatorname{tr}(\tilde{X})} \mathrm{d}\tilde{X} \\ &= \pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha-1) \cdots \Gamma(\alpha-(p-1)), \ \Re(\alpha) > p-1. \end{split} \tag{5.1a.3}$$

We will refer to *Γ*˜ *p(α)* as the complex matrix-variate gamma because of its association with a complex matrix-variate gamma integral. As an example, consider

$$
\tilde{\Gamma}_2(\alpha) = \pi^{\frac{(2)(1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) = \pi \Gamma(\alpha) \Gamma(\alpha - 1), \ \Re(\alpha) > 1.
$$
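The complex counterpart (5.1a.3) admits the same kind of sketch; the helper below is again our own:

```python
import math

def cgamma_p(alpha, p):
    """Complex matrix-variate gamma via the product in (5.1a.3);
    requires alpha > p - 1 for real alpha."""
    val = math.pi ** (p * (p - 1) / 2)
    for j in range(1, p + 1):
        val *= math.gamma(alpha - (j - 1))
    return val

# Gamma~_2(alpha) = pi * Gamma(alpha) * Gamma(alpha - 1);
# at alpha = 3 this equals pi * 2! * 1! = 2*pi.
print(cgamma_p(3, 2), 2 * math.pi)   # equal
```

Note the factor $\pi^{p(p-1)/2}$ here versus $\pi^{p(p-1)/4}$ in the real case, reflecting the two real integrations per off-diagonal element.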

#### **5.2. The Real Matrix-variate Gamma Density**

In view of (5.1.3), we can define a real matrix-variate gamma density with shape parameter *α* as follows, where *X* is a *p* × *p* real positive definite matrix:

$$f\_1(X) = \begin{cases} \frac{1}{\Gamma\_p(\alpha)} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(X)}, & X > O, \ \Re(\alpha) > \frac{p-1}{2} \\ 0, \text{ elsewhere.} & \end{cases} \tag{5.2.1}$$

**Example 5.2.1.** Let

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} x_1 & x_2 + iy_2 \\ x_2 - iy_2 & x_3 \end{bmatrix}, \ X = X' > O, \ \tilde{X} = \tilde{X}^* > O,$$

where $x_{11}, x_{12}, x_{22}, x_1, x_2, y_2, x_3$ are all real scalar variables, $i = \sqrt{-1}$, $x_{11} > 0$, $x_{22} > 0$, $x_{11}x_{22} - x_{12}^2 > 0$. While these are the conditions for the positive definiteness of the real matrix $X$, $x_1 > 0$, $x_3 > 0$, $x_1x_3 - (x_2^2 + y_2^2) > 0$ are the conditions for the Hermitian positive definiteness of $\tilde{X}$. Let us evaluate the following integrals, subject to the previously specified conditions on the elements of the matrix:

$$\begin{aligned} (1) &: \quad \delta\_{1} = \int\_{X>O} \mathbf{e}^{-(\mathbf{x}\_{11}+\mathbf{x}\_{22})} \mathrm{d}\mathbf{x}\_{11} \wedge \mathbf{d}\mathbf{x}\_{12} \wedge \mathbf{d}\mathbf{x}\_{22} \\ (2) &: \quad \delta\_{2} = \int\_{\tilde{X}>O} \mathbf{e}^{-(\mathbf{x}\_{1}+\mathbf{x}\_{3})} \mathrm{d}\mathbf{x}\_{1} \wedge \mathbf{d}(\mathbf{x}\_{2} + i\mathbf{y}\_{2}) \wedge \mathbf{d}\mathbf{x}\_{3} \\ (3) &: \quad \delta\_{3} = \int\_{X>O} |X| \mathbf{e}^{-(\mathbf{x}\_{11}+\mathbf{x}\_{22})} \mathrm{d}\mathbf{x}\_{11} \wedge \mathbf{d}\mathbf{x}\_{12} \wedge \mathbf{d}\mathbf{x}\_{22} \\ (4) &: \quad \delta\_{4} = \int\_{\tilde{X}>O} |\det(\tilde{X})|^{2} \mathbf{e}^{-(\mathbf{x}\_{1}+\mathbf{x}\_{3})} \mathrm{d}\mathbf{x}\_{1} \wedge \mathbf{d}(\mathbf{x}\_{2} + i\mathbf{y}\_{2}) \wedge \mathbf{d}\mathbf{x}\_{3}. \end{aligned}$$

**Solution 5.2.1.** (1): Observe that *δ*<sup>1</sup> can be evaluated by treating the integral as a real matrix-variate integral, namely,

$$\delta_1 = \int_{X > O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(X)} \mathrm{d}X \ \text{ with } \ p = 2, \ \frac{p+1}{2} = \frac{3}{2}, \ \alpha = \frac{3}{2},$$

and hence the integral is

$$
\Gamma\_2(3/2) = \pi^{\frac{2(1)}{4}} \Gamma(3/2) \Gamma(1) = \pi^{1/2} (1/2) \Gamma(1/2) = \frac{\pi}{2}.
$$

This result can also be obtained by direct integration as a multiple integral. In this case, the integration has to be done under the conditions $x_{11} > 0$, $x_{22} > 0$, $x_{11}x_{22} - x_{12}^2 > 0$, that is, $x_{12}^2 < x_{11}x_{22}$ or $-\sqrt{x_{11}x_{22}} < x_{12} < \sqrt{x_{11}x_{22}}$. The integral over $x_{12}$ yields $\int_{-\sqrt{x_{11}x_{22}}}^{\sqrt{x_{11}x_{22}}} \mathrm{d}x_{12} = 2\sqrt{x_{11}x_{22}}$; that over $x_{11}$ then gives

$$2\int_{x_{11}=0}^{\infty} \sqrt{x_{11}}\, \mathbf{e}^{-x_{11}} \mathrm{d}x_{11} = 2\int_0^\infty x_{11}^{\frac{3}{2}-1} \mathbf{e}^{-x_{11}} \, \mathrm{d}x_{11} = 2\Gamma(3/2) = \pi^{\frac{1}{2}},$$

and on integrating with respect to *x*22, we have

$$\int\_0^\infty \sqrt{x\_{22}}\, \mathrm{e}^{-x\_{22}} \mathrm{d}x\_{22} = \frac{1}{2} \pi^{\frac{1}{2}},$$

so that $\delta\_1 = \frac{1}{2}\sqrt{\pi}\,\sqrt{\pi} = \frac{\pi}{2}$.
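As a quick numerical sanity check (not from the text; it assumes NumPy and SciPy are available), $\Gamma\_2(3/2)$ can be computed via `scipy.special.multigammaln`, which implements the logarithm of the real matrix-variate gamma function, and the reduced double integral obtained after integrating out $x\_{12}$ can be handled by `scipy.integrate.dblquad`:

```python
import numpy as np
from scipy.special import multigammaln  # log of the real matrix-variate gamma
from scipy.integrate import dblquad

# Gamma_2(3/2) = pi/2 from scipy's log multivariate gamma (real case)
gamma2_3_2 = np.exp(multigammaln(3/2, 2))

# Direct route: after integrating over x12, delta_1 reduces to
# int_0^inf int_0^inf 2*sqrt(x11*x22) * exp(-(x11+x22)) dx11 dx22
val, _ = dblquad(lambda x11, x22: 2*np.sqrt(x11*x22)*np.exp(-(x11 + x22)),
                 0, np.inf, 0, np.inf)

print(gamma2_3_2, val)  # both ~ pi/2 ~ 1.5708
```

Both routes agree to numerical precision, matching the analytic value $\pi/2$.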

(2): On observing that $\delta\_2$ can be viewed as a complex matrix-variate integral, it is seen that

$$\delta\_2 = \int\_{\tilde{X} > O} |\det(\tilde{X})|^{2 - 2} \mathbf{e}^{-\text{tr}(\tilde{X})} \, d\tilde{X} = \tilde{\Gamma}\_2(2) = \pi^{\frac{2(1)}{2}} \Gamma(2) \Gamma(1) = \pi.$$

This answer can also be obtained by evaluating the multiple integral. Since $\tilde{X} > O$, we have $x\_1 > 0,\ x\_3 > 0,\ x\_1x\_3 - (x\_2^2 + y\_2^2) > 0$, that is, $x\_1 > \frac{x\_2^2 + y\_2^2}{x\_3}$. Integrating first with respect to $x\_1$ and letting $y = x\_1 - \frac{x\_2^2 + y\_2^2}{x\_3}$, we have

$$\int\_{x\_1 > \frac{x\_2^2 + y\_2^2}{x\_3}} \mathrm{e}^{-x\_1} \mathrm{d}x\_1 = \int\_{y=0}^{\infty} \mathrm{e}^{-y - \frac{x\_2^2 + y\_2^2}{x\_3}} \mathrm{d}y = \mathrm{e}^{-\frac{x\_2^2 + y\_2^2}{x\_3}}.$$

Now, the integrals over *x*<sup>2</sup> and *y*<sup>2</sup> give

$$\int\_{-\infty}^{\infty} \mathrm{e}^{-\frac{x\_2^2}{x\_3}} \mathrm{d}x\_2 = \sqrt{x\_3} \int\_{-\infty}^{\infty} \mathrm{e}^{-u^2} \mathrm{d}u = \sqrt{x\_3}\sqrt{\pi} \quad \text{and} \quad \int\_{-\infty}^{\infty} \mathrm{e}^{-\frac{y\_2^2}{x\_3}} \mathrm{d}y\_2 = \sqrt{x\_3}\sqrt{\pi},$$

and the integral with respect to $x\_3$ then yields

$$\int\_0^\infty x\_3 \mathbf{e}^{-x\_3} \mathrm{d}x\_3 = \varGamma(2) = 1,$$

so that $\delta\_2 = (1)\sqrt{\pi}\,\sqrt{\pi} = \pi$.

(3): Observe that $\delta\_3$ can be evaluated as a real matrix-variate integral. Then

$$\begin{split} \delta\_3 &= \int\_{X>O} |X| \mathbf{e}^{-\text{tr}(X)} \mathrm{d}X = \int\_{X>O} |X|^{\frac{5}{2} - \frac{3}{2}} \mathbf{e}^{-\text{tr}(X)} \mathrm{d}X, \text{ with } \frac{p+1}{2} = \frac{3}{2} \text{ as } p = 2, \\ &= \Gamma\_2(5/2) = \pi^{\frac{2(1)}{4}} \Gamma(5/2)\Gamma(4/2) = \pi^{\frac{1}{2}}(3/2)(1/2)\pi^{1/2}(1) \\ &= \frac{3}{4}\pi. \end{split}$$

Let us proceed by direct integration:

$$\begin{split} \delta\_{3} &= \int\_{X>O} [\mathbf{x}\_{11}\mathbf{x}\_{22} - \mathbf{x}\_{12}^{2}] \mathbf{e}^{-(\mathbf{x}\_{11} + \mathbf{x}\_{22})} \mathrm{d}\mathbf{x}\_{11} \wedge \mathrm{d}\mathbf{x}\_{12} \wedge \mathrm{d}\mathbf{x}\_{22} \\ &= \int\_{X>O} \mathbf{x}\_{22} \bigg[ \mathbf{x}\_{11} - \frac{\mathbf{x}\_{12}^{2}}{\mathbf{x}\_{22}} \bigg] \mathbf{e}^{-(\mathbf{x}\_{11} + \mathbf{x}\_{22})} \mathrm{d}\mathbf{x}\_{11} \wedge \mathrm{d}\mathbf{x}\_{12} \wedge \mathrm{d}\mathbf{x}\_{22}; \end{split}$$

letting $y = x\_{11} - \frac{x\_{12}^2}{x\_{22}}$, the integral over $x\_{11}$ yields

$$\int\_{\mathbf{x}\_{11} > \frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{22}}} \left[ \mathbf{x}\_{11} - \frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{22}} \right] \mathbf{e}^{-\mathbf{x}\_{11}} \mathbf{d} \mathbf{x}\_{11} = \int\_{\mathbf{y} = 0}^{\infty} \mathbf{y} \, \mathbf{e}^{-\mathbf{y} - \frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{22}}} \mathbf{d} \mathbf{y} = \mathbf{e}^{-\frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{22}}}.$$

Now, the integral over $x\_{12}$ gives $\sqrt{x\_{22}}\sqrt{\pi}$ and finally, that over $x\_{22}$ yields

$$\sqrt{\pi}\int\_{x\_{22}>0} x\_{22}^{\frac{3}{2}}\, \mathrm{e}^{-x\_{22}} \mathrm{d}x\_{22} = \sqrt{\pi}\,\Gamma(5/2) = (3/2)(1/2)\sqrt{\pi}\,\sqrt{\pi} = \frac{3}{4}\pi.$$
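The value $\delta\_3 = \frac{3}{4}\pi$ can likewise be confirmed numerically along the same two routes (an illustrative sketch assuming SciPy is available; not part of the text):

```python
import numpy as np
from scipy.special import multigammaln
from scipy.integrate import dblquad

# Gamma_2(5/2) = 3*pi/4 via scipy's log multivariate gamma (real case)
g = np.exp(multigammaln(5/2, 2))

# Direct route: after the x11 integration the remaining integrand is
# x22 * exp(-x22 - x12**2/x22), over x12 in R and x22 > 0
val, _ = dblquad(lambda x12, x22: x22*np.exp(-x22 - x12**2/x22),
                 0, np.inf, -np.inf, np.inf)

print(g, val)  # both ~ 3*pi/4 ~ 2.3562
```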

(4): Noting that we can treat $\delta\_4$ as a complex matrix-variate integral, we have

$$\begin{split} \delta\_{4} &= \int\_{\tilde{X} > O} |\det(\tilde{X})|^{2} \mathbf{e}^{-\text{tr}(\tilde{X})} \mathrm{d}\tilde{X} = \int\_{\tilde{X} > O} |\det(\tilde{X})|^{4-2} \mathbf{e}^{-\text{tr}(\tilde{X})} \mathrm{d}\tilde{X} = \tilde{\Gamma}\_{2}(4), \; a = 4, \; p = 2, \\ &= \pi^{\frac{2(1)}{2}} \Gamma(4) \Gamma(3) = \pi(3!)(2!) = 12\pi. \end{split}$$

Direct evaluation will be challenging in this case as the integrand involves $|\det(\tilde{X})|^2$.
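Although the direct multiple integral is hard, the gamma-function values themselves are easy to check numerically. SciPy has no complex matrix-variate gamma, so the sketch below (not from the text) implements $\tilde{\Gamma}\_p(\alpha) = \pi^{p(p-1)/2}\prod\_{j=1}^{p}\Gamma(\alpha-j+1)$ directly and confirms $\delta\_2 = \pi$ and $\delta\_4 = 12\pi$:

```python
import math

def cgamma_p(alpha, p):
    """Complex matrix-variate gamma:
       pi^(p(p-1)/2) * prod_{j=1}^p Gamma(alpha - j + 1)."""
    val = math.pi ** (p * (p - 1) / 2)
    for j in range(1, p + 1):
        val *= math.gamma(alpha - j + 1)
    return val

print(cgamma_p(2, 2) / math.pi)  # delta_2 / pi = 1
print(cgamma_p(4, 2) / math.pi)  # delta_4 / pi = 12
```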

If a scale parameter matrix $B > O$ is to be introduced in (5.2.1), then consider $\text{tr}(BX) = \text{tr}(B^{\frac{1}{2}}XB^{\frac{1}{2}})$ where $B^{\frac{1}{2}}$ is the positive definite square root of the real positive definite constant matrix $B$. On applying the transformation $Y = B^{\frac{1}{2}}XB^{\frac{1}{2}} \Rightarrow \mathrm{d}X = |B|^{-\frac{p+1}{2}}\mathrm{d}Y$, as stated in Theorem 1.6.5, we have

$$\begin{split} \int\_{X>O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(BX)} \mathbf{d}X &= \int\_{X>O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(B^{\frac{1}{2}}XB^{\frac{1}{2}})} \mathbf{d}X \\ &= |B|^{-\alpha} \int\_{Y>O} |Y|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(Y)} \mathbf{d}Y \\ &= |B|^{-\alpha} \Gamma\_p(\alpha). \end{split} \tag{5.2.2}$$

This equality brings about two results. First, the following identity which will turn out to be very handy in many of the computations:

$$|B|^{-\alpha} \equiv \frac{1}{\Gamma\_p(\alpha)} \int\_{X > O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(BX)} \mathbf{dX}, \ B > O, \ \Re(\alpha) > \frac{p-1}{2}. \tag{5.2.3}$$
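The identity (5.2.3) can be checked numerically for $p = 2$ with, say, $B = \mathrm{diag}(2, 3)$ and $\alpha = 3/2$, in which case the exponent of $|X|$ vanishes and the $x\_{12}$ integration can be carried out analytically as in Solution 5.2.1 (a sketch assuming SciPy; not part of the text):

```python
import numpy as np
from scipy.special import multigammaln
from scipy.integrate import dblquad

B = np.diag([2.0, 3.0])
alpha = 3/2  # exponent alpha - (p+1)/2 vanishes, so the integrand is e^{-tr(BX)}

# left side: integrate e^{-(2*x11 + 3*x22)} over X > O, after carrying out the
# x12 integration analytically (bounds +/- sqrt(x11*x22)) as in Solution 5.2.1
lhs, _ = dblquad(lambda x11, x22: 2*np.sqrt(x11*x22)*np.exp(-(2*x11 + 3*x22)),
                 0, np.inf, 0, np.inf)

# right side of (5.2.3): |B|^{-alpha} * Gamma_2(alpha)
rhs = np.linalg.det(B)**(-alpha) * np.exp(multigammaln(alpha, 2))
print(lhs, rhs)  # both ~ (pi/2) * 6^{-3/2} ~ 0.1069
```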

As well, the following two-parameter real matrix-variate gamma density with shape parameter *α* and scale parameter matrix *B>O* can be constructed from (5.2.2):

$$f(X) = \begin{cases} \frac{|B|^{\alpha}}{\Gamma\_p(\alpha)} |X|^{\alpha - \frac{p+1}{2}}\, \mathrm{e}^{-\text{tr}(BX)}, & X > O, \ B > O, \ \Re(\alpha) > \frac{p-1}{2} \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2.4}$$

#### **5.2.1. The mgf of the real matrix-variate gamma distribution**

Let us determine the mgf associated with the density given in (5.2.4), that is, the two-parameter real matrix-variate gamma density. Observing that $X = X'$, let $T$ be a symmetric $p \times p$ real positive definite parameter matrix. Then, noting that

$$\text{tr}(TX) = \sum\_{j=1}^{p} t\_{jj} \mathbf{x}\_{jj} + 2 \sum\_{i>j} t\_{ij} \mathbf{x}\_{ij},\tag{i}$$

it is seen that the non-diagonal elements in *X* multiplied by the corresponding parameters will have twice the weight of the diagonal elements multiplied by the corresponding parameters. For instance, consider the 2 × 2 case:

$$\begin{aligned} \text{tr}\left\{ \begin{bmatrix} t\_{11} & t\_{12} \\ t\_{12} & t\_{22} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{11} & \mathbf{x}\_{12} \\ \mathbf{x}\_{12} & \mathbf{x}\_{22} \end{bmatrix} \right\} &= \text{tr}\begin{bmatrix} t\_{11}\mathbf{x}\_{11} + t\_{12}\mathbf{x}\_{12} & \alpha\_{1} \\ \alpha\_{2} & t\_{12}\mathbf{x}\_{12} + t\_{22}\mathbf{x}\_{22} \end{bmatrix} \\ &= t\_{11}\mathbf{x}\_{11} + 2t\_{12}\mathbf{x}\_{12} + t\_{22}\mathbf{x}\_{22} \end{aligned} \tag{ii}$$

where *α*<sup>1</sup> and *α*<sup>2</sup> represent elements that are not involved in the evaluation of the trace. Note that due to the symmetry of *T* and *X*, *t*<sup>21</sup> = *t*<sup>12</sup> and *x*<sup>21</sup> = *x*12, so that the cross product term *t*12*x*<sup>12</sup> in *(ii)* appears twice whereas each of the terms *t*11*x*<sup>11</sup> and *t*22*x*<sup>22</sup> appear only once.

However, in order to be consistent with the mgf in the real multivariate case, each variable need only be multiplied once by the corresponding parameter, the mgf being then obtained by taking the expected value of the resulting exponential sum. Accordingly, the parameter matrix has to be modified as follows: let ${}\_{*}T = ({}\_{*}t\_{ij})$ where ${}\_{*}t\_{jj} = t\_{jj}$, ${}\_{*}t\_{ij} = \frac{1}{2}t\_{ij}$ for $i \ne j$, and $t\_{ij} = t\_{ji}$ for all $i$ and $j$; in other words, the non-diagonal elements of the symmetric matrix $T$ are weighted by $\frac{1}{2}$, such a matrix being denoted as ${}\_{*}T$. Then,

$$\text{tr}(\_\*TX) = \sum\_{i \ge j} t\_{ij} x\_{ij},$$

and the mgf in the real matrix-variate two-parameter gamma density, denoted by *MX(*∗*T )*, is the following:

$$\begin{aligned} M\_X(\*T) &= E[\mathbf{e}^{\mathrm{tr}(\_\*TX)}] \\ &= \frac{|B|^\alpha}{\Gamma\_p(\alpha)} \int\_{X > O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{\mathrm{tr}(\_\*TX - BX)} \mathrm{d}X. \end{aligned}$$

Now, since

$$\operatorname{tr}(BX - \,\_\*TX) = \operatorname{tr}((B - \,\_\*T)^{\frac{1}{2}}X(B - \,\_\*T)^{\frac{1}{2}})$$

for $(B - {}\_{*}T) > O$, that is, $(B - {}\_{*}T)^{\frac{1}{2}} > O$, which means that $Y = (B - {}\_{*}T)^{\frac{1}{2}}X(B - {}\_{*}T)^{\frac{1}{2}} \Rightarrow \mathrm{d}X = |B - {}\_{*}T|^{-\frac{p+1}{2}}\mathrm{d}Y$, we have

$$\begin{split} M\_{X}(\ast\_{\*}T) &= \frac{|B|^{\alpha}}{\Gamma\_{p}(\alpha)} \int\_{X>O} |X|^{\alpha-\frac{p+1}{2}} \mathbf{e}^{-\text{tr}((B-\ast\_{\*}T)X)} \mathbf{d}X \\ &= \frac{|B|^{\alpha}}{\Gamma\_{p}(\alpha)} |B-\ast\_{\*}T|^{-\alpha} \int\_{Y>O} |Y|^{\alpha-\frac{p+1}{2}} \mathbf{e}^{-\text{tr}(Y)} \mathbf{d}Y \\ &= |B|^{\alpha} |B-\ast\_{\*}T|^{-\alpha} \\ &= |I-B^{-1}\ast T|^{-\alpha} \quad \text{for } I-B^{-1}\ast\_{\*}T>O. \end{split} \tag{5.2.5}$$

When <sup>∗</sup>*T* is replaced by −∗*T* , (5.2.5) gives the Laplace transform of the two-parameter gamma density in the real matrix-variate case as specified by (5.2.4), which is denoted by *Lf (*∗*T )*, that is,

$$L\_f(\ast\_\* T) = M\_X(-\ast\_\* T) = |I + B^{-1} \, \_\ast T|^{-\alpha} \text{ for } I + B^{-1} \, \_\ast T > O. \tag{5.2.6}$$
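For $p = 1$, the density (5.2.4) reduces to an ordinary gamma density with shape $\alpha$ and rate $b$, and (5.2.5) reduces to the familiar mgf $(1 - t/b)^{-\alpha}$, $t < b$. A quick Monte Carlo check of this scalar special case (an illustrative sketch assuming NumPy; not part of the text):

```python
import numpy as np

# p = 1: (5.2.4) is a gamma density with shape alpha and rate b,
# and (5.2.5) reduces to M(t) = (1 - t/b)^(-alpha) for t < b.
rng = np.random.default_rng(0)
alpha, b, t = 2.5, 3.0, 1.0
x = rng.gamma(shape=alpha, scale=1/b, size=1_000_000)
mc = np.exp(t * x).mean()          # Monte Carlo estimate of E[e^{tX}]
exact = (1 - t/b)**(-alpha)
print(mc, exact)  # both ~ 2.76
```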

For example, if

$$X = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{12} & x\_{22} \end{bmatrix}, \ B = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix} \ \text{ and } \ \_\*T = \begin{bmatrix} \_\*t\_{11} & \_\*t\_{12} \\ \_\*t\_{12} & \_\*t\_{22} \end{bmatrix},$$

then |*B*| = 5 and

$$\begin{split} M\_X(\_\*T) &= |B|^\alpha |B - \_\*T|^{-\alpha} = 5^\alpha \begin{vmatrix} 2 - \_\*t\_{11} & -1 - \_\*t\_{12} \\ -1 - \_\*t\_{12} & 3 - \_\*t\_{22} \end{vmatrix}^{-\alpha} \\ &= 5^\alpha \{ (2 - \_\*t\_{11})(3 - \_\*t\_{22}) - (1 + \_\*t\_{12})^2 \}^{-\alpha}. \end{split}$$

If <sup>∗</sup>*T* is partitioned into sub-matrices and *X* is partitioned accordingly as

$$\_\*T = \begin{bmatrix} \_\*T\_{11} & \_\*T\_{12} \\ \_\*T\_{21} & \_\*T\_{22} \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} X\_{11} & X\_{12} \\ X\_{21} & X\_{22} \end{bmatrix} \tag{iii}$$

where ${}\_{*}T\_{11}$ and $X\_{11}$ are $r \times r$, $r \le p$, then what can be said about the densities of the diagonal blocks $X\_{11}$ and $X\_{22}$? The mgf of $X\_{11}$ is available from the definition by letting ${}\_{*}T\_{12} = O$, ${}\_{*}T\_{21} = O$ and ${}\_{*}T\_{22} = O$, as then $E[\mathrm{e}^{\text{tr}({}\_{*}TX)}] = E[\mathrm{e}^{\text{tr}({}\_{*}T\_{11}X\_{11})}]$. However, $B^{-1}{}\_{*}T$ is not positive definite since $B^{-1}{}\_{*}T$ is not symmetric, and thereby $I - B^{-1}{}\_{*}T$ cannot be positive definite when ${}\_{*}T\_{12} = O$, ${}\_{*}T\_{21} = O$, ${}\_{*}T\_{22} = O$. Consequently, the mgf of $X\_{11}$ cannot be determined from (5.2.5). As an alternative, we could rewrite (5.2.5) in the symmetric format and then try to evaluate the density of $X\_{11}$. As it turns out, the densities of $X\_{11}$ and $X\_{22}$ can be readily obtained from the mgf in two situations: either when $B = I$ or when $B$ is a block diagonal matrix, that is,

$$B = \begin{bmatrix} B\_{11} & O \\ O & B\_{22} \end{bmatrix} \Rightarrow B^{-1} = \begin{bmatrix} B\_{11}^{-1} & O \\ O & B\_{22}^{-1} \end{bmatrix}. \tag{iv}$$

Hence we have the following results:

**Theorem 5.2.1.** *Let the $p \times p$ matrices $X > O$ and ${}\_{*}T > O$ be partitioned as in (iii). Let $X$ have a $p \times p$ real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $I\_p$. Then $X\_{11}$ has an $r \times r$ real matrix-variate gamma density and $X\_{22}$ has a $(p-r) \times (p-r)$ real matrix-variate gamma density with shape parameter $\alpha$ and scale parameters $I\_r$ and $I\_{p-r}$, respectively.*

**Theorem 5.2.2.** *Let $X$ be partitioned as in (iii). Let the $p \times p$ real positive definite parameter matrix $B > O$ be partitioned as in (iv). Then $X\_{11}$ has an $r \times r$ real matrix-variate gamma density with the parameters $(\alpha, B\_{11} > O)$ and $X\_{22}$ has a $(p-r) \times (p-r)$ real matrix-variate gamma density with the parameters $(\alpha, B\_{22} > O)$.*

**Theorem 5.2.3.** *Let $X$ be partitioned as in (iii). Then $X\_{11}$ and $X\_{22}$ are statistically independently distributed under the restrictions specified in Theorems 5.2.1 and 5.2.2.*
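These block results can be illustrated by simulation through the Wishart distribution, which may be viewed as the special case $\alpha = n/2$, $B = \frac{1}{2}\Sigma^{-1}$ of the density (5.2.4). For $\Sigma = I\_3$, the leading $2 \times 2$ block of a $3 \times 3$ Wishart matrix should again be Wishart with identity scale, so its sample mean should be close to $n\,I\_2$ (an illustrative sketch assuming SciPy; not part of the text):

```python
import numpy as np
from scipy.stats import wishart

# Wishart(n, Sigma) corresponds to alpha = n/2, B = Sigma^{-1}/2 in (5.2.4).
# With Sigma = I_3, Theorem 5.2.1 predicts the leading 2x2 block is again
# matrix-variate gamma (Wishart) with identity scale, hence mean n * I_2.
rng = np.random.default_rng(1)
n, p, r = 7, 3, 2
samples = wishart(df=n, scale=np.eye(p)).rvs(size=20_000, random_state=rng)
block_mean = samples[:, :r, :r].mean(axis=0)
print(block_mean)  # ~ 7 * I_2
```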

In the general case of $B$, write the mgf as $M\_X({}\_{*}T) = |B|^{\alpha}|B - {}\_{*}T|^{-\alpha}$, which corresponds to a symmetric format. Then, when ${}\_{*}T = \begin{bmatrix} {}\_{*}T\_{11} & O \\ O & O \end{bmatrix}$,

$$\begin{split} M\_{X}(\ast\_{\*}T) &= |B|^{\alpha} \left| \begin{array}{cc} B\_{11} - \, \ast\_{\*}T\_{11} & B\_{12} \\ B\_{21} & B\_{22} \end{array} \right|^{-\alpha} \\ &= |B|^{\alpha} |B\_{22}|^{-\alpha} |B\_{11} - \, \ast\_{\*}T\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|^{-\alpha} \\ &= |B\_{22}|^{\alpha} |B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|^{\alpha} |B\_{22}|^{-\alpha} |(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}) - \, \ast\_{\*}T\_{11}|^{-\alpha} \\ &= |B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|^{\alpha} |(B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}) - \, \ast\_{\*}T\_{11}|^{-\alpha}, \end{split}$$

which is obtained by making use of the representations of the determinant of a partitioned matrix given in Sect. 1.3. Now, on comparing the last line with the first one, it is seen that $X\_{11}$ has a real matrix-variate gamma distribution with shape parameter $\alpha$ and scale parameter matrix $B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}$. Hence, the following result:

**Theorem 5.2.4.** *If the $p \times p$ real positive definite matrix $X$ has a real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B$, and if $X$ and $B$ are partitioned as in (iii), then $X\_{11}$ has a real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}$, and the sub-matrix $X\_{22}$ has a real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B\_{22} - B\_{21}B\_{11}^{-1}B\_{12}$.*
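The key step in the derivation above is the partitioned-determinant factorization $|B| = |B\_{22}|\,|B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}|$, which is easy to confirm numerically (an illustrative sketch with NumPy; not part of the text):

```python
import numpy as np

# Partitioned-determinant identity behind the derivation:
#   |B| = |B22| * |B11 - B12 B22^{-1} B21|
B = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])
r = 1
B11, B12 = B[:r, :r], B[:r, r:]
B21, B22 = B[r:, :r], B[r:, r:]
schur = B11 - B12 @ np.linalg.inv(B22) @ B21  # scale matrix of X11 in Theorem 5.2.4
d1 = np.linalg.det(B)
d2 = np.linalg.det(B22) * np.linalg.det(schur)
print(d1, d2)  # equal
```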

#### **5.2a. The Matrix-variate Gamma Function and Density, Complex Case**

Let $\tilde{X} = \tilde{X}^{*} > O$ be a $p \times p$ Hermitian positive definite matrix. When $\tilde{X}$ is Hermitian, all its diagonal elements are real and hence $\text{tr}(\tilde{X})$ is real. Let $\det(\tilde{X})$ denote the determinant and $|\det(\tilde{X})|$ the absolute value of the determinant of $\tilde{X}$. As a result, $|\det(\tilde{X})|^{\alpha - p}\, \mathrm{e}^{-\text{tr}(\tilde{X})}$ is a real-valued scalar function of $\tilde{X}$. Let us consider the following integral, denoted by $\tilde{\Gamma}\_p(\alpha)$:

$$\tilde{\Gamma}\_p(\alpha) = \int\_{\tilde{X} > O} |\det(\tilde{X})|^{\alpha - p} \operatorname{e}^{-\operatorname{tr}(\tilde{X})} \mathrm{d}\tilde{X},\tag{5.2a.1}$$

which was evaluated in Sect. 5.1a. In fact, (5.1a.3) provides two representations of the complex matrix-variate gamma function *Γ*˜ *p(α)*. With the help of (5.1a.3), we can define the complex *p* × *p* matrix-variate gamma density as follows:

Matrix-Variate Gamma and Beta Distributions

$$\tilde{f}\_1(\tilde{X}) = \begin{cases} \frac{1}{\tilde{\Gamma}\_p(\alpha)} |\det(\tilde{X})|^{\alpha - p}\, \mathrm{e}^{-\text{tr}(\tilde{X})}, & \tilde{X} > O, \ \Re(\alpha) > p - 1 \\ 0, & \text{elsewhere}. \end{cases} \tag{5.2a.2}$$

For example, let us examine the $2 \times 2$ complex matrix-variate case. Let $\tilde{X}$ be a matrix in the complex domain, $\bar{\tilde{X}}$ denoting its complex conjugate and $\tilde{X}^{*}$ its conjugate transpose. When $\tilde{X} = \tilde{X}^{*}$, the matrix is Hermitian and its diagonal elements are real. In the $2 \times 2$ Hermitian case, let

$$
\tilde{X} = \begin{bmatrix} x\_1 & x\_2 + iy\_2 \\ x\_2 - iy\_2 & x\_3 \end{bmatrix} \Rightarrow \bar{\tilde{X}} = \begin{bmatrix} x\_1 & x\_2 - iy\_2 \\ x\_2 + iy\_2 & x\_3 \end{bmatrix} \Rightarrow \tilde{X}^{*} = \tilde{X}.
$$

Then, the determinants are

$$\begin{aligned} \det(\tilde{X}) &= \mathbf{x}\_1 \mathbf{x}\_3 - (\mathbf{x}\_2 - i \mathbf{y}\_2)(\mathbf{x}\_2 + i \mathbf{y}\_2) = \mathbf{x}\_1 \mathbf{x}\_3 - (\mathbf{x}\_2^2 + \mathbf{y}\_2^2) \\ &= \det(\tilde{X}^\*), \ \mathbf{x}\_1 > 0, \ \mathbf{x}\_3 > 0, \ \mathbf{x}\_1 \mathbf{x}\_3 - (\mathbf{x}\_2^2 + \mathbf{y}\_2^2) > 0, \end{aligned}$$

due to Hermitian positive definiteness of *X*˜ . As well,

$$|\det(\tilde{X})| = +[\det(\tilde{X})\det(\tilde{X}^{*})]^{\frac{1}{2}} = x\_1 x\_3 - (x\_2^2 + y\_2^2) > 0,$$

Note that $\text{tr}(\tilde{X}) = x\_1 + x\_3$ and $\tilde{\Gamma}\_2(\alpha) = \pi^{\frac{2(1)}{2}}\Gamma(\alpha)\Gamma(\alpha - 1)$, $\Re(\alpha) > 1$, $p = 2$. The density is then of the following form:

$$\begin{split} \tilde{f}\_{1}(\tilde{X}) &= \frac{1}{\tilde{\Gamma}\_{2}(\alpha)} |\det(\tilde{X})|^{\alpha - 2}\, \mathrm{e}^{-\text{tr}(\tilde{X})} \\ &= \frac{1}{\pi \Gamma(\alpha)\Gamma(\alpha - 1)} [x\_{1}x\_{3} - (x\_{2}^{2} + y\_{2}^{2})]^{\alpha - 2}\, \mathrm{e}^{-(x\_{1} + x\_{3})} \end{split}$$

for $x\_1 > 0,\ x\_3 > 0,\ x\_1x\_3 - (x\_2^2 + y\_2^2) > 0,\ \Re(\alpha) > 1$, and $\tilde{f}\_1(\tilde{X}) = 0$ elsewhere.

Now, consider a $p \times p$ parameter matrix $\tilde{B} > O$. We can obtain the following identity corresponding to the identity in the real case:

$$|\det(\tilde{B})|^{-\alpha} \equiv \frac{1}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{X} > O} |\det(\tilde{X})|^{\alpha - p}\, \mathbf{e}^{-\text{tr}(\tilde{B}\tilde{X})} \text{d}\tilde{X}, \ \Re(\alpha) > p - 1. \tag{5.2a.3}$$

A two-parameter gamma density in the complex domain can then be derived by proceeding as in the real case; it is given by

$$\tilde{f}(\tilde{X}) = \begin{cases} \frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}\_p(\alpha)} |\det(\tilde{X})|^{\alpha - p}\, \mathbf{e}^{-\text{tr}(\tilde{B}\tilde{X})}, & \tilde{B} > O, \ \tilde{X} > O, \ \Re(\alpha) > p - 1 \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2a.4}$$

#### **5.2a.1. The mgf of the complex matrix-variate gamma distribution**

The moment generating function in the complex domain is slightly different from that in the real case. Let $\tilde{T} > O$ be a $p \times p$ parameter matrix and let $\tilde{X}$ be $p \times p$ two-parameter gamma distributed as in (5.2a.4). Then $\tilde{T} = T\_1 + iT\_2$ and $\tilde{X} = X\_1 + iX\_2$, with $T\_1, T\_2, X\_1, X\_2$ real and $i = \sqrt{-1}$. When $\tilde{T}$ and $\tilde{X}$ are Hermitian positive definite, $T\_1$ and $X\_1$ are real symmetric and $T\_2$ and $X\_2$ are real skew symmetric. Then consider

$$\text{tr}(\tilde{T}^{*}\tilde{X}) = \text{tr}(T\_{1}X\_{1}) + \text{tr}(T\_{2}X\_{2}) + i[\text{tr}(T\_{1}X\_{2}) - \text{tr}(T\_{2}X\_{1})].$$

Note that tr*(T*1*X*1*)* + tr*(T*2*X*2*)* contains all the real variables involved multiplied by the corresponding parameters, where the diagonal elements appear once and the off-diagonal elements each appear twice. Thus, as in the real case, *T*˜ has to be replaced by <sup>∗</sup>*T*˜ = <sup>∗</sup>*T* <sup>1</sup> + *i*∗*T* 2. A term containing *i* still remains; however, as a result of the following properties, this term will disappear.

**Lemma 5.2a.1.** *Let $\tilde{T}, \tilde{X}, T\_1, T\_2, X\_1, X\_2$ be as defined above. Then, $\text{tr}(T\_1X\_2) = 0$, $\text{tr}(T\_2X\_1) = 0$, $\text{tr}({}\_{*}T\_1X\_2) = 0$, $\text{tr}({}\_{*}T\_2X\_1) = 0$.*

*Proof***:** For any real square matrix $A$, $\text{tr}(A) = \text{tr}(A')$, and for any two matrices $A$ and $B$ for which $AB$ and $BA$ are defined, $\text{tr}(AB) = \text{tr}(BA)$. With the help of these two results, we have the following:

$$\text{tr}(T\_1 X\_2) = \text{tr}((T\_1 X\_2)') = \text{tr}(X\_2' T\_1') = -\text{tr}(X\_2 T\_1) = -\text{tr}(T\_1 X\_2)$$

as *T*<sup>1</sup> is symmetric and *X*<sup>2</sup> is skew symmetric. Now, tr*(T*1*X*2*)* = −tr*(T*1*X*2*)* ⇒ tr*(T*1*X*2*)* = 0 since it is a real quantity. It can be similarly established that the other results stated in the lemma hold.

We may now define the mgf in the complex case, denoted by *MX*˜*(*∗*T )*, as follows:

$$\begin{split} M\_{\tilde{X}}(\_\*\tilde{T}) &= E[\mathbf{e}^{\mathrm{tr}(\_\*\tilde{T}^{\*}\tilde{X})}] = \int\_{\tilde{X} > O} \mathbf{e}^{\mathrm{tr}(\_\*\tilde{T}^{\*}\tilde{X})} \tilde{f}(\tilde{X}) \mathrm{d}\tilde{X} \\ &= \frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}\_{p}(\alpha)} \int\_{\tilde{X} > O} |\det(\tilde{X})|^{\alpha - p}\, \mathbf{e}^{-\mathrm{tr}((\tilde{B} - \_\*\tilde{T}^{\*})\tilde{X})} \mathrm{d}\tilde{X}. \end{split}$$

Since $\text{tr}(\tilde{X}(\tilde{B} - {}\_{*}\tilde{T}^{*})) = \text{tr}(C\tilde{X}C^{*})$ for $C = (\tilde{B} - {}\_{*}\tilde{T}^{*})^{\frac{1}{2}}$ and $C > O$, it follows from Theorem 1.6a.5 that $\tilde{Y} = C\tilde{X}C^{*} \Rightarrow \mathrm{d}\tilde{Y} = |\det(CC^{*})|^{p}\, \mathrm{d}\tilde{X}$, that is, $\mathrm{d}\tilde{X} = |\det(\tilde{B} - {}\_{*}\tilde{T}^{*})|^{-p}\, \mathrm{d}\tilde{Y}$ for $\tilde{B} - {}\_{*}\tilde{T}^{*} > O$. Then,


$$\begin{split} M\_{\tilde{X}}(\ast\_{\*}\tilde{T}) &= \frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}\_{p}(\alpha)} |\det(\tilde{B} - \ast\_{\*}\tilde{T}^{\*})|^{-\alpha} \int\_{\tilde{Y} > O} |\det(\tilde{Y})|^{\alpha - p} \mathbf{e}^{-\text{tr}(\tilde{Y})} \mathbf{d}\tilde{Y} \\ &= |\det(\tilde{B})|^{\alpha} |\det(\tilde{B} - \ast\_{\*}\tilde{T}^{\*})|^{-\alpha} \text{ for } \tilde{B} - \ast\_{\*}\tilde{T}^{\*} > O \\ &= |\det(I - \tilde{B}^{-1} \ast\_{\*} \tilde{T}^{\*})|^{-\alpha}, \quad I - \tilde{B}^{-1} \ast\_{\*} \tilde{T}^{\*} > O. \end{split} \tag{5.2a.5}$$

For example, let *p* = 2 and

$$
\tilde{X} = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} \\ \tilde{x}\_{12}^{\*} & \tilde{x}\_{22} \end{bmatrix}, \ B = \begin{bmatrix} 3 & i \\ -i & 2 \end{bmatrix} \ \text{ and } \ \_\*\tilde{T} = \begin{bmatrix} \_\*\tilde{t}\_{11} & \_\*\tilde{t}\_{12} \\ \_\*\tilde{t}\_{12}^{\*} & \_\*\tilde{t}\_{22} \end{bmatrix},
$$

with *x*˜<sup>21</sup> = ˜*x*<sup>∗</sup> <sup>12</sup> and <sup>∗</sup>*t* ˜<sup>21</sup> = <sup>∗</sup>*t* ˜ ∗ 12. In this case, the conjugate transpose is only the conjugate since the quantities are scalar. Note that *B* = *B*<sup>∗</sup> and hence *B* is Hermitian. The leading minors of *B* being |*(*3*)*| = 3 *>* 0 and |*B*| = *(*3*)(*2*)* − *(*−*i)(i)* = 5 *>* 0*, B* is Hermitian positive definite. Accordingly,

$$\begin{aligned} M\_{\tilde{X}}(\_\*\tilde{T}) &= |\det(B)|^\alpha |\det(B - \_\*\tilde{T}^{\*})|^{-\alpha} \\ &= 5^\alpha [(3 - \_\*\tilde{t}\_{11}^{\*})(2 - \_\*\tilde{t}\_{22}^{\*}) + (i + \_\*\tilde{t}\_{12}^{\*})(i - \_\*\tilde{t}\_{12})]^{-\alpha}. \end{aligned}$$

Now, consider the partitioning of the following *p* × *p* matrices:

$$
\tilde{X} = \begin{bmatrix} \tilde{X}\_{11} & \tilde{X}\_{12} \\ \tilde{X}\_{21} & \tilde{X}\_{22} \end{bmatrix}, \ \ast \tilde{T} = \begin{bmatrix} \*\tilde{T}\_{11} & \*\tilde{T}\_{12} \\ \*\tilde{T}\_{21} & \*\tilde{T}\_{22} \end{bmatrix} \text{ and } \tilde{B} = \begin{bmatrix} \tilde{B}\_{11} & \tilde{B}\_{12} \\ \tilde{B}\_{21} & \tilde{B}\_{22} \end{bmatrix} \tag{i}
$$

where $\tilde{X}\_{11}$ and ${}\_{*}\tilde{T}\_{11}$ are $r \times r$, $r \le p$. Then, proceeding as in the real case, we have the following results:

**Theorem 5.2a.1.** *Let $\tilde{X}$ have a $p \times p$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I\_p$, and let $\tilde{X}$ be partitioned as in (i). Then, $\tilde{X}\_{11}$ has an $r \times r$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I\_r$, and $\tilde{X}\_{22}$ has a $(p-r) \times (p-r)$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I\_{p-r}$.*

**Theorem 5.2a.2.** *Let the $p \times p$ complex matrix $\tilde{X}$ have a $p \times p$ complex matrix-variate gamma density with the parameters $(\alpha, \tilde{B} > O)$ and let $\tilde{X}$ and $\tilde{B}$ be partitioned as in (i) with $\tilde{B}\_{12} = O$, $\tilde{B}\_{21} = O$. Then $\tilde{X}\_{11}$ and $\tilde{X}\_{22}$ have $r \times r$ and $(p-r) \times (p-r)$ complex matrix-variate gamma densities with shape parameter $\alpha$ and scale parameters $\tilde{B}\_{11}$ and $\tilde{B}\_{22}$, respectively.*

**Theorem 5.2a.3.** *Let $\tilde{X}, \tilde{X}\_{11}, \tilde{X}\_{22}$ and $\tilde{B}$ be as specified in Theorem 5.2a.1 or 5.2a.2. Then, $\tilde{X}\_{11}$ and $\tilde{X}\_{22}$ are statistically independently distributed as complex matrix-variate gamma random variables on $r \times r$ and $(p-r) \times (p-r)$ matrices, respectively.*

For a general matrix $B$ whose sub-matrices $B\_{12}$ and $B\_{21}$ are not assumed to be null, the marginal densities of $\tilde{X}\_{11}$ and $\tilde{X}\_{22}$, which are given in the next result, can be determined by proceeding as in the real case.

**Theorem 5.2a.4.** *Let $\tilde{X}$ have a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B = B^{*} > O$. Letting $\tilde{X}$ and $B$ be partitioned as in (i), the sub-matrix $\tilde{X}\_{11}$ has a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B\_{11} - B\_{12}B\_{22}^{-1}B\_{21}$, and the sub-matrix $\tilde{X}\_{22}$ has a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $B\_{22} - B\_{21}B\_{11}^{-1}B\_{12}$.*

#### **Exercises 5.2**

**5.2.1.** Show that

$$
\pi^{\frac{(p-r)(p-r-1)}{4}} \pi^{\frac{r(r-1)}{4}} \pi^{\frac{2r(p-r)}{4}} = \pi^{\frac{p(p-1)}{4}}.
$$

**5.2.2.** Show that $\Gamma\_r(\alpha)\Gamma\_{p-r}(\alpha - \frac{r}{2})\, \pi^{\frac{r(p-r)}{2}} = \Gamma\_p(\alpha)$.
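Both of these factorizations can be checked numerically with SciPy's `multigammaln`, which returns $\log \Gamma\_p(\alpha)$ in the real case; note how the power of $\pi$ from Exercise 5.2.1 enters (an illustrative sketch, not part of the text):

```python
import numpy as np
from scipy.special import multigammaln

# In log form, with the pi factor from Exercise 5.2.1 written out:
#   log Gamma_p(alpha) = log Gamma_r(alpha) + log Gamma_{p-r}(alpha - r/2)
#                        + (r*(p - r)/2) * log(pi)
p, r, alpha = 5, 2, 4.0
lhs = multigammaln(alpha, p)
rhs = (multigammaln(alpha, r) + multigammaln(alpha - r/2, p - r)
       + (r*(p - r)/2)*np.log(np.pi))
print(lhs, rhs)  # equal
```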

**5.2.3.** Evaluate (1): $\int\_{X>O} \mathbf{e}^{-\text{tr}(X)}\mathrm{d}X$, (2): $\int\_{X>O} |X|\, \mathbf{e}^{-\text{tr}(X)}\mathrm{d}X$.

**5.2.4.** Write down (1): *Γ*3*(α)*, (2): *Γ*4*(α)* explicitly in the real and complex cases.

**5.2.5.** Evaluate the integrals in Exercise 5.2.3 for the complex case. In (2) replace det*(X)* by |det*(X)*|.

#### **5.3. Matrix-variate Type-1 Beta and Type-2 Beta Densities, Real Case**

The *p* × *p* matrix-variate beta function denoted by *Bp(α, β)* is defined as follows in the real case:

$$B\_p(\alpha, \beta) = \frac{\Gamma\_p(\alpha)\Gamma\_p(\beta)}{\Gamma\_p(\alpha + \beta)}, \ \Re(\alpha) > \frac{p-1}{2}, \ \Re(\beta) > \frac{p-1}{2}. \tag{5.3.1}$$

This function has the following integral representations in the real case, where it is assumed that $\Re(\alpha) > \frac{p-1}{2}$ and $\Re(\beta) > \frac{p-1}{2}$:


$$B\_p(\alpha, \beta) = \int\_{O < X < I} |X|^{\alpha - \frac{p+1}{2}} |I - X|^{\beta - \frac{p+1}{2}} dX, \text{ a type-1 beta integral} \tag{5.3.2}$$

$$B\_p(\beta, \alpha) = \int\_{O < Y < I} |Y|^{\beta - \frac{p+1}{2}} |I - Y|^{\alpha - \frac{p+1}{2}} \,\mathrm{d}Y, \text{ a type-1 beta integral} \tag{5.3.3}$$

$$B\_p(\alpha, \beta) = \int\_{Z>O} |Z|^{\alpha - \frac{p+1}{2}} |I + Z|^{-(\alpha + \beta)} \mathrm{d}Z,\text{ a type-2 beta integral} \tag{5.3.4}$$

$$B\_p(\beta, \alpha) = \int\_{T > O} |T|^{\beta - \frac{p+1}{2}} |I + T|^{-(\alpha + \beta)} \mathrm{d}T,\text{ a type-2 beta integral.} \tag{5.3.5}$$

For example, for *p* = 2, let

$$X = \begin{bmatrix} x\_{11} & x\_{12} \\ x\_{12} & x\_{22} \end{bmatrix} \Rightarrow |X| = x\_{11}x\_{22} - x\_{12}^2 \text{ and } |I - X| = (1 - x\_{11})(1 - x\_{22}) - x\_{12}^2.$$

Then for example, (5.3.2) will be of the following form:

$$\begin{split} B\_{2}(\alpha,\beta) &= \int\_{O<X<I} |X|^{\alpha-\frac{3}{2}} |I-X|^{\beta-\frac{3}{2}} \mathrm{d}X \\ &= \int [x\_{11}x\_{22}-x\_{12}^{2}]^{\alpha-\frac{3}{2}} [(1-x\_{11})(1-x\_{22})-x\_{12}^{2}]^{\beta-\frac{3}{2}} \mathrm{d}x\_{11}\wedge\mathrm{d}x\_{12}\wedge\mathrm{d}x\_{22}, \end{split}$$

the integration being over the region $0 < x\_{11} < 1$, $0 < x\_{22} < 1$, $x\_{11}x\_{22} - x\_{12}^2 > 0$ and $(1-x\_{11})(1-x\_{22}) - x\_{12}^2 > 0$.
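For $\alpha = \beta = 3/2$ the integrand above is identically 1, so $B\_2(3/2, 3/2)$ is simply the volume of the integration region, which a direct numerical integration confirms to be $\pi/6$ (a sketch assuming SciPy; not part of the text):

```python
import numpy as np
from scipy.special import multigammaln
from scipy.integrate import tplquad

# For alpha = beta = 3/2 the integrand is 1, so B_2(3/2, 3/2) is the volume of
# the region {X > O, I - X > O} in (x11, x12, x22)-space.
def lo(x11, x22):
    return -np.sqrt(min(x11*x22, (1 - x11)*(1 - x22)))

vol, _ = tplquad(lambda x12, x22, x11: 1.0, 0, 1, 0, 1,
                 lo, lambda x11, x22: -lo(x11, x22))

# B_2(3/2, 3/2) = Gamma_2(3/2)^2 / Gamma_2(3) from (5.3.1)
b2 = np.exp(2*multigammaln(1.5, 2) - multigammaln(3.0, 2))
print(vol, b2)  # both ~ pi/6 ~ 0.5236
```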

We will derive two of the integrals (5.3.2)–(5.3.5), the other ones then being directly obtained. Let us begin with the integral representations of $\Gamma\_p(\alpha)$ and $\Gamma\_p(\beta)$ for $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\beta) > \frac{p-1}{2}$:

$$\begin{split} \Gamma\_{p}(\alpha)\Gamma\_{p}(\beta) &= \left[ \int\_{X>O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(X)} \, \mathrm{d}X \right] \Big[ \int\_{Y>O} |Y|^{\beta - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(Y)} \mathrm{d}Y \Big] \\ &= \int\_{X} \int\_{Y} |X|^{\alpha - \frac{p+1}{2}} |Y|^{\beta - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(X+Y)} \mathrm{d}X \wedge \mathrm{d}Y. \end{split}$$

Making the transformation $U = X + Y,\ V = X$, whose Jacobian is 1, taking out $U$ from $|U - V| = |U|\,|I - U^{-\frac{1}{2}}VU^{-\frac{1}{2}}|$, and then letting $W = U^{-\frac{1}{2}}VU^{-\frac{1}{2}} \Rightarrow \mathrm{d}V = |U|^{\frac{p+1}{2}}\mathrm{d}W$, we have

$$\begin{split} \Gamma\_{p}(\alpha)\Gamma\_{p}(\beta) &= \int\_{U} \int\_{V} |V|^{\alpha - \frac{p+1}{2}} |U - V|^{\beta - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(U)} \mathbf{d}U \wedge \mathbf{d}V \\ &= \left\{ \int\_{U > O} |U|^{\alpha + \beta - \frac{p+1}{2}} \mathbf{e}^{-\operatorname{tr}(U)} \mathbf{d}U \right\} \\ &\quad \times \left\{ \int\_{O < W < I} |W|^{\alpha - \frac{p+1}{2}} |I - W|^{\beta - \frac{p+1}{2}} \mathbf{d}W \right\} \\ &= \Gamma\_{p}(\alpha + \beta) \int\_{O < W < I} |W|^{\alpha - \frac{p+1}{2}} |I - W|^{\beta - \frac{p+1}{2}} \mathbf{d}W. \end{split}$$

Thus, on dividing both sides by $\Gamma\_p(\alpha + \beta)$, we have

$$B\_p(\alpha, \beta) = \int\_{O < W < I} |W|^{\alpha - \frac{p+1}{2}} |I - W|^{\beta - \frac{p+1}{2}} \mathrm{d}W. \tag{i}$$

This establishes (5.3.2). The initial conditions $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\beta) > \frac{p-1}{2}$ are sufficient to justify all the steps above, and hence no conditions are listed at each stage. Now, take $Y = I - W$ to obtain (5.3.3). Let us take the $W$ of *(i)* above and consider the transformation

$$Z = (I - W)^{-\frac{1}{2}} W (I - W)^{-\frac{1}{2}} = (W^{-1} - I)^{-\frac{1}{2}} (W^{-1} - I)^{-\frac{1}{2}} = (W^{-1} - I)^{-1}$$

which gives

$$Z^{-1} = W^{-1} - I \Rightarrow |Z|^{-(p+1)} \text{d}Z = |W|^{-(p+1)} \text{d}W.\tag{ii}$$

Taking determinants and substituting in *(ii)* we have

$$\mathrm{d}W = |I + Z|^{-(p+1)}\mathrm{d}Z.$$

On expressing $W$, $I - W$ and $\mathrm{d}W$ in terms of $Z$, we obtain the result (5.3.4). Now, let $T = Z^{-1}$ with the Jacobian $\mathrm{d}T = |Z|^{-(p+1)}\mathrm{d}Z$; then (5.3.4) transforms into the integral (5.3.5). These establish all four integral representations of the real matrix-variate beta function. We may also observe that $B\_p(\alpha, \beta) = B\_p(\beta, \alpha)$, that is, $\alpha$ and $\beta$ can be interchanged in the beta function. Consider the function

$$f\_{3}(X) = \frac{\Gamma\_p(\alpha + \beta)}{\Gamma\_p(\alpha)\Gamma\_p(\beta)} |X|^{\alpha - \frac{p+1}{2}} |I - X|^{\beta - \frac{p+1}{2}} \tag{5.3.6}$$

for $O < X < I$, $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\beta) > \frac{p-1}{2}$, and $f\_3(X) = 0$ elsewhere. This is a type-1 real matrix-variate beta density with the parameters $(\alpha, \beta)$, where $O < X < I$ means $X > O,\ I - X > O$, so that all the eigenvalues of $X$ are in the open interval $(0, 1)$. As for

$$f\_4(Z) = \frac{\Gamma\_p(\alpha + \beta)}{\Gamma\_p(\alpha)\Gamma\_p(\beta)} |Z|^{\alpha - \frac{p+1}{2}} |I + Z|^{-(\alpha + \beta)} \tag{5.3.7}$$

whenever $Z > O$, $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\beta) > \frac{p-1}{2}$, and $f\_4(Z) = 0$ elsewhere, this is a $p \times p$ real matrix-variate type-2 beta density with the parameters $(\alpha, \beta)$.
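Since these are the matrix-variate extensions of the scalar beta densities, setting $p = 1$ recovers the familiar type-1 beta and type-2 beta (beta-prime) densities. A minimal numerical sketch, assuming numpy and scipy are available, confirming that both densities integrate to 1 in that scalar case:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 2.5, 3.0  # arbitrary test parameters with alpha, beta > 0

# for p = 1, Gamma_1(.) is the ordinary gamma function, so the constant is
c = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))

# type-1 density f_3 on (0, 1)
f3 = lambda x: c * x**(alpha - 1) * (1 - x)**(beta - 1)
total1, _ = quad(f3, 0, 1)

# type-2 density f_4 on (0, infinity)
f4 = lambda z: c * z**(alpha - 1) * (1 + z)**(-(alpha + beta))
total2, _ = quad(f4, 0, np.inf)

print(total1, total2)  # both should be ~1.0
```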

#### **5.3.1. Some properties of real matrix-variate type-1 and type-2 beta densities**

In the course of the above derivations, it was shown that the following results hold. If *X* is a *p* × *p* real positive definite matrix having a real matrix-variate type-1 beta density with the parameters *(α, β)*, then:

(1): $Y\_1 = I - X$ is real type-1 beta distributed with the parameters $(\beta, \alpha)$;

(2): $Y\_2 = (I - X)^{-\frac{1}{2}}X(I - X)^{-\frac{1}{2}}$ is real type-2 beta distributed with the parameters $(\alpha, \beta)$;

(3): $Y\_3 = (I - X)^{\frac{1}{2}}X^{-1}(I - X)^{\frac{1}{2}}$ is real type-2 beta distributed with the parameters $(\beta, \alpha)$.

If $Y$ is real type-2 beta distributed with the parameters $(\alpha, \beta)$, then:

(4): $Z\_1 = Y^{-1}$ is real type-2 beta distributed with the parameters $(\beta, \alpha)$;

(5): $Z\_2 = (I + Y)^{-\frac{1}{2}}Y(I + Y)^{-\frac{1}{2}}$ is real type-1 beta distributed with the parameters $(\alpha, \beta)$;

(6): $Z\_3 = I - (I + Y)^{-\frac{1}{2}}Y(I + Y)^{-\frac{1}{2}} = (I + Y)^{-1}$ is real type-1 beta distributed with the parameters $(\beta, \alpha)$.
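In the scalar case $p = 1$, property (2) reduces to the classical fact that $X/(1-X)$ is type-2 beta (beta-prime) distributed when $X$ is type-1 beta distributed. A small simulation sketch, assuming numpy and scipy, illustrating this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta = 3.0, 4.0

# property (2) for p = 1: if X ~ type-1 beta(alpha, beta),
# then Y2 = X/(1-X) ~ type-2 beta(alpha, beta)
x = rng.beta(alpha, beta, size=20000)
y2 = x / (1 - x)

# compare the transformed sample with the type-2 beta (beta-prime) CDF
ks = stats.kstest(y2, stats.betaprime(alpha, beta).cdf)
print(ks.pvalue)  # the transform is exact, so the p-value should be large
```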

#### **5.3a. Matrix-variate Type-1 and Type-2 Beta Densities, Complex Case**

A matrix-variate beta function in the complex domain is defined as

$$\tilde{B}\_p(\alpha, \beta) = \frac{\tilde{\Gamma}\_p(\alpha)\tilde{\Gamma}\_p(\beta)}{\tilde{\Gamma}\_p(\alpha + \beta)}, \ \Re(\alpha) > p - 1, \ \Re(\beta) > p - 1 \tag{5.3a.1}$$

with a tilde over $B$. As $\tilde{B}\_p(\alpha, \beta) = \tilde{B}\_p(\beta, \alpha)$, clearly $\alpha$ and $\beta$ can be interchanged. Then, $\tilde{B}\_p(\alpha, \beta)$ has the following integral representations, where $\Re(\alpha) > p - 1,\ \Re(\beta) > p - 1$:

$$\tilde{B}\_p(\alpha, \beta) = \int\_{O < \tilde{X} < I} |\det(\tilde{X})|^{\alpha - p} |\det(I - \tilde{X})|^{\beta - p} d\tilde{X}, \text{ a type-1 beta integral} \tag{5.3a.2}$$

$$\tilde{B}\_p(\beta, \alpha) = \int\_{O < \tilde{Y} < I} |\det(\tilde{Y})|^{\beta - p} |\det(I - \tilde{Y})|^{\alpha - p} d\tilde{Y}, \text{ a type-1 beta integral} \tag{5.3a.3}$$

$$\tilde{B}\_p(\alpha, \beta) = \int\_{\tilde{Z} > O} |\det(\tilde{Z})|^{\alpha-p} |\det(I + \tilde{Z})|^{-(\alpha+\beta)} d\tilde{Z}, \text{ a type-2 beta integral} \tag{5.3a.4}$$

$$\tilde{B}\_p(\beta, \alpha) = \int\_{\tilde{T} > O} |\det(\tilde{T})|^{\beta - p} |\det(I + \tilde{T})|^{-(\alpha + \beta)} d\tilde{T}, \text{ a type-2 beta integral.} \tag{5.3a.5}$$

For instance, consider the integrand in (5.3a.2) for the case *p* = 2. Let

$$
\tilde{X} = \begin{bmatrix} x\_1 & x\_2 + iy\_2 \\ x\_2 - iy\_2 & x\_3 \end{bmatrix}, \ \tilde{X} = \tilde{X}^\* > O, \ i = \sqrt{(-1)}, \
$$

the diagonal elements of $\tilde{X}$ being real; since $\tilde{X}$ is Hermitian positive definite, we have $x\_1 > 0,\ x\_3 > 0,\ \det(\tilde{X}) = x\_1x\_3 - (x\_2^2 + y\_2^2) > 0$ and $\det(I - \tilde{X}) = (1 - x\_1)(1 - x\_3) - (x\_2^2 + y\_2^2) > 0$. The integrand in (5.3a.2) is then

$$[x\_1x\_3 - (x\_2^2 + y\_2^2)]^{\alpha - 2}\,[(1 - x\_1)(1 - x\_3) - (x\_2^2 + y\_2^2)]^{\beta - 2}.$$

The derivations of (5.3a.2)–(5.3a.5) being parallel to those provided in the real case, they are omitted. We will list one case for each of a type-1 and a type-2 beta density in the complex *p* × *p* matrix-variate case:

$$\tilde{f}\_{3}(\tilde{X}) = \frac{\tilde{\Gamma}\_{p}(\alpha + \beta)}{\tilde{\Gamma}\_{p}(\alpha)\tilde{\Gamma}\_{p}(\beta)} |\det(\tilde{X})|^{\alpha - p} |\det(I - \tilde{X})|^{\beta - p} \tag{5.3a.6}$$

for $O < \tilde{X} < I$, $\Re(\alpha) > p - 1$, $\Re(\beta) > p - 1$, and $\tilde{f}\_3(\tilde{X}) = 0$ elsewhere;

$$\tilde{f}\_4(\tilde{Z}) = \frac{\tilde{\Gamma}\_p(\alpha + \beta)}{\tilde{\Gamma}\_p(\alpha)\tilde{\Gamma}\_p(\beta)} |\det(\tilde{Z})|^{\alpha - p} |\det(I + \tilde{Z})|^{-(\alpha + \beta)} \tag{5.3a.7}$$

for $\tilde{Z} > O$, $\Re(\alpha) > p - 1$, $\Re(\beta) > p - 1$, and $\tilde{f}\_4(\tilde{Z}) = 0$ elsewhere.

Properties parallel to (1) to (6) which are listed in Sect. 5.3.1 also hold in the complex case.

#### **5.3.2. Explicit evaluation of type-1 matrix-variate beta integrals, real case**

A detailed evaluation of a type-1 matrix-variate beta integral as a multiple integral is presented in this section, as the steps will prove useful in connection with other computations; the reader may also refer to Mathai (2014,b). The real matrix-variate type-1 beta function, which is denoted by

$$B\_p(\alpha, \beta) = \frac{\Gamma\_p(\alpha)\Gamma\_p(\beta)}{\Gamma\_p(\alpha + \beta)}, \ \Re(\alpha) > \frac{p-1}{2}, \ \Re(\beta) > \frac{p-1}{2},$$

has the following type-1 beta integral representation:

$$B\_p(\alpha, \beta) = \int\_{O < X < I} |X|^{\alpha - \frac{p+1}{2}} |I - X|^{\beta - \frac{p+1}{2}} dX,$$

for $\Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}$, where $X$ is a real $p \times p$ symmetric positive definite matrix. The standard derivation of this integral relies on the properties of real matrix-variate gamma integrals after making suitable transformations, as was previously done. It is also possible to evaluate the integral directly and show that it is equal to $\Gamma\_p(\alpha)\Gamma\_p(\beta)/\Gamma\_p(\alpha + \beta)$ where, for example,

$$\Gamma\_p(\alpha) = \pi^{\frac{p(p-1)}{4}} \Gamma(\alpha) \Gamma(\alpha - 1/2) \cdots \Gamma(\alpha - (p-1)/2), \; \Re(\alpha) > \frac{p-1}{2}.$$
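This product formula is exactly what `scipy.special.multigammaln` implements (in log form), so it can serve as a quick numerical cross-check; a sketch assuming scipy is available:

```python
import math
from scipy.special import gammaln, multigammaln

p, alpha = 4, 5.25  # requires Re(alpha) > (p-1)/2

# Gamma_p(alpha) = pi^{p(p-1)/4} * prod_{j=0}^{p-1} Gamma(alpha - j/2), in logs
log_gamma_p = (p * (p - 1) / 4) * math.log(math.pi) \
    + sum(gammaln(alpha - j / 2) for j in range(p))

print(log_gamma_p, multigammaln(alpha, p))  # the two values agree
```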

A convenient technique for evaluating a real matrix-variate gamma integral consists of making the transformation $X = TT'$ where $T$ is a lower triangular matrix whose diagonal elements are positive. However, on applying this transformation, the type-1 beta integral does not simplify due to the presence of the factor $|I - X|^{\beta - \frac{p+1}{2}}$. Hence, we will attempt to evaluate this integral by appropriately partitioning the matrices and then successively integrating out the variables. Letting $X = (x\_{ij})$ be a $p \times p$ real matrix, $x\_{pp}$ can then be extracted from the determinants $|X|$ and $|I - X|$ after partitioning the matrices. Thus, let

$$X = \begin{bmatrix} X\_{11} & X\_{12} \\ X\_{21} & X\_{22} \end{bmatrix}$$

where $X\_{11}$ is the $(p-1) \times (p-1)$ leading sub-matrix, $X\_{21}$ is $1 \times (p-1)$, $X\_{22} = x\_{pp}$ and $X\_{12} = X\_{21}'$. Then $|X| = |X\_{11}|\,[x\_{pp} - X\_{21}X\_{11}^{-1}X\_{12}]$ so that

$$|X|^{\alpha - \frac{p+1}{2}} = |X\_{11}|^{\alpha - \frac{p+1}{2}} [x\_{pp} - X\_{21} X\_{11}^{-1} X\_{12}]^{\alpha - \frac{p+1}{2}},\tag{i}$$

and

$$|I - X|^{\beta - \frac{p+1}{2}} = |I - X\_{11}|^{\beta - \frac{p+1}{2}}[(1 - x\_{pp}) - X\_{21}(I - X\_{11})^{-1}X\_{12}]^{\beta - \frac{p+1}{2}}.\tag{ii}$$

It follows from *(i)* that $x\_{pp} > X\_{21}X\_{11}^{-1}X\_{12}$ and, from *(ii)*, that $x\_{pp} < 1 - X\_{21}(I - X\_{11})^{-1}X\_{12}$; thus, we have $X\_{21}X\_{11}^{-1}X\_{12} < x\_{pp} < 1 - X\_{21}(I - X\_{11})^{-1}X\_{12}$. Let $y = x\_{pp} - X\_{21}X\_{11}^{-1}X\_{12} \Rightarrow \mathrm{d}y = \mathrm{d}x\_{pp}$ for fixed $X\_{21}, X\_{11}$, so that $0 < y < b$ where

$$\begin{split} b &= 1 - X\_{21}X\_{11}^{-1}X\_{12} - X\_{21}(I - X\_{11})^{-1}X\_{12} \\ &= 1 - X\_{21}X\_{11}^{-\frac{1}{2}}(I - X\_{11})^{-\frac{1}{2}}(I - X\_{11})^{-\frac{1}{2}}X\_{11}^{-\frac{1}{2}}X\_{12} \\ &= 1 - WW', \ W = X\_{21}X\_{11}^{-\frac{1}{2}}(I - X\_{11})^{-\frac{1}{2}}. \end{split}$$

The second factor on the right-hand side of *(ii)* then becomes

$$[b - y]^{\beta - \frac{p+1}{2}} = b^{\beta - \frac{p+1}{2}} [1 - y/b]^{\beta - \frac{p+1}{2}}.$$

Now, letting $u = \frac{y}{b}$ for fixed $b$, the terms containing $u$ and $b$ become $b^{\alpha + \beta - (p+1) + 1} u^{\alpha - \frac{p+1}{2}} (1 - u)^{\beta - \frac{p+1}{2}}$. Integration over $u$ then gives

$$\int\_0^1 u^{\alpha - \frac{p+1}{2}} (1-u)^{\beta - \frac{p+1}{2}} \, \mathrm{d}u = \frac{\Gamma(\alpha - \frac{p-1}{2}) \Gamma(\beta - \frac{p-1}{2})}{\Gamma(\alpha + \beta - (p-1))}$$

for $\Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}$. Letting $W = X\_{21}X\_{11}^{-\frac{1}{2}}(I - X\_{11})^{-\frac{1}{2}}$ for fixed $X\_{11}$, $\mathrm{d}X\_{21} = |X\_{11}|^{\frac{1}{2}} |I - X\_{11}|^{\frac{1}{2}} \mathrm{d}W$ from Theorem 1.6.1 of Chap. 1 or Theorem 1.18 of Mathai (1997), where $X\_{11}$ is a $(p-1) \times (p-1)$ matrix. Now, letting $v = WW'$ and integrating out over the Stiefel manifold by applying Theorem 4.2.3 of Chap. 4 or Theorem 2.16 and Remark 2.13 of Mathai (1997), we have

$$\mathrm{d}W = \frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})} v^{\frac{p-1}{2}-1} \mathrm{d}v.$$

Thus, the integral over *b* becomes

$$\begin{aligned} \int \mathbf{b}^{\alpha+\beta-p} \mathbf{d} \mathbf{X}\_{21} &= \int\_0^1 v^{\frac{p-1}{2}-1} (1-v)^{\alpha+\beta-p} \mathbf{d}v \\ &= \frac{\Gamma(\frac{p-1}{2}) \Gamma(\alpha+\beta-(p-1))}{\Gamma(\alpha+\beta-\frac{p-1}{2})}, \; \Re(\alpha+\beta) > p-1. \end{aligned}$$

Then, on multiplying all the factors together, we have

$$|X\_{11}^{(1)}|^{\alpha + \frac{1}{2} - \frac{p+1}{2}} |I - X\_{11}^{(1)}|^{\beta + \frac{1}{2} - \frac{p+1}{2}}\, \pi^{\frac{p-1}{2}}\, \frac{\Gamma(\alpha - \frac{p-1}{2})\Gamma(\beta - \frac{p-1}{2})}{\Gamma(\alpha + \beta - \frac{p-1}{2})}$$

whenever $\Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}$. In this case, $X\_{11}^{(1)}$ represents the $(p-1) \times (p-1)$ leading sub-matrix at the end of the first set of operations. At the end of the second set of operations, we will denote the $(p-2) \times (p-2)$ leading sub-matrix by $X\_{11}^{(2)}$, and so on. The second step of the operations begins by extracting $x\_{p-1,p-1}$ and writing

$$|X\_{11}^{(1)}| = |X\_{11}^{(2)}|\, [x\_{p-1,p-1} - X\_{21}^{(2)}[X\_{11}^{(2)}]^{-1}X\_{12}^{(2)}]$$

where $X\_{21}^{(2)}$ is a $1 \times (p-2)$ vector. We then proceed as in the first sequence of steps to obtain the final factors in the following form:

$$|X\_{11}^{(2)}|^{\alpha + 1 - \frac{p+1}{2}} |I - X\_{11}^{(2)}|^{\beta + 1 - \frac{p+1}{2}}\, \pi^{\frac{p-2}{2}}\, \frac{\Gamma(\alpha - \frac{p-2}{2})\Gamma(\beta - \frac{p-2}{2})}{\Gamma(\alpha + \beta - \frac{p-2}{2})}$$

for $\Re(\alpha) > \frac{p-2}{2},\ \Re(\beta) > \frac{p-2}{2}$. Proceeding in such a manner, in the end, the exponent of $\pi$ will be

$$\frac{p-1}{2} + \frac{p-2}{2} + \dots + \frac{1}{2} = \frac{p(p-1)}{4},$$

and the gamma product will be

$$\frac{\Gamma(\alpha - \frac{p-1}{2})\Gamma(\alpha - \frac{p-2}{2})\cdots\Gamma(\alpha)\Gamma(\beta - \frac{p-1}{2})\cdots\Gamma(\beta)}{\Gamma(\alpha + \beta - \frac{p-1}{2})\cdots\Gamma(\alpha + \beta)}.$$

These gamma products, along with $\pi^{\frac{p(p-1)}{4}}$, can be written as $\frac{\Gamma\_p(\alpha)\Gamma\_p(\beta)}{\Gamma\_p(\alpha + \beta)} = B\_p(\alpha, \beta)$; hence the result. It is thus possible to obtain the beta function in the real matrix-variate case by direct evaluation of a type-1 real matrix-variate beta integral.

A similar approach can yield the real matrix-variate beta function from a type-2 real matrix-variate beta integral of the form

$$\int\_{X>O} |X|^{\alpha - \frac{p+1}{2}} |I + X|^{-(\alpha + \beta)} \mathrm{d}X$$

where $X$ is a $p \times p$ positive definite symmetric matrix and it is assumed that $\Re(\alpha) > \frac{p-1}{2}$ and $\Re(\beta) > \frac{p-1}{2}$, the evaluation procedure being parallel.

**Example 5.3.1.** By direct evaluation as a multiple integral, show that

$$\int\_{O < X < I} |X|^{\alpha - \frac{p+1}{2}} |I - X|^{\beta - \frac{p+1}{2}} \mathrm{d}X = \frac{\Gamma\_p(\alpha)\Gamma\_p(\beta)}{\Gamma\_p(\alpha + \beta)}$$

for *p* = 2.

**Solution 5.3.1.** The integral to be evaluated will be denoted by *δ*. Let

$$|X| = \mathbf{x}\_{11}[\mathbf{x}\_{22} - \mathbf{x}\_{21}\mathbf{x}\_{11}^{-1}\mathbf{x}\_{12}] = \mathbf{x}\_{11}[\mathbf{x}\_{22} - \frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{11}}]$$

$$|I - X| = [1 - \mathbf{x}\_{11}][(1 - \mathbf{x}\_{22}) - \mathbf{x}\_{12}(1 - \mathbf{x}\_{11})^{-1}\mathbf{x}\_{12}] = (1 - \mathbf{x}\_{11})\left[1 - \mathbf{x}\_{22} - \frac{\mathbf{x}\_{12}^2}{1 - \mathbf{x}\_{11}}\right].\tag{i}$$

It is seen from *(i)* that

$$\frac{\mathbf{x}\_{12}^2}{\mathbf{x}\_{11}} \le \mathbf{x}\_{22} \le 1 - \frac{\mathbf{x}\_{12}^2}{1 - \mathbf{x}\_{11}}.$$

Letting $y = x\_{22} - \frac{x\_{12}^2}{x\_{11}}$ so that $0 \le y \le b$, with $b = 1 - \frac{x\_{12}^2}{x\_{11}} - \frac{x\_{12}^2}{1 - x\_{11}} = 1 - \frac{x\_{12}^2}{x\_{11}(1 - x\_{11})}$, we have

$$|X|^{\alpha - \frac{3}{2}} |I - X|^{\beta - \frac{3}{2}} \mathrm{d}X = x\_{11}^{\alpha - \frac{3}{2}} (1 - x\_{11})^{\beta - \frac{3}{2}}\, y^{\alpha - \frac{3}{2}} (b - y)^{\beta - \frac{3}{2}}\, \mathrm{d}x\_{11} \wedge \mathrm{d}x\_{12} \wedge \mathrm{d}y.$$

Now, integrating out *y*, we have

$$\int\_{\mathbf{y}=0}^{b} \mathbf{y}^{\alpha - \frac{3}{2}} (b - \mathbf{y})^{\beta - \frac{3}{2}} \mathbf{d}\mathbf{y} = b^{\alpha + \beta - 3 + 1} \int\_{0}^{1} v^{\alpha - \frac{3}{2}} (1 - v)^{\beta - \frac{3}{2}} \, \mathrm{d}v, \, v = \frac{\mathbf{y}}{b}$$

$$= b^{\alpha + \beta - 2} \frac{\Gamma(\alpha - \frac{1}{2}) \Gamma(\beta - \frac{1}{2})}{\Gamma(\alpha + \beta - 1)} \tag{ii}$$

whenever $\Re(\alpha) > \frac{1}{2}$ and $\Re(\beta) > \frac{1}{2}$, $b$ being as previously defined. Letting $w = \frac{x\_{12}}{[x\_{11}(1 - x\_{11})]^{\frac{1}{2}}}$, $\mathrm{d}x\_{12} = [x\_{11}(1 - x\_{11})]^{\frac{1}{2}} \mathrm{d}w$ for fixed $x\_{11}$. The exponents of $x\_{11}$ and $(1 - x\_{11})$ then become $\alpha - \frac{3}{2} + \frac{1}{2}$ and $\beta - \frac{3}{2} + \frac{1}{2}$, and the integral over $w$ gives the following:

$$\begin{split} \int\_{-1}^{1} (1 - w^2)^{\alpha + \beta - 2} \mathrm{d}w &= 2 \int\_{0}^{1} (1 - w^2)^{\alpha + \beta - 2} \mathrm{d}w = \int\_{0}^{1} z^{\frac{1}{2} - 1} (1 - z)^{\alpha + \beta - 2} \mathrm{d}z \\ &= \frac{\Gamma(\frac{1}{2}) \Gamma(\alpha + \beta - 1)}{\Gamma(\alpha + \beta - \frac{1}{2})} . \end{split} \tag{iii}$$

Now, integrating out *x*11, we obtain

$$\int\_0^1 \mathbf{x}\_{11}^{\alpha - 1} (1 - \mathbf{x}\_{11})^{\beta - 1} \mathbf{dx}\_{11} = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}.\tag{iv}$$

Then, on collecting the factors from *(i)* to *(iv),* we have

$$\delta = \Gamma(1/2) \frac{\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2})\Gamma(\beta)\Gamma(\beta - \frac{1}{2})}{\Gamma(\alpha + \beta)\Gamma(\alpha + \beta - \frac{1}{2})}.$$

Finally, noting that for $p = 2$, $\pi^{\frac{p(p-1)}{4}} = \pi^{\frac{1}{2}}$ and that $\Gamma(1/2) = \pi^{\frac{1}{2}}$, the desired result is obtained, that is,

$$\delta = \frac{\Gamma\_2(\alpha)\Gamma\_2(\beta)}{\Gamma\_2(\alpha+\beta)} = B\_2(\alpha,\beta).$$

This completes the computations.
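The $p = 2$ evaluation above can also be confirmed by brute-force numerical integration over the region $x\_{12}^2 < x\_{11}(1 - x\_{11})$, $x\_{12}^2/x\_{11} < x\_{22} < 1 - x\_{12}^2/(1 - x\_{11})$; a sketch assuming scipy, with the test values $\alpha = \beta = 2$ chosen arbitrarily:

```python
import math
from scipy.integrate import tplquad
from scipy.special import multigammaln

alpha, beta = 2.0, 2.0

# integrand |X|^(alpha-3/2) |I-X|^(beta-3/2) for the symmetric matrix
# X = [[x11, x12], [x12, x22]]
def integrand(x22, x12, x11):
    detX = x11 * x22 - x12**2
    detIX = (1 - x11) * (1 - x22) - x12**2
    return detX**(alpha - 1.5) * detIX**(beta - 1.5)

# region: 0 < x11 < 1, x12^2 < x11(1-x11), x12^2/x11 < x22 < 1 - x12^2/(1-x11)
val, _ = tplquad(
    integrand, 0, 1,
    lambda x11: -math.sqrt(x11 * (1 - x11)),
    lambda x11: math.sqrt(x11 * (1 - x11)),
    lambda x11, x12: x12**2 / x11,
    lambda x11, x12: 1 - x12**2 / (1 - x11),
)

# Gamma_2(alpha) Gamma_2(beta) / Gamma_2(alpha+beta), via log multivariate gamma
expected = math.exp(multigammaln(alpha, 2) + multigammaln(beta, 2)
                    - multigammaln(alpha + beta, 2))
print(val, expected)  # both ~ pi/45 ≈ 0.0698
```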

#### **5.3a.1. Evaluation of matrix-variate type-1 beta integrals, complex case**

The integral representation of $\tilde{B}\_p(\alpha, \beta)$ in the complex case is

$$\int\_{O < \tilde{X} < I} |\det(\tilde{X})|^{\alpha - p} |\det(I - \tilde{X})|^{\beta - p} d\tilde{X} = \tilde{B}\_p(\alpha, \beta),$$

whenever $\Re(\alpha) > p - 1,\ \Re(\beta) > p - 1$, where $\det(\cdot)$ denotes the determinant of $(\cdot)$ and $|\det(\cdot)|$, the absolute value (or modulus) of the determinant of $(\cdot)$. In this case, $\tilde{X} = (\tilde{x}\_{ij})$ is a $p \times p$ Hermitian positive definite matrix and, accordingly, all of its diagonal elements are real and positive. As in the real case, let us extract $x\_{pp}$ by partitioning $\tilde{X}$ as follows:

$$
\tilde{X} = \begin{bmatrix} \tilde{X}\_{11} & \tilde{X}\_{12} \\ \tilde{X}\_{21} & \tilde{X}\_{22} \end{bmatrix} \text{ so that } \ I - \tilde{X} = \begin{bmatrix} I - \tilde{X}\_{11} & -\tilde{X}\_{12} \\ -\tilde{X}\_{21} & I - \tilde{X}\_{22} \end{bmatrix},
$$

where $\tilde{X}\_{22} \equiv x\_{pp}$ is a real scalar. Then, the absolute values of the determinants have the following representations:

$$|\det(\tilde{X})|^{\alpha-p} = |\det(\tilde{X}\_{11})|^{\alpha-p} |\mathbf{x}\_{pp} - \tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}^{\*}|^{\alpha-p} \tag{i}$$

where \* indicates conjugate transpose, and

$$|\det(I-\tilde{X})|^{\beta-p} = |\det(I-\tilde{X}\_{11})|^{\beta-p}|(1-x\_{pp}) - \tilde{X}\_{21}(I-\tilde{X}\_{11})^{-1}\tilde{X}\_{12}^{\*}|^{\beta-p}.\tag{ii}$$

Note that whenever $\tilde{X}$ and $I - \tilde{X}$ are Hermitian positive definite, $\tilde{X}\_{11}^{-1}$ and $(I - \tilde{X}\_{11})^{-1}$ are also Hermitian positive definite. Further, the Hermitian forms $\tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}^{\*}$ and $\tilde{X}\_{21}(I - \tilde{X}\_{11})^{-1}\tilde{X}\_{12}^{\*}$ are real and positive. It follows from *(i)* and *(ii)* that

$$\tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}^{\*} < x\_{pp} < 1 - \tilde{X}\_{21}(I - \tilde{X}\_{11})^{-1}\tilde{X}\_{12}^{\*}.$$

Since the traces of Hermitian forms are real, the lower and upper bounds of *xpp* are real as well. Let

$$
\tilde{W} = \tilde{X}\_{21} \tilde{X}\_{11}^{-\frac{1}{2}} (I - \tilde{X}\_{11})^{-\frac{1}{2}}
$$

for fixed *X*˜ 11. Then

$$\mathrm{d}\tilde{X}\_{21} = |\det(\tilde{X}\_{11})|^{-1} |\det(I - \tilde{X}\_{11})|^{-1} \mathrm{d}\tilde{W}$$

and $|\det(\tilde{X}\_{11})|^{\alpha-p}$, $|\det(I - \tilde{X}\_{11})|^{\beta-p}$ will become $|\det(\tilde{X}\_{11})|^{\alpha+1-p}$, $|\det(I - \tilde{X}\_{11})|^{\beta+1-p}$, respectively. Then, we can write

$$|(1 - x\_{pp}) - \tilde{X}\_{21}(I - \tilde{X}\_{11})^{-1}\tilde{X}\_{12}^{\*}|^{\beta - p} = (b - y)^{\beta - p} = b^{\beta - p}[1 - y/b]^{\beta - p},$$

where, as in the real case, $y = x\_{pp} - \tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}^{\*}$ and $b = 1 - \tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}^{\*} - \tilde{X}\_{21}(I - \tilde{X}\_{11})^{-1}\tilde{X}\_{12}^{\*}$, so that $0 < y < b$.

Now, letting $u = y/b$, the factors containing $u$ and $b$ will be of the form $u^{\alpha - p}(1 - u)^{\beta - p} b^{\alpha + \beta - 2p + 1}$; the integral over $u$ then gives

$$\int\_0^1 u^{\alpha - p} (1 - u)^{\beta - p} \mathrm{d}u = \frac{\Gamma(\alpha - (p - 1)) \Gamma(\beta - (p - 1))}{\Gamma(\alpha + \beta - 2(p - 1))},$$

for $\Re(\alpha) > p - 1,\ \Re(\beta) > p - 1$. Letting $v = \tilde{W}\tilde{W}^{\*}$ and integrating out over the Stiefel manifold by making use of Theorem 4.2a.3 of Chap. 4 or Corollaries 4.5.2 and 4.5.3 of Mathai (1997), we have

$$\mathrm{d}\tilde{W} = \frac{\pi^{p-1}}{\Gamma(p-1)} v^{(p-1)-1} \mathrm{d}v.$$

The integral over *b* gives

$$\begin{aligned} \int b^{\alpha+\beta-2p+1} \mathrm{d}\tilde{X}\_{21} &= \int\_0^1 v^{(p-1)-1} (1-v)^{\alpha+\beta-2p+1} \mathrm{d}v \\ &= \frac{\Gamma(p-1)\Gamma(\alpha+\beta-2(p-1))}{\Gamma(\alpha+\beta-p+1)}, \end{aligned}$$

for $\Re(\alpha) > p - 1,\ \Re(\beta) > p - 1$. Now, taking the product of all the factors yields

$$|\det(\tilde{X}\_{11})|^{\alpha+1-p}|\det(I-\tilde{X}\_{11})|^{\beta+1-p}\pi^{p-1}\frac{\Gamma(\alpha-p+1)\Gamma(\beta-p+1)}{\Gamma(\alpha+\beta-p+1)}$$

for $\Re(\alpha) > p - 1,\ \Re(\beta) > p - 1$. On extracting $x\_{p-1,p-1}$ from $|\tilde{X}\_{11}|$ and $|I - \tilde{X}\_{11}|$ and continuing this process, in the end, the exponent of $\pi$ will be $(p-1) + (p-2) + \cdots + 1 = \frac{p(p-1)}{2}$ and the gamma product will be

$$\frac{\Gamma(\alpha - (p - 1))\Gamma(\alpha - (p - 2)) \cdots \Gamma(\alpha)\Gamma(\beta - (p - 1)) \cdots \Gamma(\beta)}{\Gamma(\alpha + \beta - (p - 1)) \cdots \Gamma(\alpha + \beta)}.$$

These factors, along with $\pi^{\frac{p(p-1)}{2}}$, give

$$\frac{\tilde{\Gamma}\_p(\alpha)\tilde{\Gamma}\_p(\beta)}{\tilde{\Gamma}\_p(\alpha+\beta)} = \tilde{B}\_p(\alpha, \beta), \ \Re(\alpha) > p - 1, \ \Re(\beta) > p - 1.$$

The procedure for evaluating a type-2 matrix-variate beta integral by partitioning matrices is parallel and hence will not be detailed here.

**Example 5.3a.1.** For *p* = 2*,* evaluate the integral

$$\int\_{O < \tilde{X} < I} |\det(\tilde{X})|^{\alpha - p} |\det(I - \tilde{X})|^{\beta - p} \mathrm{d}\tilde{X}$$

as a multiple integral and show that it evaluates out to *B*˜2*(α, β)*, the beta function in the complex domain.


**Solution 5.3a.1.** For $p = 2$, $\pi^{\frac{p(p-1)}{2}} = \pi^{\frac{2(1)}{2}} = \pi$, and

$$\tilde{B}\_2(\alpha, \beta) = \pi \frac{\Gamma(\alpha)\Gamma(\alpha - 1)\Gamma(\beta)\Gamma(\beta - 1)}{\Gamma(\alpha + \beta)\Gamma(\alpha + \beta - 1)}$$

whenever $\Re(\alpha) > 1$ and $\Re(\beta) > 1$. For $p = 2$, our matrix and the relevant determinants are

$$
\tilde{X} = \begin{bmatrix} \tilde{x}\_{11} & \tilde{x}\_{12} \\ \tilde{x}\_{12}^{\*} & \tilde{x}\_{22} \end{bmatrix}, \ |\det(\tilde{X})| \text{ and } |\det(I - \tilde{X})|
$$

where $\tilde{x}\_{12}^{\*}$ is simply the conjugate of $\tilde{x}\_{12}$ as it is a scalar quantity. By expanding the determinants of the partitioned matrices as explained in Sect. 1.3, we have the following:

$$\det(\tilde{X}) = \tilde{x}\_{11} [\tilde{x}\_{22} - \tilde{x}\_{12}^\* \tilde{x}\_{11}^{-1} \tilde{x}\_{12}] = \tilde{x}\_{11} \left[ \tilde{x}\_{22} - \frac{\tilde{x}\_{12} \tilde{x}\_{12}^\*}{\tilde{x}\_{11}} \right] \tag{i}$$

$$\det(I - \tilde{X}) = (1 - \tilde{x}\_{11}) \left[ 1 - \tilde{x}\_{22} - \frac{\tilde{x}\_{12} \tilde{x}\_{12}^\*}{1 - \tilde{x}\_{11}} \right]. \tag{ii}$$

From *(i)* and *(ii)*, it is seen that

$$\frac{\tilde{\mathfrak{x}}\_{12}\tilde{\mathfrak{x}}\_{12}^\*}{\tilde{\mathfrak{x}}\_{11}} \le \tilde{\mathfrak{x}}\_{22} \le 1 - \frac{\tilde{\mathfrak{x}}\_{12}\tilde{\mathfrak{x}}\_{12}^\*}{1 - \tilde{\mathfrak{x}}\_{11}}.$$

Note that when $\tilde{X}$ is Hermitian, $\tilde{x}\_{11}$ and $\tilde{x}\_{22}$ are real and hence we need not place a tilde on these variables. Let $y = x\_{22} - \tilde{x}\_{12}\tilde{x}\_{12}^{\*}/x\_{11}$. Note that $y$ is also real since $\tilde{x}\_{12}\tilde{x}\_{12}^{\*}$ is real. As well, $0 \le y \le b$, where

$$b = 1 - \left[\frac{\tilde{\boldsymbol{x}}\_{12}\tilde{\boldsymbol{x}}\_{12}^{\*}}{\boldsymbol{x}\_{11}}\right] - \frac{\tilde{\boldsymbol{x}}\_{12}\tilde{\boldsymbol{x}}\_{12}^{\*}}{1 - \boldsymbol{x}\_{11}} = 1 - \frac{\tilde{\boldsymbol{x}}\_{12}\tilde{\boldsymbol{x}}\_{12}^{\*}}{\boldsymbol{x}\_{11}(1 - \boldsymbol{x}\_{11})}.$$

Further, $b$ is a real scalar of the form $b = 1 - \tilde{w}\tilde{w}^{\*}$ where $\tilde{w} = \frac{\tilde{x}\_{12}}{[x\_{11}(1 - x\_{11})]^{\frac{1}{2}}} \Rightarrow \mathrm{d}\tilde{x}\_{12} = x\_{11}(1 - x\_{11})\,\mathrm{d}\tilde{w}$. This will make the exponents of $x\_{11}$ and $(1 - x\_{11})$ equal to $\alpha - p + 1 = \alpha - 1$ and $\beta - 1$, respectively. Now, on integrating out $y$, we have

$$\int\_{\mathbf{y}=0}^{b} \mathbf{y}^{\alpha-2} (b-\mathbf{y})^{\beta-2} \mathbf{d}\mathbf{y} = b^{\alpha+\beta-3} \frac{\Gamma(\alpha-1)\Gamma(\beta-1)}{\Gamma(\alpha+\beta-2)}, \ \Re(\alpha) > 1, \ \Re(\beta) > 1. \tag{iii}$$

Integrating out *w*˜ , we have the following:

$$\int\_{\tilde{w}} (1 - \tilde{w}\tilde{w}^{\*})^{\alpha + \beta - 3} \mathrm{d}\tilde{w} = \pi \frac{\Gamma(\alpha + \beta - 2)}{\Gamma(\alpha + \beta - 1)}.\tag{iv}$$

This integral is evaluated by writing $z = \tilde{w}\tilde{w}^{\*}$. Then, it follows from Theorem 4.2a.3 that

$$\mathrm{d}\tilde{w} = \frac{\pi^{p-1}}{\Gamma(p-1)} z^{(p-1)-1} \mathrm{d}z = \pi\,\mathrm{d}z \ \text{ for } p = 2.$$

Now, collecting all relevant factors from *(i)* to *(iv)*, the required representation of the initial integral, denoted by *δ*, is obtained:

$$\delta = \pi \frac{\Gamma(\alpha)\Gamma(\alpha - 1)\Gamma(\beta)\Gamma(\beta - 1)}{\Gamma(\alpha + \beta)\Gamma(\alpha + \beta - 1)} = \frac{\tilde{\Gamma}\_2(\alpha)\tilde{\Gamma}\_2(\beta)}{\tilde{\Gamma}\_2(\alpha + \beta)} = \tilde{B}\_2(\alpha, \beta)$$

whenever $\Re(\alpha) > 1$ and $\Re(\beta) > 1$. This completes the computations.
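Using $\tilde{\Gamma}\_p(\alpha) = \pi^{\frac{p(p-1)}{2}}\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$, the collected factors above can be cross-checked numerically; a sketch assuming scipy is available:

```python
import math
from scipy.special import gammaln

def log_cgamma_p(a, p):
    # log of the complex matrix-variate gamma:
    # Gamma~_p(a) = pi^{p(p-1)/2} Gamma(a) Gamma(a-1) ... Gamma(a-p+1)
    return (p * (p - 1) / 2) * math.log(math.pi) \
        + sum(gammaln(a - j) for j in range(p))

alpha, beta = 3.5, 2.75  # arbitrary test values with alpha, beta > 1

# collected factors: pi G(a)G(a-1)G(b)G(b-1) / (G(a+b)G(a+b-1)), in logs
lhs = math.log(math.pi) + gammaln(alpha) + gammaln(alpha - 1) \
    + gammaln(beta) + gammaln(beta - 1) \
    - gammaln(alpha + beta) - gammaln(alpha + beta - 1)

# log of Gamma~_2(alpha) Gamma~_2(beta) / Gamma~_2(alpha + beta)
rhs = log_cgamma_p(alpha, 2) + log_cgamma_p(beta, 2) - log_cgamma_p(alpha + beta, 2)

print(lhs, rhs)  # the two values agree
```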

#### **5.3.3. General partitions, real case**

In Sect. 5.3.2, we have considered integrating one variable at a time by suitably partitioning the matrices. Would it also be possible to have a general partitioning and integrate a block of variables at a time, rather than integrating out individual variables? We will consider the real matrix-variate gamma integral first. Let the *p*×*p* positive definite matrix *X* be partitioned as follows:

$$X = \begin{bmatrix} X\_{11} & X\_{12} \\ X\_{21} & X\_{22} \end{bmatrix}, \ X\_{11} \ \text{being} \ p\_1 \times p\_1 \ \text{and} \ X\_{22}, \ p\_2 \times p\_2,$$

so that $X\_{12}$ is $p\_1 \times p\_2$ with $X\_{21} = X\_{12}'$ and $p\_1 + p\_2 = p$. Without any loss of generality, let us assume that $p\_1 \ge p\_2$. The determinant can be partitioned as follows:

$$\begin{split} |X|^{\alpha - \frac{p+1}{2}} &= |X\_{11}|^{\alpha - \frac{p+1}{2}} |X\_{22} - X\_{21} X\_{11}^{-1} X\_{12}|^{\alpha - \frac{p+1}{2}} \\ &= |X\_{11}|^{\alpha - \frac{p+1}{2}} |X\_{22}|^{\alpha - \frac{p+1}{2}} |I - X\_{22}^{-\frac{1}{2}} X\_{21} X\_{11}^{-1} X\_{12} X\_{22}^{-\frac{1}{2}}|^{\alpha - \frac{p+1}{2}}. \end{split}$$

Letting

$$Y = X\_{22}^{-\frac{1}{2}} X\_{21} X\_{11}^{-\frac{1}{2}} \Rightarrow \mathrm{d}Y = |X\_{22}|^{-\frac{p\_1}{2}} |X\_{11}|^{-\frac{p\_2}{2}} \mathrm{d}X\_{21}$$

for fixed *X*<sup>11</sup> and *X*<sup>22</sup> by making use of Theorem 1.6.4 of Chap. 1 or Theorem 1.18 of Mathai (1997),

$$|X|^{\alpha - \frac{p+1}{2}} \mathrm{d}X\_{21} = |X\_{11}|^{\alpha + \frac{p\_2}{2} - \frac{p+1}{2}} |X\_{22}|^{\alpha + \frac{p\_1}{2} - \frac{p+1}{2}} |I - YY'|^{\alpha - \frac{p+1}{2}} \mathrm{d}Y.$$

Letting $S = YY'$ and integrating out over the Stiefel manifold, we have

$$\mathrm{d}Y = \frac{\pi^{\frac{p\_1 p\_2}{2}}}{\Gamma\_{p\_2}(\frac{p\_1}{2})} |S|^{\frac{p\_1}{2} - \frac{p\_2 + 1}{2}} \mathrm{d}S;$$

refer to Theorem 2.16 and Remark 2.13 of Mathai (1997) or Theorem 4.2.3 of Chap. 4. Now, the integral over *S* gives

$$\int\_{O < S < I} |S|^{\frac{p\_1}{2} - \frac{p\_2+1}{2}} |I - S|^{\alpha - \frac{p+1}{2}} \mathrm{d}S = \frac{\Gamma\_{p\_2}(\frac{p\_1}{2})\,\Gamma\_{p\_2}(\alpha - \frac{p\_1}{2})}{\Gamma\_{p\_2}(\alpha)}$$

for $\Re(\alpha) > \frac{p-1}{2}$. Collecting all the factors, we have

$$|X\_{11}|^{\alpha - \frac{p\_1+1}{2}} |X\_{22}|^{\alpha - \frac{p\_2+1}{2}}\, \pi^{\frac{p\_1 p\_2}{2}}\, \frac{\Gamma\_{p\_2}(\alpha - \frac{p\_1}{2})}{\Gamma\_{p\_2}(\alpha)}.$$

One can observe from this result that the original determinant splits into functions of *X*<sup>11</sup> and *X*22. This also shows that if we are considering a real matrix-variate gamma density, then the diagonal blocks *X*<sup>11</sup> and *X*<sup>22</sup> are statistically independently distributed, where *X*<sup>11</sup> will have a *p*1-variate gamma distribution and *X*22, a *p*2-variate gamma distribution. Note that tr*(X)* = tr*(X*11*)* + tr*(X*22*)* and hence, the integral over *X*<sup>22</sup> gives *Γp*<sup>2</sup> *(α)* and the integral over *X*11, *Γp*<sup>1</sup> *(α)*. Thus, the total integral is available as

$$\Gamma\_{p\_1}(\alpha)\Gamma\_{p\_2}(\alpha)\pi^{\frac{p\_1p\_2}{2}}\frac{\Gamma\_{p\_2}(\alpha-\frac{p\_1}{2})}{\Gamma\_{p\_2}(\alpha)} = \Gamma\_p(\alpha).$$

since $\pi^{\frac{p\_1 p\_2}{2}}\,\Gamma\_{p\_1}(\alpha)\,\Gamma\_{p\_2}(\alpha - \frac{p\_1}{2}) = \Gamma\_p(\alpha)$.
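The resulting identity $\pi^{\frac{p\_1p\_2}{2}}\Gamma\_{p\_1}(\alpha)\Gamma\_{p\_2}(\alpha - \frac{p\_1}{2}) = \Gamma\_p(\alpha)$ can be verified in log form with `scipy.special.multigammaln`; a sketch with arbitrary test values:

```python
import math
from scipy.special import multigammaln

p1, p2, alpha = 3, 2, 4.5  # p = p1 + p2 = 5; needs alpha > (p-1)/2
p = p1 + p2

# log of pi^{p1 p2 / 2} Gamma_{p1}(alpha) Gamma_{p2}(alpha - p1/2)
lhs = (p1 * p2 / 2) * math.log(math.pi) \
    + multigammaln(alpha, p1) + multigammaln(alpha - p1 / 2, p2)

# log of Gamma_p(alpha)
rhs = multigammaln(alpha, p)

print(lhs, rhs)  # the two values agree
```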

Hence, it is seen that instead of integrating out variables one at a time, we could have also integrated out blocks of variables at a time and verified the result. A similar procedure works for real matrix-variate type-1 and type-2 beta distributions, as well as for the matrix-variate gamma and type-1 and type-2 beta distributions in the complex domain.

#### **5.3.4. Methods avoiding integration over the Stiefel manifold**

The general method of partitioning matrices previously described involves the integration over the Stiefel manifold as an intermediate step and relies on Theorem 4.2.3. We will consider another procedure whereby integration over the Stiefel manifold is not required. Let us consider the real gamma case first. Again, we begin with the decomposition

$$|X|^{\alpha - \frac{p+1}{2}} = |X\_{11}|^{\alpha - \frac{p+1}{2}} |X\_{22} - X\_{21}X\_{11}^{-1}X\_{12}|^{\alpha - \frac{p+1}{2}}.\tag{5.3.8}$$

Instead of integrating out $X\_{21}$ or $X\_{12}$, let us integrate out $X\_{22}$. Let $X\_{11}$ be a $p\_1 \times p\_1$ matrix and $X\_{22}$ be a $p\_2 \times p\_2$ matrix, with $p\_1 + p\_2 = p$. In the above partitioning, we require that $X\_{11}$ be nonsingular. However, when $X$ is positive definite, both $X\_{11}$ and $X\_{22}$ will be positive definite, and thereby nonsingular. From the second factor in (5.3.8), $X\_{22} > X\_{21}X\_{11}^{-1}X\_{12}$ as $X\_{22} - X\_{21}X\_{11}^{-1}X\_{12}$ is positive definite. We will attempt to integrate out $X\_{22}$ first. Let $U = X\_{22} - X\_{21}X\_{11}^{-1}X\_{12}$ so that $\mathrm{d}U = \mathrm{d}X\_{22}$ for fixed $X\_{11}$ and $X\_{12}$. Since $\operatorname{tr}(X) = \operatorname{tr}(X\_{11}) + \operatorname{tr}(X\_{22})$, we have

$$\mathbf{e}^{-\text{tr}(X\_{22})} = \mathbf{e}^{-\text{tr}(U) - \text{tr}(X\_{21}X\_{11}^{-1}X\_{12})}.$$

On integrating out *U,* we obtain

$$\int\_{U>0} |U|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(U)} \mathbf{d}U = \Gamma\_{p\_2}(\alpha - \frac{p\_1}{2}), \; \Re(\alpha) > \frac{p-1}{2}$$

since $\alpha - \frac{p+1}{2} = \big(\alpha - \frac{p_1}{2}\big) - \frac{p_2+1}{2}$. Letting

$$Y = X\_{21} X\_{11}^{-\frac{1}{2}} \Rightarrow \mathrm{d}Y = |X\_{11}|^{-\frac{p\_2}{2}} \mathrm{d}X\_{21}$$

for fixed *X*<sup>11</sup> (Theorem 1.6.1), we have

$$\int\_{X\_{21}} \mathbf{e}^{-\text{tr}(X\_{21}X\_{11}^{-1}X\_{12})} \mathbf{d}X\_{21} = |X\_{11}|^{\frac{p\_2}{2}} \int\_Y \mathbf{e}^{-\text{tr}(YY')} \mathbf{d}Y.$$

But $\text{tr}(YY')$ is the sum of the squares of the $p_1p_2$ elements of $Y$, and each integral is of the form $\int_{-\infty}^{\infty}\mathbf{e}^{-z^2}\,\mathrm{d}z = \sqrt{\pi}$. Hence,

$$\int\_{Y} \mathbf{e}^{-\text{tr}(YY')} \mathbf{d}Y = \pi^{\frac{p\_1 p\_2}{2}}.$$

We may now integrate out *X*11:

$$\begin{aligned} \int\_{X\_{11} > O} |X\_{11}|^{\alpha + \frac{p\_2}{2} - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(X\_{11})} \mathbf{d}X\_{11} \\ &= \int\_{X\_{11} > O} |X\_{11}|^{\alpha - \frac{p\_1+1}{2}} \mathbf{e}^{-\text{tr}(X\_{11})} \mathbf{d}X\_{11} \\ &= \varGamma\_{p\_1}(\alpha) . \end{aligned}$$

Thus, we have the following factors:

$$
\pi^{\frac{p\_1 p\_2}{2}} \Gamma\_{p\_2}(\alpha - p\_1/2) \Gamma\_{p\_1}(\alpha) = \Gamma\_p(\alpha),
$$

since

$$\frac{p\_1(p\_1 - 1)}{4} + \frac{p\_2(p\_2 - 1)}{4} + \frac{p\_1 p\_2}{2} = \frac{p(p - 1)}{4}, \ p = p\_1 + p\_2,$$

and

$$\begin{aligned} \Gamma_{p_1}(\alpha)\Gamma_{p_2}(\alpha - p_1/2) &= \Gamma(\alpha)\Gamma(\alpha - 1/2)\cdots\Gamma(\alpha - (p_1 - 1)/2)\,\Gamma_{p_2}(\alpha - p_1/2)\\ &= \Gamma(\alpha)\cdots\Gamma(\alpha - (p_1 + p_2 - 1)/2). \end{aligned}$$

Hence the result. This procedure avoids integration over the Stiefel manifold and does not require that $p_1 \ge p_2$. We could have integrated out $X_{11}$ first, if needed. In that case, we would have used the following expansion:

$$|X|^{\alpha - \frac{p+1}{2}} = |X_{22}|^{\alpha - \frac{p+1}{2}}\,|X_{11} - X_{12}X_{22}^{-1}X_{21}|^{\alpha - \frac{p+1}{2}}.$$

We would then have proceeded as before by integrating out $X_{11}$ first and would have ended up with

$$
\pi^{\frac{p\_1 p\_2}{2}} \Gamma\_{p\_1}(\alpha - p\_2/2) \Gamma\_{p\_2}(\alpha) = \Gamma\_p(\alpha), \ p = p\_1 + p\_2.
$$
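This factorization lends itself to a quick numerical sanity check. The following Python sketch (the helper names `gamma_p` and `partitioned` are our own, not the book's) evaluates $\Gamma_p(\alpha)$ directly and through the partitioned product for $p_1 = 2$, $p_2 = 3$, in both orderings of the blocks:

```python
import math

def gamma_p(p, a):
    # real matrix-variate gamma: Gamma_p(a) = pi^(p(p-1)/4) * prod_{j=1}^p Gamma(a - (j-1)/2)
    val = math.pi ** (p * (p - 1) / 4)
    for j in range(1, p + 1):
        val *= math.gamma(a - (j - 1) / 2)
    return val

def partitioned(p1, p2, a):
    # pi^(p1*p2/2) * Gamma_{p1}(a - p2/2) * Gamma_{p2}(a), as in the text
    return math.pi ** (p1 * p2 / 2) * gamma_p(p1, a - p2 / 2) * gamma_p(p2, a)

a = 4.3
rhs = gamma_p(5, a)           # Gamma_5(alpha) evaluated directly
lhs = partitioned(2, 3, a)    # integrating out the blocks in one order
lhs2 = partitioned(3, 2, a)   # the other ordering of the blocks
```

Both orderings reproduce $\Gamma_5(\alpha)$ to machine precision, in agreement with the derivation above.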

**Note 5.3.1:** If we are considering a real matrix-variate gamma density, such as the Wishart density, then from the above procedure, observe that after integrating out $X_{22}$, the only factor containing $X_{21}$ is the exponential function, which has the structure of a matrix-variate Gaussian density. Hence, for a given $X_{11}$, $X_{21}$ is matrix-variate Gaussian distributed. Similarly, for a given $X_{22}$, $X_{12}$ is matrix-variate Gaussian distributed. Further, the diagonal blocks $X_{11}$ and $X_{22}$ are independently distributed.

The same procedure also applies for the evaluation of the gamma integrals in the complex domain. Since the steps are parallel, they will not be detailed here.

#### **5.3.5. Arbitrary moments of the determinants, real gamma and beta matrices**

Let the *p* × *p* real positive definite matrix *X* have a real matrix-variate gamma density with the parameters *(α, B > O)*. Then for an arbitrary *h,* we can evaluate the *h*-th moment of the determinant of *X* with the help of the matrix-variate gamma integral, namely,

$$\int\_{X>O} |X|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(BX)} \mathbf{d}X = |B|^{-\alpha} \Gamma\_p(\alpha). \tag{i}$$

By making use of (i) and the associated normalizing constant, we can evaluate the $h$-th moment of the determinant in a real matrix-variate gamma density with the parameters $(\alpha, B > O)$. Let $u_1 = |X|$. Then, the moments of $u_1$ can be obtained by integrating over the density of $X$:

$$\begin{split} E[u_1]^h &= \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O} u_1^h\,|X|^{\alpha-\frac{p+1}{2}}\,\mathbf{e}^{-\text{tr}(BX)}\,\mathbf{d}X\\ &= \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O} |X|^{\alpha+h-\frac{p+1}{2}}\,\mathbf{e}^{-\text{tr}(BX)}\,\mathbf{d}X\\ &= \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\,\Gamma_p(\alpha+h)\,|B|^{-(\alpha+h)},\ \Re(\alpha+h) > \frac{p-1}{2}. \end{split}$$

Thus,

$$E[u_1]^h = |B|^{-h}\,\frac{\Gamma_p(\alpha + h)}{\Gamma_p(\alpha)},\ \Re(\alpha + h) > \frac{p-1}{2}.$$

This is evaluated by observing that when $E[u_1]^h$ is taken, $\alpha$ is replaced by $\alpha + h$ in the integrand, and hence the answer is obtained from equation (i). The same procedure enables one to evaluate the $h$-th moments of the determinants of type-1 beta and type-2 beta matrices. Let $Y$ be a $p\times p$ real positive definite matrix having a real matrix-variate type-1 beta density with the parameters $(\alpha, \beta)$, and let $u_2 = |Y|$. Then, the $h$-th moment of $u_2$ is obtained as follows:

$$\begin{split} E[u_2]^h &= \frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\int_{O<Y<I} |Y|^{\alpha+h-\frac{p+1}{2}}\,|I-Y|^{\beta-\frac{p+1}{2}}\,\mathbf{d}Y,\ \Re(\alpha+h) > \frac{p-1}{2},\\ &= \frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha+\beta+h)},\ \Re(\alpha+h) > \frac{p-1}{2}. \end{split}$$

In a similar manner, let $u_3 = |Z|$ where $Z$ has a $p\times p$ real matrix-variate type-2 beta density with the parameters $(\alpha, \beta)$. In this case, take $\alpha + \beta = (\alpha + h) + (\beta - h)$, replacing $\alpha$ by $\alpha + h$ and $\beta$ by $\beta - h$. Then, considering the normalizing constant of a real matrix-variate type-2 beta density, we obtain the $h$-th moment of $u_3$ as follows:

$$E[u\_3]^h = \frac{\Gamma\_p(\alpha + h)}{\Gamma\_p(\alpha)} \frac{\Gamma\_p(\beta - h)}{\Gamma\_p(\beta)}, \ \Re(\alpha + h) > \frac{p - 1}{2}, \ \Re(\beta - h) > \frac{p - 1}{2}.$$

Relatively few moments will exist in this case, as $\Re(\alpha + h) > \frac{p-1}{2}$ implies that $\Re(h) > -\Re(\alpha) + \frac{p-1}{2}$, and $\Re(\beta - h) > \frac{p-1}{2}$ means that $\Re(h) < \Re(\beta) - \frac{p-1}{2}$. Accordingly, only moments in the range $-\Re(\alpha) + \frac{p-1}{2} < \Re(h) < \Re(\beta) - \frac{p-1}{2}$ will exist. We can summarize the above results as follows: When $X$ is distributed as a real $p\times p$ matrix-variate gamma with the parameters $(\alpha, B > O)$,

$$E|X|^h = \frac{|B|^{-h}\,\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)},\ \Re(\alpha+h) > \frac{p-1}{2}.\tag{5.3.9}$$

When $Y$ has a $p\times p$ real matrix-variate type-1 beta density with the parameters $(\alpha, \beta)$ and if $u_2 = |Y|$, then

$$E[u_2]^h = \frac{\Gamma_p(\alpha + h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha + \beta)}{\Gamma_p(\alpha + \beta + h)},\ \Re(\alpha + h) > \frac{p - 1}{2}. \tag{5.3.10}$$

When the $p\times p$ real positive definite matrix $Z$ has a real matrix-variate type-2 beta density with the parameters $(\alpha, \beta)$, then letting $u_3 = |Z|$,

$$E[u_3]^h = \frac{\Gamma_p(\alpha + h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\beta - h)}{\Gamma_p(\beta)}, \quad -\Re(\alpha) + \frac{p - 1}{2} < \Re(h) < \Re(\beta) - \frac{p - 1}{2}.\tag{5.3.11}$$
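In the scalar case $p = 1$, the three formulas (5.3.9)–(5.3.11) reduce to ordinary gamma, type-1 beta and type-2 beta moments, which can be checked against direct numerical integration. The sketch below is illustrative (the helper `simpson` and all parameter values are our own choices, not from the text):

```python
import math

def simpson(f, a, b, n=20000):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

al, be, lam, h = 2.5, 3.0, 1.7, 1.2   # illustrative parameter choices
g = math.gamma

# (5.3.9) with p = 1: E[x^h] = lam^(-h) Gamma(al+h)/Gamma(al)
num = simpson(lambda x: x ** h * lam ** al / g(al) * x ** (al - 1) * math.exp(-lam * x), 0.0, 60.0)
m_gamma = lam ** (-h) * g(al + h) / g(al)

# (5.3.10) with p = 1: type-1 beta moment
c = g(al + be) / (g(al) * g(be))
num1 = simpson(lambda y: y ** h * c * y ** (al - 1) * (1 - y) ** (be - 1), 0.0, 1.0)
m_b1 = (g(al + h) / g(al)) * (g(al + be) / g(al + be + h))

# (5.3.11) with p = 1: type-2 beta moment, via the substitution z = t/(1-t)
num2 = simpson(lambda t: c * t ** (al + h - 1) * (1 - t) ** (be - h - 1), 0.0, 1.0)
m_b2 = (g(al + h) / g(al)) * (g(be - h) / g(be))
```

Note that $h = 1.2$ lies inside the admissible range for the type-2 beta moment, since $\beta - h = 1.8 > 0$.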

Let us examine (5.3.9):

$$\begin{aligned} E|X|^h &= |B|^{-h}\,\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\\ &= |B|^{-h}\,\frac{\Gamma(\alpha+h)}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha - \frac{1}{2} + h)}{\Gamma(\alpha - \frac{1}{2})}\cdots\frac{\Gamma(\alpha - \frac{p-1}{2} + h)}{\Gamma(\alpha - \frac{p-1}{2})}\\ &= E[x_1^h]\,E[x_2^h]\cdots E[x_p^h] \end{aligned}$$

where $x_j$ is a real scalar gamma random variable with shape parameter $\alpha - \frac{j-1}{2}$ and scale parameter $\lambda_j$, where $\lambda_j > 0,\ j = 1,\ldots,p$, are the eigenvalues of $B > O$; this follows by observing that the determinant is the product, and the trace the sum, of the eigenvalues $\lambda_1,\ldots,\lambda_p$. Further, $x_1,\ldots,x_p$ are independently distributed. Hence, structurally, we have the following representation:

$$|X| = \prod\_{j=1}^{p} x\_j \tag{5.3.12}$$

where $x_j$ has the density

$$f_{x_j}(x_j) = \frac{\lambda_j^{\alpha - \frac{j-1}{2}}}{\Gamma(\alpha - \frac{j-1}{2})}\,x_j^{\alpha - \frac{j-1}{2} - 1}\,\mathbf{e}^{-\lambda_j x_j}, \ \ 0 \le x_j < \infty,$$

for $\Re(\alpha) > \frac{j-1}{2}$, $\lambda_j > 0$, and zero otherwise. Similarly, when the $p\times p$ real positive definite matrix $Y$ has a real matrix-variate type-1 beta density with the parameters $(\alpha, \beta)$, the determinant, $|Y|$, has the structural representation

$$|Y| = \prod_{j=1}^{p} y_j \tag{5.3.13}$$

where $y_j$ is a real scalar type-1 beta random variable with the parameters $(\alpha - \frac{j-1}{2},\ \beta)$ for $j = 1,\ldots,p$. When the $p\times p$ real positive definite matrix $Z$ has a real matrix-variate type-2 beta density, then $|Z|$, the determinant of $Z$, has the following structural representation:

$$|Z| = \prod\_{j=1}^{p} z\_j \tag{5.3.14}$$

where $z_j$ has a real scalar type-2 beta density with the parameters $(\alpha - \frac{j-1}{2},\ \beta - \frac{j-1}{2})$ for $j = 1,\ldots,p$.
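The structural representations can be checked by simulation. Assuming $B = I$ (so that all $\lambda_j = 1$), the sketch below draws the independent scalar gamma factors of (5.3.12) for a $2\times 2$ case and compares the sample mean of their product with the exact first moment from (5.3.9); the parameter values are illustrative:

```python
import math, random

random.seed(7)
p, alpha, n = 2, 2.5, 200_000   # illustrative: 2x2 real matrix-variate gamma, B = I

# structural representation (5.3.12): |X| = x_1 x_2 with independent
# x_j ~ gamma(shape alpha - (j-1)/2, scale 1)
mean_det = 0.0
for _ in range(n):
    prod = 1.0
    for j in range(1, p + 1):
        prod *= random.gammavariate(alpha - (j - 1) / 2, 1.0)
    mean_det += prod
mean_det /= n

# exact first moment: h = 1 in (5.3.9) with B = I, i.e. Gamma_p(alpha+1)/Gamma_p(alpha)
exact = 1.0
for j in range(1, p + 1):
    exact *= math.gamma(alpha - (j - 1) / 2 + 1) / math.gamma(alpha - (j - 1) / 2)
```

For $\alpha = 5/2$ and $p = 2$, the exact value is $\alpha(\alpha - \tfrac12) = 5$, and the Monte Carlo mean should agree within sampling error.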

**Example 5.3.2.** Consider a real $2\times 2$ matrix $X$ having a real matrix-variate distribution. Derive the density of the determinant $|X|$ if $X$ has (a) a gamma distribution with the parameters $(\alpha,\ B = I)$; (b) a real type-1 beta distribution with the parameters $(\alpha = \frac{3}{2},\ \beta = \frac{3}{2})$; (c) a real type-2 beta distribution with the parameters $(\alpha = \frac{3}{2},\ \beta = \frac{3}{2})$.

**Solution 5.3.2.** We will derive the density in these three cases by using three different methods to illustrate the possibility of making use of various approaches for solving such problems. (a) Let *u*<sup>1</sup> = |*X*| in the gamma case. Then for an arbitrary *h*,

$$E[u\_1^h] = \frac{\Gamma\_p(\alpha + h)}{\Gamma\_p(\alpha)} = \frac{\Gamma(\alpha + h)\Gamma(\alpha + h - \frac{1}{2})}{\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2})}, \; \Re(\alpha) > \frac{1}{2}.$$

Since the arguments of the gamma functions differ by $\frac{1}{2}$, they can be combined by utilizing the following identity:

$$\Gamma(mz) = (2\pi)^{\frac{1-m}{2}} m^{mz-\frac{1}{2}} \Gamma(z) \Gamma(z+\frac{1}{m}) \cdots \Gamma(z+\frac{m-1}{m}), \ m = 1,2,\ldots, \quad (5.3.15)$$

which is the multiplication formula for gamma functions. For *m* = 2*,* we have the duplication formula:

$$
\Gamma(2z) = (2\pi)^{-\frac{1}{2}} 2^{2z - \frac{1}{2}} \Gamma(z) \Gamma(z + 1/2).
$$

Thus,

$$
\Gamma(z)\Gamma(z+1/2) = \pi^{\frac{1}{2}}2^{1-2z}\Gamma(2z).
$$
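Both the multiplication formula (5.3.15) and its $m = 2$ special case are easy to verify numerically; the helper `gauss_mult` below is our own illustrative implementation of the right-hand side of (5.3.15):

```python
import math

def gauss_mult(m, z):
    # right-hand side of (5.3.15): (2*pi)^((1-m)/2) * m^(m*z - 1/2) * prod_k Gamma(z + k/m)
    val = (2 * math.pi) ** ((1 - m) / 2) * m ** (m * z - 0.5)
    for k in range(m):
        val *= math.gamma(z + k / m)
    return val

z = 1.37
dup = math.gamma(z) * math.gamma(z + 0.5)                     # Gamma(z)Gamma(z + 1/2)
dup_closed = math.pi ** 0.5 * 2 ** (1 - 2 * z) * math.gamma(2 * z)
trip = gauss_mult(3, z)                                        # should equal Gamma(3z)
```

The assertions that `dup == dup_closed`, `gauss_mult(2, z) == Gamma(2z)` and `trip == Gamma(3z)` all hold to machine precision.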

#### Matrix-Variate Gamma and Beta Distributions

Now, by taking $z = \alpha - \frac{1}{2} + h$ in the numerator and $z = \alpha - \frac{1}{2}$ in the denominator, we can write

$$E[u\_1^h] = \frac{\Gamma(\alpha + h)\Gamma(\alpha + h - \frac{1}{2})}{\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2})} = \frac{\Gamma(2\alpha - 1 + 2h)}{\Gamma(2\alpha - 1)}2^{-2h}.$$

Accordingly,

$$E[(4u_1)^h] = \frac{\Gamma(2\alpha - 1 + 2h)}{\Gamma(2\alpha - 1)} \Rightarrow E[(2u_1^{\frac{1}{2}})^{2h}] = \frac{\Gamma(2\alpha - 1 + 2h)}{\Gamma(2\alpha - 1)}.$$

This shows that $v = 2u_1^{\frac{1}{2}}$ has a real scalar gamma distribution with the parameters $(2\alpha - 1,\ 1)$, whose density is

$$\begin{split} f(v) \, \mathrm{d}v &= \frac{v^{(2\alpha -1) - 1}}{\Gamma(2\alpha - 1)} \mathrm{e}^{-v} \mathrm{d}v = \frac{(2u\_1^{\frac{1}{2}})^{2\alpha - 2}}{\Gamma(2\alpha - 1)} \mathrm{e}^{-2u\_1^{\frac{1}{2}}} \mathrm{d}(2u\_1^{\frac{1}{2}}) \\ &= \frac{2^{2\alpha - 2} u\_1^{\alpha - \frac{1}{2} - 1}}{\Gamma(2\alpha - 1)} \mathrm{e}^{-2u\_1^{\frac{1}{2}}} \mathrm{d}u\_1. \end{split}$$

Hence the density of *u*1, denoted by *f*1*(u*1*)*, is the following:

$$f_1(u_1) = \frac{2^{2\alpha - 2}\,u_1^{\alpha - \frac{1}{2} - 1}}{\Gamma(2\alpha - 1)}\,\mathbf{e}^{-2u_1^{\frac{1}{2}}},\ 0 \le u_1 < \infty$$

and zero elsewhere. It can easily be verified that $f_1(u_1)$ is a density. (b) Let $u_2 = |X|$. Then for an arbitrary $h$, with $\alpha = \frac{3}{2}$ and $\beta = \frac{3}{2}$,

$$\begin{split} E[u_2^h] &= \frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha+\beta+h)}\\ &= \frac{\Gamma(3)\Gamma(\frac{5}{2})}{\Gamma(\frac{3}{2})\Gamma(1)}\,\frac{\Gamma(\frac{3}{2}+h)\Gamma(1+h)}{\Gamma(3+h)\Gamma(\frac{5}{2}+h)}\\ &= 3\left\{\frac{1}{(2+h)(1+h)(\frac{3}{2}+h)}\right\} = 3\left\{\frac{2}{2+h} + \frac{2}{1+h} - \frac{4}{\frac{3}{2}+h}\right\}, \end{split}$$

the last expression resulting from an application of the partial fraction technique. This is the $h$-th moment of the distribution of $u_2$, whose density is

$$f_2(u_2) = 6\{1 + u_2 - 2u_2^{\frac{1}{2}}\},\ 0 \le u_2 \le 1,$$

and zero elsewhere; this density is readily seen to be bona fide. (c) Let the density of $u_3 = |X|$ be denoted by $f_3(u_3)$. The Mellin transform of $f_3(u_3)$, with Mellin parameter $s$, is

$$\begin{split} E[u\_3^{s-1}] &= \frac{\Gamma\_p(\alpha+s-1)}{\Gamma\_p(\alpha)} \frac{\Gamma\_p(\beta-s+1)}{\Gamma\_p(\beta)} = \frac{\Gamma\_2(\frac{3}{2}+s-1)}{\Gamma\_2(\frac{3}{2})} \frac{\Gamma\_2(\frac{3}{2}-s+1)}{\Gamma\_2(\frac{3}{2})} \\ &= \frac{1}{[\Gamma(3/2)\Gamma(1)]^2} \Gamma(1/2+s)\Gamma(s)\Gamma(5/2-s)\Gamma(2-s), \end{split}$$

the corresponding density being available by taking the inverse Mellin transform, namely,

$$f_3(u_3) = \frac{4}{\pi}\,\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \Gamma(s)\Gamma(s + 1/2)\Gamma(5/2 - s)\Gamma(2 - s)\,u_3^{-s}\,\mathrm{d}s \tag{i}$$

where $i = \sqrt{-1}$ and $c$ in the integration contour is such that $0 < c < 2$. The integral in (i) is available as the sum of the residues at the poles of $\Gamma(s)\Gamma(s + \frac{1}{2})$ for $0 \le u_3 \le 1$, and as the sum of the residues at the poles of $\Gamma(2 - s)\Gamma(\frac{5}{2} - s)$ for $1 < u_3 < \infty$. We could also combine $\Gamma(s)$ and $\Gamma(s + \frac{1}{2})$, as well as $\Gamma(2 - s)$ and $\Gamma(\frac{5}{2} - s)$, by making use of the duplication formula for gamma functions; we would then be able to identify the functions in each of the sectors $0 \le u_3 \le 1$ and $1 < u_3 < \infty$, which would be functions of $u_3^{\frac{1}{2}}$, as in case (a). In order to illustrate the method relying on the inverse Mellin transform, we will evaluate the density $f_3(u_3)$ as a sum of residues. The poles of $\Gamma(s)\Gamma(s + \frac{1}{2})$ are simple, and hence two sums of residues are obtained for $0 \le u_3 \le 1$. The poles of $\Gamma(s)$ occur at $s = -\nu,\ \nu = 0, 1, \ldots,$ and those of $\Gamma(s + \frac{1}{2})$ occur at $s = -\frac{1}{2} - \nu,\ \nu = 0, 1, \ldots$. The residues and their sums will be evaluated with the help of the following two lemmas.

**Lemma 5.3.1.** *Consider a function $\Gamma(\gamma + s)\phi(s)u^{-s}$ whose poles are simple. The residue at the pole $s = -\gamma - \nu$, $\nu = 0, 1, \ldots,$ denoted by $R_\nu$, is given by*

$$R_{\nu} = \frac{(-1)^{\nu}}{\nu!}\,\phi(-\gamma - \nu)\,u^{\gamma + \nu}.$$

**Lemma 5.3.2.** *When $\Gamma(\delta)$ and $\Gamma(\delta - \nu)$ are defined,*

$$\Gamma(\delta - \nu) = \frac{(-1)^{\nu}\,\Gamma(\delta)}{(-\delta + 1)_{\nu}}$$

*where, for example, $(a)_\nu = a(a + 1)\cdots(a + \nu - 1)$, $(a)_0 = 1$, $a \ne 0$, is the Pochhammer symbol.*
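Lemma 5.3.2 can be checked numerically at a non-integer $\delta$, where every $\Gamma(\delta - \nu)$ is defined; the helper `poch` below is our own implementation of the Pochhammer symbol:

```python
import math

def poch(a, n):
    # Pochhammer symbol (a)_n = a(a+1)...(a+n-1), with (a)_0 = 1
    val = 1.0
    for k in range(n):
        val *= a + k
    return val

delta = 0.3   # non-integer, so Gamma(delta - nu) is defined for every nu
pairs = [(math.gamma(delta - nu), (-1) ** nu * math.gamma(delta) / poch(-delta + 1, nu))
         for nu in range(6)]
```

Each left/right pair agrees to floating-point accuracy, confirming the sign pattern $(-1)^\nu$ and the Pochhammer denominator.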


Observe that $\Gamma(\alpha)$ is defined for all $\alpha \ne 0, -1, -2, \ldots,$ and that its integral representation requires $\Re(\alpha) > 0$. As well, $\Gamma(\alpha + k) = \Gamma(\alpha)(\alpha)_k$, $k = 1, 2, \ldots$. With the help of Lemmas 5.3.1 and 5.3.2, the sum of the residues at the poles of $\Gamma(s)$ in the integral in (i), excluding the constant $\frac{4}{\pi}$, is the following:

$$\begin{split} &\sum_{\nu=0}^{\infty} \frac{(-1)^{\nu}}{\nu!}\,\Gamma\Big(\frac{1}{2} - \nu\Big)\Gamma\Big(\frac{5}{2} + \nu\Big)\Gamma(2 + \nu)\,u_3^{\nu}\\ &= \sum_{\nu=0}^{\infty} \frac{(-1)^{\nu}}{\nu!}\,\Gamma\Big(\frac{1}{2}\Big)\Gamma\Big(\frac{5}{2}\Big)\Gamma(2)\,\frac{(-1)^{\nu}}{(\frac{1}{2})_{\nu}}\Big(\frac{5}{2}\Big)_{\nu}(2)_{\nu}\,u_3^{\nu}\\ &= \frac{3}{4}\pi\ {}_2F_1\Big(\frac{5}{2}, 2; \frac{1}{2}; u_3\Big),\ 0 \le u_3 \le 1, \end{split}$$

where ${}_2F_1(\cdot)$ is Gauss' hypergeometric function. The same procedure, consisting of taking the sum of the residues at the poles $s = -\frac{1}{2} - \nu,\ \nu = 0, 1, \ldots,$ gives

$$-3\pi\,u_3^{\frac{1}{2}}\ {}_2F_1\Big(3, \frac{5}{2}; \frac{3}{2}; u_3\Big),\ 0 \le u_3 \le 1.$$

The inverse Mellin transform for the sector $1 < u_3 < \infty$ is available as the sum of the residues at the poles of $\Gamma(\frac{5}{2} - s)$ and $\Gamma(2 - s)$, which occur at $s = \frac{5}{2} + \nu$ and $s = 2 + \nu$ for $\nu = 0, 1, \ldots$. The sum of the residues at the poles of $\Gamma(\frac{5}{2} - s)$ is the following:

$$\begin{aligned} &\sum_{\nu=0}^{\infty} \frac{(-1)^{\nu}}{\nu!}\,\Gamma\Big(\frac{5}{2} + \nu\Big)\Gamma(3 + \nu)\Gamma\Big(-\frac{1}{2} - \nu\Big)\,u_3^{-\frac{5}{2} - \nu}\\ &= -3\pi\,u_3^{-\frac{5}{2}}\ {}_2F_1\Big(\frac{5}{2}, 3; \frac{3}{2}; \frac{1}{u_3}\Big),\ 1 < u_3 < \infty, \end{aligned}$$

and the sum of the residues at the poles of *Γ (*2 − *s)* is given by

$$\frac{3}{4}\pi\,u_3^{-2}\ {}_2F_1\Big(2, \frac{5}{2}; \frac{1}{2}; \frac{1}{u_3}\Big),\ 1 < u_3 < \infty.$$

Now, on combining all the hypergeometric series and multiplying the result by the constant $\frac{4}{\pi}$, the final representation of the required density is obtained as

$$f_3(u_3) = \begin{cases} 3\,{}_2F_1(\frac{5}{2}, 2; \frac{1}{2}; u_3) - 12u_3^{\frac{1}{2}}\,{}_2F_1(3, \frac{5}{2}; \frac{3}{2}; u_3), & 0 \le u_3 \le 1, \\ 3u_3^{-2}\,{}_2F_1(2, \frac{5}{2}; \frac{1}{2}; \frac{1}{u_3}) - 12u_3^{-\frac{5}{2}}\,{}_2F_1(\frac{5}{2}, 3; \frac{3}{2}; \frac{1}{u_3}), & 1 < u_3 < \infty. \end{cases}$$

This completes the computations.
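Parts (a) and (b) of the example lend themselves to numerical verification. The sketch below checks that $f_1$ (taking $\alpha = \frac{5}{2}$ for concreteness) and $f_2$ each integrate to one, and that $f_2$ reproduces the partial-fraction moment expression obtained above; `simpson` is a generic quadrature helper of our own:

```python
import math

def simpson(f, a, b, n=20000):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# part (a): f1(u) = 2^(2a-2) u^(a-3/2) e^(-2 sqrt(u)) / Gamma(2a-1); take a = 5/2
al = 2.5
f1 = lambda u: 2 ** (2 * al - 2) * u ** (al - 1.5) * math.exp(-2 * math.sqrt(u)) / math.gamma(2 * al - 1)
# substitute u = t^2 so that the integrand is smooth at the origin
total1 = simpson(lambda t: f1(t * t) * 2 * t, 0.0, 20.0)
mean1 = simpson(lambda t: t * t * f1(t * t) * 2 * t, 0.0, 30.0)
exact_mean1 = al * (al - 0.5)      # Gamma_2(a+1)/Gamma_2(a) from (5.3.9) with B = I

# part (b): f2(u) = 6(1 + u - 2 sqrt(u)) on [0, 1]
f2 = lambda u: 6 * (1 + u - 2 * math.sqrt(u))
total2 = simpson(f2, 0.0, 1.0)
h = 2.0
moment2 = simpson(lambda u: u ** h * f2(u), 0.0, 1.0)
exact2 = 3 / ((2 + h) * (1 + h) * (1.5 + h))
```

Both densities integrate to one, and the second moment of $u_2$ agrees with $3/[(2+h)(1+h)(\frac{3}{2}+h)]$ at $h = 2$.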

#### **5.3a.2. Arbitrary moments of the determinants in the complex case**

In the complex matrix-variate case, one can consider the absolute value of the determinant, which is real; however, the parameters will differ from those in the real case. For example, consider the complex matrix-variate gamma density. If $\tilde{X}$ has a $p\times p$ complex matrix-variate gamma density with the parameters $(\alpha, \tilde{B} > O)$, then the $h$-th moment of the absolute value of the determinant of $\tilde{X}$ is the following:

$$\begin{aligned} E[|\det(\tilde{X})|^h] &= \frac{|\det(\tilde{B})|^{-h}\,\tilde{\Gamma}_p(\alpha+h)}{\tilde{\Gamma}_p(\alpha)}\\ &= (\lambda_1\cdots\lambda_p)^{-h}\prod_{j=1}^p \frac{\Gamma(\alpha - (j-1) + h)}{\Gamma(\alpha - (j-1))} = \prod_{j=1}^p E[\tilde{x}_j]^h, \end{aligned}$$

that is, $|\det(\tilde{X})|$ has the structural representation

$$|\det(\tilde{X})| = \tilde{x}_1\tilde{x}_2\cdots\tilde{x}_p,\tag{5.3a.8}$$

where $\tilde{x}_j$ is a *real scalar gamma random variable* with the parameters $(\alpha - (j-1),\ \lambda_j)$, $j = 1,\ldots,p$, and the $\tilde{x}_j$'s are independently distributed. Similarly, when $\tilde{Y}$ is a $p\times p$ complex Hermitian positive definite matrix having a complex matrix-variate type-1 beta density with the parameters $(\alpha, \beta)$, the absolute value of the determinant of $\tilde{Y}$, $|\det(\tilde{Y})|$, has the structural representation

$$|\det(\tilde{Y})| = \prod_{j=1}^{p} \tilde{y}_j \tag{5.3a.9}$$

where the $\tilde{y}_j$'s are independently distributed, $\tilde{y}_j$ being *a real scalar type-1 beta random variable* with the parameters $(\alpha - (j-1),\ \beta)$, $j = 1,\ldots,p$. When $\tilde{Z}$ is a $p\times p$ Hermitian positive definite matrix having a complex matrix-variate type-2 beta density with the parameters $(\alpha, \beta)$, then for arbitrary $h$, the $h$-th moment of the absolute value of the determinant is given by

$$\begin{split} E[|\det(\tilde{Z})|^h] &= \frac{\tilde{\Gamma}_p(\alpha+h)}{\tilde{\Gamma}_p(\alpha)}\,\frac{\tilde{\Gamma}_p(\beta-h)}{\tilde{\Gamma}_p(\beta)}\\ &= \left\{\prod_{j=1}^p \frac{\Gamma(\alpha - (j-1) + h)}{\Gamma(\alpha - (j-1))}\right\}\left\{\prod_{j=1}^p \frac{\Gamma(\beta - (j-1) - h)}{\Gamma(\beta - (j-1))}\right\}\\ &= \prod_{j=1}^p E[\tilde{z}_j]^h, \end{split}$$

so that the absolute value of the determinant of *Z*˜ has the following structural representation:

$$|\det(\tilde{Z})| = \prod\_{j=1}^{p} \tilde{z}\_j \tag{5.3a.10}$$

where the $\tilde{z}_j$'s are independently distributed *real scalar type-2 beta random variables* with the parameters $(\alpha - (j-1),\ \beta - (j-1))$ for $j = 1,\ldots,p$. Thus, in the real case the determinant and, in the complex case, the absolute value of the determinant have structural representations in terms of products of independently distributed real scalar random variables. The following is a summary of what has been discussed so far:


$$\begin{aligned} |X| &= \prod_{j=1}^p x_j,\ \ x_j\ \text{gamma}\Big(\alpha-\frac{j-1}{2},\,\lambda_j\Big); &\quad |\det(\tilde{X})| &= \prod_{j=1}^p \tilde{x}_j,\ \ \tilde{x}_j\ \text{gamma}(\alpha-(j-1),\,\lambda_j);\\ |Y| &= \prod_{j=1}^p y_j,\ \ y_j\ \text{type-1 beta}\Big(\alpha-\frac{j-1}{2},\,\beta\Big); &\quad |\det(\tilde{Y})| &= \prod_{j=1}^p \tilde{y}_j,\ \ \tilde{y}_j\ \text{type-1 beta}(\alpha-(j-1),\,\beta);\\ |Z| &= \prod_{j=1}^p z_j,\ \ z_j\ \text{type-2 beta}\Big(\alpha-\frac{j-1}{2},\,\beta-\frac{j-1}{2}\Big); &\quad |\det(\tilde{Z})| &= \prod_{j=1}^p \tilde{z}_j,\ \ \tilde{z}_j\ \text{type-2 beta}(\alpha-(j-1),\,\beta-(j-1)); \end{aligned}$$

for $j = 1,\ldots,p$. When we consider the determinant in the real case, the parameters differ by $\frac{1}{2}$, whereas the parameters differ by 1 in the complex domain. Whether in the real or the complex case, the individual variables appearing in the structural representations are real scalar random variables that are independently distributed.
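As a small numerical illustration of the complex-case spacing by 1, the following compares the moment ratio $\tilde{\Gamma}_p(\alpha+h)/\tilde{\Gamma}_p(\alpha)$ with the product of scalar gamma factors; the helper `cgamma_p` and the parameter values are our own illustrative choices:

```python
import math

def cgamma_p(p, a):
    # complex matrix-variate gamma: Gamma~_p(a) = pi^(p(p-1)/2) * prod_{j=1}^p Gamma(a - (j-1))
    val = math.pi ** (p * (p - 1) / 2)
    for j in range(1, p + 1):
        val *= math.gamma(a - (j - 1))
    return val

p, a = 3, 4.2
h = 0.9
ratio = cgamma_p(p, a + h) / cgamma_p(p, a)     # h-th moment factor when B~ = I
prod = 1.0
for j in range(1, p + 1):
    prod *= math.gamma(a - (j - 1) + h) / math.gamma(a - (j - 1))

# for h = 1 the ratio collapses to a(a-1)...(a-p+1)
first = cgamma_p(p, a + 1) / cgamma_p(p, a)
expected_first = a * (a - 1) * (a - 2)
```

The $\pi$ factors cancel in the ratio, leaving exactly the $p$ scalar gamma factors whose arguments step down by 1, in contrast with the half-integer steps of the real case.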

**Example 5.3a.2.** Even when $p = 2$, some of the poles will be of order 2 since the gamma arguments differ by integers in the complex case; when poles of order 2 or more are present, the series representation will contain logarithms as well as psi and zeta functions. A simple illustrative example is now considered. Let $\tilde{X}$ be a $2\times 2$ matrix having a complex matrix-variate type-1 beta distribution with the parameters $(\alpha = 2,\ \beta = 2)$. Evaluate the density of $\tilde{u} = |\det(\tilde{X})|$.

**Solution 5.3a.2.** Let us take the $(s-1)$-th moment of $\tilde{u}$, which corresponds to the Mellin transform of the density of $\tilde{u}$, with Mellin parameter $s$:

$$\begin{split} E[\tilde{u}^{s-1}] &= \frac{\tilde{\Gamma}_p(\alpha+s-1)}{\tilde{\Gamma}_p(\alpha)}\,\frac{\tilde{\Gamma}_p(\alpha+\beta)}{\tilde{\Gamma}_p(\alpha+\beta+s-1)}\\ &= \frac{\Gamma(\alpha+\beta)\Gamma(\alpha+\beta-1)}{\Gamma(\alpha)\Gamma(\alpha-1)}\,\frac{\Gamma(\alpha+s-1)\Gamma(\alpha+s-2)}{\Gamma(\alpha+\beta+s-1)\Gamma(\alpha+\beta+s-2)}\\ &= \frac{\Gamma(4)\Gamma(3)}{\Gamma(2)\Gamma(1)}\,\frac{\Gamma(1+s)\Gamma(s)}{\Gamma(3+s)\Gamma(2+s)} = \frac{12}{(2+s)(1+s)^2 s}. \end{split}$$

The inverse Mellin transform then yields the density of $\tilde{u}$, denoted by $\tilde{g}(\tilde{u})$, which is

$$\tilde{g}(\tilde{u}) = 12\,\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{1}{(2+s)(1+s)^2 s}\,\tilde{u}^{-s}\,\mathrm{d}s \tag{i}$$

where the *c* in the contour is any real number *c >* 0. There is a pole of order 1 at *s* = 0 and another pole of order 1 at *s* = −2*,* the residues at these poles being obtained as follows:

$$\lim\_{s \to 0} \frac{u^{-s}}{(2+s)(1+s)^2} = \frac{1}{2}, \quad \lim\_{s \to -2} \frac{u^{-s}}{(1+s)^2 s} = -\frac{u^2}{2}.$$

The pole at *s* = −1 is of order 2 and hence the residue is given by

$$\lim\_{s \to -1} \left\{ \frac{d}{ds} \frac{u^{-s}}{s(2+s)} \right\} = \lim\_{s \to -1} \left\{ \frac{(-\ln u)u^{-s}}{s(2+s)} - \frac{u^{-s}}{s^2(2+s)} - \frac{u^{-s}}{s(2+s)^2} \right\}$$

$$= u \ln u - u + u = u \ln u.$$

Hence the density is the following:

$$\tilde{g}(\tilde{u}) = 6 - 6\tilde{u}^2 + 12\,\tilde{u}\ln\tilde{u},\ 0 \le \tilde{u} \le 1,$$

and zero elsewhere, where $\tilde{u}$ is real. It can readily be shown that $\tilde{g}(\tilde{u}) \ge 0$ and that $\int_0^1 \tilde{g}(\tilde{u})\,\mathrm{d}\tilde{u} = 1$. This completes the computations.
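These closing claims can be confirmed by direct numerical integration: the density integrates to one, and its Mellin transform recovers $12/[(2+s)(1+s)^2 s]$. The quadrature helper and the test point $s = 1.8$ below are illustrative choices:

```python
import math

def simpson(f, a, b, n=20000):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

g = lambda u: 6 - 6 * u ** 2 + 12 * u * math.log(u)
eps = 1e-12                     # avoid log(0); the integrand tends to 6 at the origin
total = simpson(g, eps, 1.0)

# Mellin transform of g at s should recover 12 / ((2+s)(1+s)^2 s)
s = 1.8
mellin = simpson(lambda u: g(u) * u ** (s - 1), eps, 1.0)
exact = 12 / ((2 + s) * (1 + s) ** 2 * s)
```

Both checks hold to well within the quadrature error of the scheme.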

#### **Exercises 5.3**

**5.3.1.** Evaluate the real $p\times p$ matrix-variate type-2 beta integral from first principles, or by direct evaluation through a partitioning of the matrix as in Sect. 5.3.3 (general partitioning).

**5.3.2.** Repeat Exercise 5.3.1 for the complex case.

**5.3.3.** In the $2\times 2$ partitioning of a $p\times p$ real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I$, where the first diagonal block $X_{11}$ is $r\times r$, $r < p$, compute the density of the rectangular block $X_{12}$.

**5.3.4.** Repeat Exercise 5.3.3 for the complex case.

**5.3.5.** Let the $p\times p$ real matrices $X_1$ and $X_2$ have real matrix-variate gamma densities with the parameters $(\alpha_1, B > O)$ and $(\alpha_2, B > O)$, respectively, $B$ being the same for both distributions. Compute the density of (1): $U_1 = X_2^{-\frac{1}{2}}X_1X_2^{-\frac{1}{2}}$, (2): $U_2 = X_1^{\frac{1}{2}}X_2^{-1}X_1^{\frac{1}{2}}$, (3): $U_3 = (X_1 + X_2)^{-\frac{1}{2}}X_2(X_1 + X_2)^{-\frac{1}{2}}$, when $X_1$ and $X_2$ are independently distributed.

**5.3.6.** Repeat Exercise 5.3.5 for the complex case.

**5.3.7.** In the transformation $Y = I - X$ that was used in Sect. 5.3.1, the Jacobian is $\mathbf{d}Y = (-1)^{\frac{p(p+1)}{2}}\,\mathbf{d}X$. What happened to the factor $(-1)^{\frac{p(p+1)}{2}}$?

**5.3.8.** Consider $X$ in the (a) $2\times 2$, (b) $3\times 3$ real matrix-variate case. If $X$ is real matrix-variate gamma distributed, then derive the densities of the determinant of $X$ in (a) and (b) if the parameters are $\alpha = \frac{5}{2},\ B = I$. Consider $\tilde{X}$ in the (a) $2\times 2$, (b) $3\times 3$ complex matrix-variate case. Derive the distributions of $|\det(\tilde{X})|$ in (a) and (b) if $\tilde{X}$ is complex matrix-variate gamma distributed with the parameters $(\alpha = 2 + i,\ B = I)$.

**5.3.9.** Consider the real cases (a) and (b) in Exercise 5.3.8, except that the distribution is type-1 beta with the parameters $(\alpha = \frac{5}{2},\ \beta = \frac{5}{2})$. Derive the density of the determinant of $X$.

**5.3.10.** Consider $\tilde{X}$, (a) $2\times 2$, (b) $3\times 3$, complex matrix-variate type-1 beta distributed with the parameters $(\alpha = \frac{5}{2} + i,\ \beta = \frac{5}{2} - i)$. Then derive the density of $|\det(\tilde{X})|$ in the cases (a) and (b).

**5.3.11.** Consider $X$, (a) $2\times 2$, (b) $3\times 3$, real matrix-variate type-2 beta distributed with the parameters $(\alpha = \frac{3}{2},\ \beta = \frac{3}{2})$. Derive the density of $|X|$ in the cases (a) and (b).

**5.3.12.** Consider $\tilde{X}$, (a) $2\times 2$, (b) $3\times 3$, complex matrix-variate type-2 beta distributed with the parameters $(\alpha = \frac{3}{2},\ \beta = \frac{3}{2})$. Derive the density of $|\det(\tilde{X})|$ in the cases (a) and (b).

#### **5.4. The Densities of Some General Structures**

Three cases were examined in Sect. 5.3: the product of real scalar gamma variables, the product of real scalar type-1 beta variables and the product of real scalar type-2 beta variables, where in all these instances the individual variables were mutually independently distributed. Let us now consider the corresponding general structures. Let $x_j$ be a real scalar gamma variable with shape parameter $\alpha_j$ and, for convenience, scale parameter 1, and let the $x_j$'s be independently distributed for $j = 1,\ldots,p$. Then, letting $v_1 = x_1\cdots x_p$,

$$E[v\_1^h] = \prod\_{j=1}^p \frac{\Gamma(\alpha\_j + h)}{\Gamma(\alpha\_j)}, \ \Re(\alpha\_j + h) > 0, \ \Re(\alpha\_j) > 0. \tag{5.4.1}$$

Now, let $y_1,\ldots,y_p$ be independently distributed real scalar type-1 beta random variables with the parameters $(\alpha_j, \beta_j)$, $\Re(\alpha_j) > 0$, $\Re(\beta_j) > 0$, $j = 1,\ldots,p$, and let $v_2 = y_1\cdots y_p$. Then,

$$E[v\_2^h] = \prod\_{j=1}^p \frac{\Gamma(\alpha\_j + h)}{\Gamma(\alpha\_j)} \frac{\Gamma(\alpha\_j + \beta\_j)}{\Gamma(\alpha\_j + \beta\_j + h)}\tag{5.4.2}$$

for $\Re(\alpha_j) > 0$, $\Re(\beta_j) > 0$, $\Re(\alpha_j + h) > 0$, $j = 1,\ldots,p$. Similarly, let $z_1,\ldots,z_p$ be independently distributed real scalar type-2 beta random variables with the parameters $(\alpha_j, \beta_j)$, $j = 1,\ldots,p$, and let $v_3 = z_1\cdots z_p$. Then, we have

$$E[v\_3^h] = \prod\_{j=1}^p \frac{\Gamma(\alpha\_j + h)}{\Gamma(\alpha\_j)} \frac{\Gamma(\beta\_j - h)}{\Gamma(\beta\_j)}\tag{5.4.3}$$

for $\Re(\alpha_j) > 0$, $\Re(\beta_j) > 0$, $\Re(\alpha_j + h) > 0$, $\Re(\beta_j - h) > 0$, $j = 1,\ldots,p$. The corresponding densities of $v_1, v_2, v_3$, respectively denoted by $g_1(v_1), g_2(v_2), g_3(v_3)$, are available from the inverse Mellin transforms by taking (5.4.1)–(5.4.3) as the Mellin transforms of $g_1, g_2, g_3$ with $h = s - 1$, where $s$ is the complex Mellin parameter. Then, for suitable contours $L$, the densities can be determined as follows:

$$\begin{split} g\_1(v\_1) &= \frac{1}{2\pi i} \int\_L E[\upsilon\_1^{s-1}] \upsilon\_1^{-s} \mathrm{d}s, \; i = \sqrt{(-1)}, \\ &= \left\{ \prod\_{j=1}^p \frac{1}{\Gamma(\alpha\_j)} \right\} \frac{1}{2\pi i} \int\_L \left\{ \prod\_{j=1}^p \Gamma(\alpha\_j + s - 1) \right\} \upsilon\_1^{-s} \mathrm{d}s \\ &= \left\{ \prod\_{j=1}^p \frac{1}{\Gamma(\alpha\_j)} \right\} G\_{0,p}^{p,0}[\upsilon\_1|\_{\alpha\_j - 1, \; j = 1, \ldots, p}], \; 0 \le \upsilon\_1 < \infty, \end{split} \tag{5.4.4}$$

where $\Re(\alpha_j + s - 1) > 0$, $j = 1,\ldots,p$, and $g_1(v_1) = 0$ elsewhere. This last representation is expressed in terms of a G-function, which will be defined in Sect. 5.4.1.

$$\begin{split} g_2(v_2) &= \frac{1}{2\pi i}\int_L E[v_2^{s-1}]\,v_2^{-s}\,\mathrm{d}s,\ i = \sqrt{(-1)},\\ &= \left\{\prod_{j=1}^p \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)}\right\}\frac{1}{2\pi i}\int_L\left\{\prod_{j=1}^p \frac{\Gamma(\alpha_j + s - 1)}{\Gamma(\alpha_j + \beta_j + s - 1)}\right\}v_2^{-s}\,\mathrm{d}s\\ &= \left\{\prod_{j=1}^p \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)}\right\}G_{p,p}^{p,0}\left[v_2\,\middle|\,{}^{\alpha_j + \beta_j - 1,\ j = 1,\ldots,p}_{\alpha_j - 1,\ j = 1,\ldots,p}\right],\ 0 \le v_2 \le 1, \end{split}\tag{5.4.5}$$

where $G\_{p,p}^{p,0}$ is a G-function, $\Re(\alpha\_j+s-1)>0,\ \Re(\alpha\_j)>0,\ \Re(\beta\_j)>0,\ j=1,\dots,p$, and $g\_2(v\_2)=0$ elsewhere.

$$\begin{split} g\_3(v\_3) &= \frac{1}{2\pi i}\int\_L E[v\_3^{s-1}]\,v\_3^{-s}\,\mathrm{d}s,\ i=\sqrt{(-1)},\\ &= \left\{\prod\_{j=1}^p\frac{1}{\Gamma(\alpha\_j)\Gamma(\beta\_j)}\right\}\frac{1}{2\pi i}\int\_L\left\{\prod\_{j=1}^p\Gamma(\alpha\_j+s-1)\Gamma(\beta\_j-s+1)\right\}v\_3^{-s}\,\mathrm{d}s\\ &= \left\{\prod\_{j=1}^p\frac{1}{\Gamma(\alpha\_j)\Gamma(\beta\_j)}\right\}G\_{p,p}^{p,p}\left[v\_3\Big|\_{\alpha\_j-1,\ j=1,\dots,p}^{-\beta\_j,\ j=1,\dots,p}\right],\ 0\le v\_3<\infty,\end{split}\tag{5.4.6}$$

where $\Re(\alpha\_j)>0,\ \Re(\beta\_j)>0,\ \Re(\alpha\_j+s-1)>0,\ \Re(\beta\_j-s+1)>0,\ j=1,\dots,p$, and $g\_3(v\_3)=0$ elsewhere.
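These Mellin-moment representations lend themselves to quick numerical sanity checks. The following Monte Carlo sketch (with illustrative parameters of our choosing, not taken from the book) verifies that when $v\_1$ is a product of independent real gamma random variables with shape parameters $\alpha\_j$ and unit scale, $E[v\_1^h]=\prod\_j\Gamma(\alpha\_j+h)/\Gamma(\alpha\_j)$, the moment expression underlying (5.4.4):

```python
import math
import random

# Illustrative parameters (not from the book): v1 = x1*x2*x3 with x_j ~ gamma(alpha_j, 1)
alphas = [1.5, 2.0, 3.0]
h = 0.7

random.seed(42)
n = 200_000
total = 0.0
for _ in range(n):
    v1 = 1.0
    for a in alphas:
        v1 *= random.gammavariate(a, 1.0)  # shape a, scale 1
    total += v1 ** h
mc = total / n  # Monte Carlo estimate of E[v1^h]

# Exact value from the product of gamma ratios
exact = math.prod(math.gamma(a + h) / math.gamma(a) for a in alphas)
print(mc, exact)  # the two agree to roughly 1% at this sample size
```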

#### **5.4.1. The G-function**

The G-function is defined in terms of the following Mellin-Barnes integral:

$$\begin{aligned} G(z) &= G\_{p,q}^{m,n}(z) = G\_{p,q}^{m,n} \left[ z \big|\_{b\_1, \dots, b\_q}^{a\_1, \dots, a\_p} \right] \\ &= \frac{1}{2\pi i} \int\_L \phi(s) z^{-s} ds, \; i = \sqrt{(-1)} \\ \phi(s) &= \frac{\{\prod\_{j=1}^m \Gamma(b\_j + s)\} \{\prod\_{j=1}^n \Gamma(1 - a\_j - s)\}}{\{\prod\_{j=m+1}^q \Gamma(1 - b\_j - s)\} \{\prod\_{j=n+1}^p \Gamma(a\_j + s)\}} \end{aligned}$$

where the parameters $a\_j,\ j=1,\dots,p$, and $b\_j,\ j=1,\dots,q$, can be complex numbers. There are three general contours $L$, say $L\_1, L\_2, L\_3$, where $L\_1$ is a loop starting and ending at $-\infty$ that contains all the poles of $\Gamma(b\_j+s),\ j=1,\dots,m$, and none of those of $\Gamma(1-a\_j-s),\ j=1,\dots,n$. In general, $L$ will separate the poles of $\Gamma(b\_j+s),\ j=1,\dots,m$, from those of $\Gamma(1-a\_j-s),\ j=1,\dots,n$, which lie on either side of the contour. $L\_2$ is a loop starting and ending at $+\infty$ which encloses all the poles of $\Gamma(1-a\_j-s),\ j=1,\dots,n$. $L\_3$ is the straight line contour from $c-i\infty$ to $c+i\infty$. The existence of the contours, convergence conditions, explicit series forms for general parameters as well as applications are available in Mathai (1993). G-functions can readily be evaluated with symbolic computing packages such as MAPLE and Mathematica.

**Example 5.4.1.** Let $x\_1, x\_2, x\_3$ be independently distributed real scalar random variables, $x\_1$ being real gamma distributed with the parameters $(\alpha\_1=3,\ \beta\_1=2)$, $x\_2$, real type-1 beta distributed with the parameters $(\alpha\_2=\frac{3}{2}+2i,\ \beta\_2=\frac{1}{2})$, and $x\_3$, real type-2 beta distributed with the parameters $(\alpha\_3=\frac{5}{2}+i,\ \beta\_3=2-i)$. Let $u\_1=x\_1x\_2x\_3$, $u\_2=\frac{x\_1}{x\_2x\_3}$ and $u\_3=\frac{x\_2}{x\_1x\_3}$, with densities $g\_j(u\_j),\ j=1,2,3$, respectively. Derive the densities $g\_j(u\_j),\ j=1,2,3$, and represent them in terms of G-functions.

**Solution 5.4.1.** Observe that $E[(1/x\_j)^{s-1}]=E[x\_j^{-s+1}],\ j=1,2,3$, and that $g\_1(u\_1)$, $g\_2(u\_2)$ and $g\_3(u\_3)$ will share the same 'normalizing constant', say $c$, which is the product of the parts of the normalizing constants in the densities of $x\_1, x\_2$ and $x\_3$ that do not cancel out when determining the moments, respectively denoted by $c\_1, c\_2$ and $c\_3$; that is, $c=c\_1c\_2c\_3$. Thus,

$$\begin{split} c &= \frac{1}{\Gamma(\alpha\_1)} \frac{\Gamma(\alpha\_2 + \beta\_2)}{\Gamma(\alpha\_2)} \frac{1}{\Gamma(\alpha\_3)\Gamma(\beta\_3)} \\ &= \frac{1}{\Gamma(3)} \frac{\Gamma(2 + 2i)}{\Gamma(\frac{3}{2} + 2i)} \frac{1}{\Gamma(\frac{5}{2} + i)\Gamma(2 - i)}. \end{split} \tag{i}$$

The following are $E[x\_j^{s-1}]$ and $E[x\_j^{-s+1}]$ for $j=1,2,3$:

$$E[x\_1^{s-1}] = c\_1\,2^{s-1}\Gamma(2+s),\ \ E[x\_1^{-s+1}] = c\_1\,2^{-s+1}\Gamma(4-s)\tag{ii}$$

$$E[x\_2^{s-1}] = c\_2\,\frac{\Gamma(\frac{1}{2}+2i+s)}{\Gamma(1+2i+s)},\ \ E[x\_2^{-s+1}] = c\_2\,\frac{\Gamma(\frac{5}{2}+2i-s)}{\Gamma(3+2i-s)}\tag{iii}$$

$$E[x\_3^{s-1}] = c\_3\,\Gamma(3/2+i+s)\Gamma(3-i-s),\ \ E[x\_3^{-s+1}] = c\_3\,\Gamma(7/2+i-s)\Gamma(1-i+s).\tag{iv}$$

Then from (*i*)-(*iv*),

$$E[u\_1^{s-1}] = c \cdot 2^{s-1} \Gamma(2+s) \frac{\Gamma(\frac{1}{2}+2i+s)}{\Gamma(1+2i+s)} \Gamma(3/2+i+s) \Gamma(3-i-s).$$

Taking the inverse Mellin transform and writing the density *g*1*(u*1*)* in terms of a G-function, we have

$$g\_1(u\_1) = \frac{c}{2} G\_{2,3}^{3,1} \left[ \frac{u\_1}{2} \Big|\_{2,\ \frac{1}{2} + 2i,\ \frac{3}{2} + i}^{-2 + i,\ 1 + 2i} \right].$$

Using (*i*)-(*iv*) and rearranging the gamma functions so that those involving +*s* appear together in the numerator, we have the following:

$$E[u\_2^{s-1}] = \frac{c}{2}\,2^{s}\,\Gamma(2+s)\,\frac{\Gamma(1-i+s)}{\Gamma(3+2i-s)}\,\Gamma(5/2+2i-s)\,\Gamma(7/2+i-s).$$

Taking the inverse Mellin transform and expressing the result in terms of a G-function, we obtain the density *g*2*(u*2*)* as

$$g\_2(u\_2) = \frac{c}{2}\,G\_{2,3}^{2,2}\left[\frac{u\_2}{2}\Big|\_{2,\ 1-i,\ -2-2i}^{-\frac{3}{2}-2i,\ -\frac{5}{2}-i}\right].$$

Using (*i*)-(*iv*) and conveniently rearranging the gamma functions involving +*s*, we have

$$E[u\_3^{s-1}] = 2c\,2^{-s}\,\Gamma(1/2+2i+s)\,\Gamma(1-i+s)\,\Gamma(4-s)\,\frac{\Gamma(\frac{7}{2}+i-s)}{\Gamma(1+2i+s)}.$$

On taking the inverse Mellin transform, the following density is obtained:

$$g\_3(u\_3) = 2c\,G\_{3,2}^{2,2}\left[2u\_3\Big|\_{\frac{1}{2}+2i,\ 1-i}^{-3,\ -\frac{5}{2}-i,\ 1+2i}\right].$$

This completes the computations.

#### **5.4.2. Some special cases of the G-function**

Certain special cases of the G-function can be written in terms of elementary functions. Here are some of them:

$$\begin{aligned}
G\_{0,1}^{1,0}(z|a) &= z^a\mathrm{e}^{-z},\ z\ne 0\\
G\_{1,1}^{1,1}\left[-z\Big|\_{0}^{1-a}\right] &= \Gamma(a)(1-z)^{-a},\ |z|<1\\
G\_{1,1}^{1,0}\left[z\Big|\_{\alpha}^{\alpha+\beta+1}\right] &= \frac{1}{\Gamma(\beta+1)}\,z^{\alpha}(1-z)^{\beta},\ |z|<1\\
G\_{1,1}^{1,1}\left[az^{\alpha}\Big|\_{\beta/\alpha}^{\beta/\alpha}\right] &= \frac{a^{\beta/\alpha}z^{\beta}}{1+az^{\alpha}},\ |az^{\alpha}|<1\\
G\_{1,1}^{1,1}\left[az^{\alpha}\Big|\_{\beta/\alpha}^{1-\gamma+\beta/\alpha}\right] &= \Gamma(\gamma)\,\frac{a^{\beta/\alpha}z^{\beta}}{(1+az^{\alpha})^{\gamma}},\ |az^{\alpha}|<1\\
G\_{2,2}^{1,2}\left[-z^2\Big|\_{0,\ \frac{1}{2}}^{1-a,\ \frac{1}{2}-a}\right] &= \frac{\Gamma(2a)}{2^{2a}}\,[(1+z)^{-2a}+(1-z)^{-2a}],\ |z|<1\\
G\_{2,2}^{1,2}\left[z\Big|\_{0,\ -2a}^{\frac{1}{2}-a,\ 1-a}\right] &= \frac{\pi^{\frac{1}{2}}}{a}\,[1+(1+z)^{\frac{1}{2}}]^{-2a},\ |z|<1\\
G\_{0,2}^{1,0}\left[\frac{z^2}{4}\Big|\_{\frac{1}{4},\ -\frac{1}{4}}\right] &= \left(\frac{2}{\pi z}\right)^{\frac{1}{2}}\sin z\\
G\_{0,2}^{1,0}\left[\frac{z^2}{4}\Big|\_{-\frac{1}{4},\ \frac{1}{4}}\right] &= \left(\frac{2}{\pi z}\right)^{\frac{1}{2}}\cos z\\
G\_{0,2}^{1,0}\left[-\frac{z^2}{4}\Big|\_{0,\ -\frac{1}{2}}\right] &= \frac{2}{z\pi^{\frac{1}{2}}}\sinh z\\
G\_{0,2}^{1,0}\left[-\frac{z^2}{4}\Big|\_{0,\ \frac{1}{2}}\right] &= \pi^{-\frac{1}{2}}\cosh z\\
G\_{2,2}^{1,2}\left[\pm z\Big|\_{1,\ 0}^{1,\ 1}\right] &= \ln(1\pm z),\ |z|<1
\end{aligned}$$

$$\begin{aligned} &G\_{p,q+1}^{1,p}\left[z|\_{0,1-b\_1,\dots,1-b\_q}^{1-a\_1,\dots,1-a\_p}\right] \\ &=\left[\frac{\Gamma(a\_1)\cdots\Gamma(a\_p)}{\Gamma(b\_1)\cdots\Gamma(b\_q)}\right] \,\_pF\_q(a\_1,\dots,a\_p;b\_1,\dots,b\_q;-z) \end{aligned}$$

for *p* ≤ *q* or *p* = *q* + 1 and |*z*| *<* 1.
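Two of the cases listed above can be spot-checked numerically with mpmath's `meijerg` (assuming mpmath is available; the evaluation points are arbitrary choices of ours):

```python
from mpmath import mp, meijerg, gamma, exp, mpf

mp.dps = 20
z, a = mpf('0.4'), mpf('1.7')
alpha, beta = mpf('0.3'), mpf('1.2')

# G^{1,0}_{0,1}(z | a) = z^a e^{-z}
lhs1 = meijerg([[], []], [[a], []], z)
rhs1 = z**a * exp(-z)

# G^{1,0}_{1,1}[z | alpha+beta+1; alpha] = z^alpha (1-z)^beta / Gamma(beta+1), |z| < 1
lhs2 = meijerg([[], [alpha + beta + 1]], [[alpha], []], z)
rhs2 = z**alpha * (1 - z)**beta / gamma(beta + 1)

print(lhs1 - rhs1, lhs2 - rhs2)  # both differences are ~0
```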

#### **5.4.3. The H-function**

If we have a general structure corresponding to *v*1*, v*<sup>2</sup> and *v*<sup>3</sup> of Sect. 5.4, say *w*1*, w*<sup>2</sup> and *w*<sup>3</sup> of the form

$$w\_1 = \mathbf{x}\_1^{\delta\_1} \mathbf{x}\_2^{\delta\_2} \cdots \mathbf{x}\_p^{\delta\_p} \tag{5.4.7}$$

$$w\_2 = \mathbf{y}\_1^{\delta\_1} \mathbf{y}\_2^{\delta\_2} \cdots \mathbf{y}\_p^{\delta\_p} \tag{5.4.8}$$

$$w\_3 = z\_1^{\delta\_1} z\_2^{\delta\_2} \cdots z\_p^{\delta\_p} \tag{5.4.9}$$

for some $\delta\_j>0,\ j=1,\dots,p$, the densities of $w\_1, w\_2$ and $w\_3$ are then available in terms of a more general function known as the H-function. It is again a Mellin-Barnes type integral, defined and denoted as follows:

$$\begin{split} H(z) &= H\_{p,q}^{m,n}(z) = H\_{p,q}^{m,n}\left[z\Big|\_{(b\_1,\beta\_1),\dots,(b\_q,\beta\_q)}^{(a\_1,\alpha\_1),\dots,(a\_p,\alpha\_p)}\right]\\ &= \frac{1}{2\pi i}\int\_L\psi(s)\,z^{-s}\,\mathrm{d}s,\ i=\sqrt{(-1)},\\ \psi(s) &= \frac{\{\prod\_{j=1}^m\Gamma(b\_j+\beta\_j s)\}\{\prod\_{j=1}^n\Gamma(1-a\_j-\alpha\_j s)\}}{\{\prod\_{j=m+1}^q\Gamma(1-b\_j-\beta\_j s)\}\{\prod\_{j=n+1}^p\Gamma(a\_j+\alpha\_j s)\}}\end{split}\tag{5.4.10}$$

where *αj >* 0*, j* = 1*,... , p, βj >* 0*, j* = 1*,... ,q,* are real and positive, *aj , j* = 1*,... , p,* and *bj , j* = 1*,... ,q,* are complex numbers. Three main contours *L*1*, L*2*, L*<sup>3</sup> are utilized, similarly to those described in connection with the G-function. Existence conditions, properties and applications of this generalized hypergeometric function are available from Mathai et al. (2010) among other monographs. Numerous special cases can be expressed in terms of known elementary functions.
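The H-function is generally not built into common numerical libraries (mpmath, for instance, provides `meijerg` but no Fox H), but for well-behaved parameters one can integrate the Mellin-Barnes representation (5.4.10) directly along the straight-line contour $L\_3$. The sketch below (our own construction, with illustrative parameter values) checks the simplest special case $H\_{0,1}^{1,0}[z|\_{(b,\beta)}]=\beta^{-1}z^{b/\beta}\mathrm{e}^{-z^{1/\beta}}$:

```python
# Assumes mpmath is installed; b, beta, z and the contour abscissa c are illustrative.
from mpmath import mp, mpf, mpc, gamma, exp, quad, inf, pi

mp.dps = 20
b, beta, z = mpf('0.5'), mpf('2'), mpf('0.8')
c = mpf('1')  # contour Re(s) = c, to the right of the poles of Gamma(b + beta*s)

def integrand(t):
    s = mpc(c, t)          # s = c + i*t on the contour L3
    return gamma(b + beta * s) * z**(-s)

# (1/(2*pi*i)) * integral over L3; with ds = i dt the prefactor becomes 1/(2*pi)
H = quad(integrand, [-inf, inf]) / (2 * pi)
closed = z**(b / beta) * exp(-z**(1 / beta)) / beta
print(H, closed)  # H matches the closed form, up to a negligible imaginary part
```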

**Example 5.4.2.** Let $x\_1$ and $x\_2$ be independently distributed real type-1 beta random variables with the parameters $(\alpha\_j>0,\ \beta\_j>0),\ j=1,2$, respectively. Let $y\_1=x\_1^{\delta\_1},\ \delta\_1>0$, and $y\_2=x\_2^{\delta\_2},\ \delta\_2>0$. Compute the density of $u=y\_1y\_2$.

**Solution 5.4.2.** Arbitrary moments of $y\_1$ and $y\_2$ are available from those of $x\_1$ and $x\_2$:

$$\begin{split} E[\mathbf{x}\_{j}^{h}] &= \frac{\Gamma(\alpha\_{j} + h)}{\Gamma(\alpha\_{j})} \frac{\Gamma(\alpha\_{j} + \beta\_{j})}{\Gamma(\alpha\_{j} + \beta\_{j} + h)}, \ \Re(\alpha\_{j} + h) > 0, \ j = 1, 2, \\ E[\mathbf{y}\_{j}^{h}] &= E[\mathbf{x}\_{j}^{\delta\_{j}h}] = \frac{\Gamma(\alpha\_{j} + \delta\_{j}h)}{\Gamma(\alpha\_{j})} \frac{\Gamma(\alpha\_{j} + \beta\_{j})}{\Gamma(\alpha\_{j} + \beta\_{j} + \delta\_{j}h)}, \ \Re(\alpha\_{j} + \delta\_{j}h) > 0, \\ E[\mathbf{u}^{s-1}] &= E[\mathbf{y}\_{1}^{s-1}] E[\mathbf{y}\_{2}^{s-1}] \\ &= \prod\_{j=1}^{2} \frac{\Gamma(\alpha\_{j} + \delta\_{j}(s-1))}{\Gamma(\alpha\_{j})} \frac{\Gamma(\alpha\_{j} + \beta\_{j})}{\Gamma(\alpha\_{j} + \beta\_{j} + \delta\_{j}(s-1))}. \end{split} \tag{5.4.11}$$

Accordingly, the density of *u*, denoted by *g(u)*, is the following:

$$\begin{split} g(u) &= C\,\frac{1}{2\pi i}\int\_L\left\{\prod\_{j=1}^2\frac{\Gamma(\alpha\_j-\delta\_j+\delta\_j s)}{\Gamma(\alpha\_j+\beta\_j-\delta\_j+\delta\_j s)}\right\}u^{-s}\,\mathrm{d}s\\ &= C\,H\_{2,2}^{2,0}\left[u\Big|\_{(\alpha\_1-\delta\_1,\ \delta\_1),\ (\alpha\_2-\delta\_2,\ \delta\_2)}^{(\alpha\_1+\beta\_1-\delta\_1,\ \delta\_1),\ (\alpha\_2+\beta\_2-\delta\_2,\ \delta\_2)}\right],\\ C &= \prod\_{j=1}^2\frac{\Gamma(\alpha\_j+\beta\_j)}{\Gamma(\alpha\_j)}\end{split}\tag{5.4.12}$$

where $0\le u\le 1$, $\Re(\alpha\_j-\delta\_j+\delta\_j s)>0,\ \Re(\alpha\_j)>0,\ \Re(\beta\_j)>0,\ j=1,2$, and $g(u)=0$ elsewhere.
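Since (5.4.12) rests on the moment identity (5.4.11), that identity can be verified by simulation; the parameter triples below are illustrative choices of ours:

```python
import math
import random

random.seed(7)
params = [(2.0, 3.0, 0.5), (1.5, 2.5, 2.0)]  # (alpha_j, beta_j, delta_j), j = 1, 2
s = 1.8
h = s - 1

# Exact E[u^{s-1}] from (5.4.11)
exact = 1.0
for a, b, d in params:
    exact *= (math.gamma(a + d * h) / math.gamma(a)) * \
             (math.gamma(a + b) / math.gamma(a + b + d * h))

# Monte Carlo estimate with u = x1^{delta1} * x2^{delta2}, x_j ~ type-1 beta(alpha_j, beta_j)
n = 200_000
total = 0.0
for _ in range(n):
    u = 1.0
    for a, b, d in params:
        u *= random.betavariate(a, b) ** d
    total += u ** h
mc = total / n
print(mc, exact)  # agreement to roughly 1%
```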

When $\alpha\_1=1=\dots=\alpha\_p$ and $\beta\_1=1=\dots=\beta\_q$, the H-function reduces to a G-function. This G-function is frequently referred to as Meijer's G-function and the H-function, as Fox's H-function.

#### **5.4.4. Some special cases of the H-function**

Certain special cases of the H-function are listed next.

$$\begin{aligned} H\_{0,1}^{1,0}[z|\_{(b,\beta)}] &= \beta^{-1}z^{\frac{b}{\beta}}\,\mathrm{e}^{-z^{1/\beta}};\\ H\_{1,1}^{1,1}\left[z\Big|\_{(0,1)}^{(1-\nu,1)}\right] &= \Gamma(\nu)(1+z)^{-\nu} = \Gamma(\nu)\,\_1F\_0(\nu;\ ;-z),\ |z|<1;\\ H\_{0,2}^{1,0}\left[\frac{z^2}{4}\Big|\_{(\frac{a+\nu}{2},1),\ (\frac{a-\nu}{2},1)}\right] &= \left(\frac{z}{2}\right)^a J\_{\nu}(z)\end{aligned}$$

where the Bessel function

$$\begin{split} J\_{\nu}(z) &= \sum\_{k=0}^{\infty}\frac{(-1)^k(z/2)^{\nu+2k}}{k!\,\Gamma(\nu+k+1)} = \frac{(z/2)^{\nu}}{\Gamma(\nu+1)}\,\_0F\_1(\ ;1+\nu;-\frac{z^2}{4});\\ H\_{1,2}^{1,1}\left[z\Big|\_{(0,1),\ (1-c,1)}^{(1-a,1)}\right] &= \frac{\Gamma(a)}{\Gamma(c)}\,\_1F\_1(a;c;-z);\\ H\_{2,2}^{1,2}\left[z\Big|\_{(0,1),\ (1-c,1)}^{(1-a,1),\ (1-b,1)}\right] &= \frac{\Gamma(a)\Gamma(b)}{\Gamma(c)}\,\_2F\_1(a,b;c;-z);\\ H\_{1,2}^{1,1}\left[-z\Big|\_{(0,1),\ (1-\beta,\alpha)}^{(1-\gamma,1)}\right] &= \Gamma(\gamma)\,E\_{\alpha,\beta}^{\gamma}(z),\ \Re(\gamma)>0,\end{split}$$

where the generalized Mittag-Leffler function

$$E\_{\alpha,\beta}^{\gamma}(z) = \sum\_{k=0}^{\infty}\frac{(\gamma)\_k}{k!}\frac{z^k}{\Gamma(\beta+\alpha k)},\ \Re(\alpha)>0,\ \Re(\beta)>0,$$

wherever $\Gamma(\gamma)$ is defined. For $\gamma=1$, we have $E\_{\alpha,\beta}^{1}(z)=E\_{\alpha,\beta}(z)$; when $\gamma=1,\ \beta=1$, $E\_{\alpha,1}^{1}(z)=E\_{\alpha}(z)$; and when $\gamma=1=\beta=\alpha$, we have $E\_1(z)=\mathrm{e}^z$.

$$H\_{0,2}^{2,0}\left[z\Big|\_{(0,1),\ (\frac{\nu}{\rho},\frac{1}{\rho})}\right] = \rho\,K\_{\rho}^{\nu}(z)$$

where $K\_{\rho}^{\nu}(z)$ is the Krätzel function

$$K\_{\rho}^{\upsilon}(z) = \int\_0^{\infty} t^{\upsilon - 1} \mathbf{e}^{-t^{\rho} - \frac{z}{t}} \mathbf{d}t,\ \Re(z) > 0.$$

$$H\_{1,1}^{1,0}\left[z\Big|\_{(\alpha,1)}^{(\alpha+\frac{1}{2},1)}\right] = \pi^{-\frac{1}{2}}z^{\alpha}(1-z)^{-\frac{1}{2}},\ |z|<1;$$


$$\begin{aligned} &H\_{2,2}^{2,0}\left[z\Big|\_{ (\alpha,1),(\alpha,1) }^{ (\alpha+\frac{1}{3},1),(\alpha+\frac{2}{3},1) }\right] \\ &=z^{\alpha} \,\_2F\_1\left(\frac{2}{3}, \frac{1}{3}; 1; 1-z\right), \ |1-z| < 1. \end{aligned}$$

#### **Exercises 5.4**

**5.4.1.** Show that

$$z^{\gamma}G\_{0,1}^{1,0}[pz^{\alpha}|\_{\beta/\alpha}] = p^{\beta/\alpha}z^{\beta+\gamma}\mathrm{e}^{-pz^{\alpha}}.$$

**5.4.2.** Show that

$$\mathbf{e}^{-z} = G\_{1,2}^{1,1} \left[ z \big|\_{0,1/3}^{1/3} \right] = G\_{2,3}^{2,1} \left[ z \big|\_{0,\frac{1}{2},-\frac{1}{2}}^{-\frac{1}{2},\frac{1}{2}} \right].$$

**5.4.3.** Show that

$$z^{\frac{1}{3}}(1-z)^{-\frac{5}{6}} = \varGamma(1/6)\,G\_{1,1}^{1,0}\left[z\Big|\_{\frac{1}{3}}^{\frac{1}{2}}\right].$$

**5.4.4.** Show that

$$\int\_0^{\infty} x^{a-1}(1+x)^{b-c}(1+x-zx)^{-b}\,\mathrm{d}x = \frac{\Gamma(a)\Gamma(c-a)}{\Gamma(c)}\,\_2F\_1(a,b;c;z),\ |z|<1,\ \Re(c-a)>0.$$

**5.4.5.** Show that

$$\begin{aligned} (a\_1-a\_2)\,H\_{p,q}^{m,n}\left[z\Big|\begin{matrix}(a\_1,\alpha\_1),\ (a\_2,\alpha\_1),\ (a\_3,\alpha\_3),\dots,(a\_p,\alpha\_p)\\ (b\_1,\beta\_1),\dots,(b\_q,\beta\_q)\end{matrix}\right]\\ = H\_{p,q}^{m,n}\left[z\Big|\begin{matrix}(a\_1,\alpha\_1),\ (a\_2-1,\alpha\_1),\ (a\_3,\alpha\_3),\dots,(a\_p,\alpha\_p)\\ (b\_1,\beta\_1),\dots,(b\_q,\beta\_q)\end{matrix}\right]\\ -\ H\_{p,q}^{m,n}\left[z\Big|\begin{matrix}(a\_1-1,\alpha\_1),\ (a\_2,\alpha\_1),\ (a\_3,\alpha\_3),\dots,(a\_p,\alpha\_p)\\ (b\_1,\beta\_1),\dots,(b\_q,\beta\_q)\end{matrix}\right],\ n\ge 2.\end{aligned}$$

#### **5.5, 5.5a. The Wishart Density**

A particular case of the real $p\times p$ matrix-variate gamma distribution, known as the Wishart distribution, is the preeminent distribution in multivariate statistical analysis. In the general $p\times p$ real matrix-variate gamma density with the parameters $(\alpha,\ B>O)$, let $\alpha=\frac{m}{2}$, $B=\frac{1}{2}\Sigma^{-1}$ and $\Sigma>O$; the resulting density is called a Wishart density with $m$ degrees of freedom and parameter matrix $\Sigma>O$. This density, denoted by $f\_w(W)$, is given by

$$f\_w(W) = \frac{|W|^{\frac{m}{2} - \frac{p+1}{2}}}{2^{\frac{mp}{2}} |\Sigma|^{\frac{m}{2}} \Gamma\_p(\frac{m}{2})} \mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1}W)}, \ W > O, \ \Sigma > O,\tag{5.5.1}$$

for *m* ≥ *p*, and *fw(W )* = 0 elsewhere. This will be denoted as *W* ∼ *Wp(m, Σ)*. Clearly, all the properties discussed in connection with the real matrix-variate gamma density still hold in this case. Algebraic evaluations of the marginal densities and explicit evaluations of the densities of sub-matrices will be considered, some aspects having already been discussed in Sects. 5.2 and 5.2.1.
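For $p=2$ the density (5.5.1) is easy to evaluate directly, using the product form $\Gamma\_p(\alpha)=\pi^{p(p-1)/4}\prod\_{j=1}^p\Gamma(\alpha-(j-1)/2)$ of the real matrix-variate gamma function; the matrices below are illustrative choices of ours. (SciPy users can obtain the same value from `scipy.stats.wishart(df=m, scale=Sigma).pdf(W)`.)

```python
import math

# Illustrative 2x2 example of the Wishart density (5.5.1)
p, m = 2, 5
Sigma = [[2.0, 0.5], [0.5, 1.0]]
W = [[6.0, 1.0], [1.0, 4.0]]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def trace_prod(A, B):  # tr(A B)
    return sum(A[i][k] * B[k][i] for i in range(2) for k in range(2))

def multigamma(alpha, p):
    # Gamma_p(alpha) = pi^{p(p-1)/4} * prod_{j=1}^p Gamma(alpha - (j-1)/2)
    return math.pi ** (p * (p - 1) / 4) * \
        math.prod(math.gamma(alpha - (j - 1) / 2) for j in range(1, p + 1))

f_w = (det2(W) ** ((m - p - 1) / 2)
       * math.exp(-0.5 * trace_prod(inv2(Sigma), W))
       / (2 ** (m * p / 2) * det2(Sigma) ** (m / 2) * multigamma(m / 2, p)))
print(f_w)
```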

In the complex case, the density, denoted by $\tilde{f}\_w(\tilde{W})$, is the following:

$$\tilde{f}\_w(\tilde{W}) = \frac{|\det(\tilde{W})|^{m-p} e^{-\text{tr}(\Sigma^{-1}\tilde{W})}}{|\det(\Sigma)|^m \tilde{\Gamma}\_p(m)}, \ \tilde{W} > O, \ \Sigma > O, \ m \ge p,\tag{5.5a.1}$$

and $\tilde{f}\_w(\tilde{W})=0$ elsewhere. This will be denoted as $\tilde{W}\sim\tilde{W}\_p(m,\Sigma)$.

#### **5.5.1. Explicit evaluations of the matrix-variate gamma integral, real case**

Is it possible to evaluate the matrix-variate gamma integral explicitly by using conventional integration? We will now investigate some aspects of this question.

When the Wishart density is derived from samples coming from a Gaussian population, the basic technique relies on the triangularization process. When *Σ* = *I* , that is, *W* ∼ *Wp(m, I )*, can the integral of the right-hand side of (5.5.1) be evaluated by resorting to conventional methods or by direct evaluation? We will address this problem by making use of the technique of partitioning matrices. Let us partition

$$X = \begin{bmatrix} X\_{11} & X\_{12} \\ X\_{21} & X\_{22} \end{bmatrix}$$

where $X\_{22}=x\_{pp}$ so that $X\_{21}=(x\_{p1},\dots,x\_{p\,p-1})$ and $X\_{12}=X\_{21}'$. Then, on applying a result from Sect. 1.3, we have

$$|X|^{\alpha - \frac{p+1}{2}} = |X\_{11}|^{\alpha - \frac{p+1}{2}} [\mathbf{x}\_{pp} - X\_{21} X\_{11}^{-1} X\_{12}]^{\alpha - \frac{p+1}{2}}.\tag{5.5.2}$$

Note that when $X$ is positive definite, $X\_{11}>O$ and $x\_{pp}>0$, and the quadratic form $X\_{21}X\_{11}^{-1}X\_{12}>0$. As well,

$$[x\_{pp}-X\_{21}X\_{11}^{-1}X\_{12}]^{\alpha-\frac{p+1}{2}} = x\_{pp}^{\alpha-\frac{p+1}{2}}\,[1-x\_{pp}^{-\frac{1}{2}}X\_{21}X\_{11}^{-\frac{1}{2}}X\_{11}^{-\frac{1}{2}}X\_{12}x\_{pp}^{-\frac{1}{2}}]^{\alpha-\frac{p+1}{2}}.\tag{5.5.3}$$

Letting $Y=x\_{pp}^{-\frac{1}{2}}X\_{21}X\_{11}^{-\frac{1}{2}}$ and referring to Mathai (1997, Theorem 1.18) or Theorem 1.6.4 of Chap. 1, $\mathrm{d}Y=x\_{pp}^{-\frac{p-1}{2}}|X\_{11}|^{-\frac{1}{2}}\,\mathrm{d}X\_{21}$ for fixed $X\_{11}$ and $x\_{pp}$. The integral over $x\_{pp}$ gives

$$\int\_0^\infty \mathbf{x}\_{pp}^{\alpha + \frac{p-1}{2} - \frac{p+1}{2}} \mathbf{e}^{-\chi\_{pp}} \mathbf{dx}\_{pp} = \Gamma(\alpha), \ \Re(\alpha) > 0.$$

If we let $u=YY'$, then from Theorem 2.16 and Remark 2.13 of Mathai (1997) or using Theorem 4.2.3, after integrating out over the Stiefel manifold, we have

$$\mathrm{d}Y = \frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})}\,u^{\frac{p-1}{2}-1}\,\mathrm{d}u.$$

(Note that *n* in Theorem 2.16 of Mathai (1997) corresponds to *p* − 1 and *p* is 1). Then, the integral over *u* gives

$$\int\_0^1 u^{\frac{p-1}{2}-1} (1-u)^{\alpha - \frac{p+1}{2}} \, \mathrm{d}u = \frac{\Gamma(\frac{p-1}{2})\Gamma(\alpha - \frac{p-1}{2})}{\Gamma(\alpha)}, \; \Re(\alpha) > \frac{p-1}{2}.$$

Now, collecting all the factors, we have

$$\begin{aligned} |X\_{11}|^{\alpha + \frac{1}{2} - \frac{p+1}{2}} \Gamma(\alpha) \frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})} \frac{\Gamma(\frac{p-1}{2})\Gamma(\alpha - \frac{p-1}{2})}{\Gamma(\alpha)} \\ &= |X\_{11}^{(1)}|^{\alpha + \frac{1}{2} - \frac{p+1}{2}} \pi^{\frac{p-1}{2}} \Gamma(\alpha - (p-1)/2), \end{aligned}$$

for $\Re(\alpha)>\frac{p-1}{2}$. Note that $X\_{11}^{(1)}$ is $(p-1)\times(p-1)$; after the completion of the first part of the operations, $|X\_{11}|$ is denoted by $|X\_{11}^{(1)}|$, the exponent being changed to $\alpha+\frac{1}{2}-\frac{p+1}{2}$. Now repeat the process by separating $x\_{p-1,p-1}$, that is, by writing

$$X\_{11}^{(1)} = \begin{bmatrix} X\_{11}^{(2)} & X\_{12}^{(2)} \\ X\_{21}^{(2)} & x\_{p-1,p-1} \end{bmatrix}.$$

Here, $X\_{11}^{(2)}$ is of order $(p-2)\times(p-2)$ and $X\_{21}^{(2)}$ is of order $1\times(p-2)$. As before, letting $u=YY'$ with $Y=x\_{p-1,p-1}^{-\frac{1}{2}}X\_{21}^{(2)}[X\_{11}^{(2)}]^{-\frac{1}{2}}$, we have $\mathrm{d}Y=x\_{p-1,p-1}^{-\frac{p-2}{2}}|X\_{11}^{(2)}|^{-\frac{1}{2}}\,\mathrm{d}X\_{21}^{(2)}$. The integral over the Stiefel manifold gives $\frac{\pi^{\frac{p-2}{2}}}{\Gamma(\frac{p-2}{2})}u^{\frac{p-2}{2}-1}\mathrm{d}u$ and the factor containing $(1-u)$ is $(1-u)^{\alpha+\frac{1}{2}-\frac{p+1}{2}}$, the integral over $u$ yielding

$$\int\_0^1 u^{\frac{p-2}{2}-1} (1-u)^{\alpha + \frac{1}{2} - \frac{p+1}{2}} \mathrm{d}u = \frac{\Gamma(\frac{p-2}{2})\Gamma(\alpha - \frac{p-2}{2})}{\Gamma(\alpha)}$$

and that over *v* = *xp*−1*,p*−<sup>1</sup> giving

$$\int\_0^1 v^{\alpha + \frac{1}{2} + \frac{p-2}{2} - \frac{p+1}{2}} \mathbf{e}^{-v} \mathrm{d}v = \Gamma(\alpha), \; \Re(\alpha) > 0.$$

The product of these factors is then

$$|X\_{11}^{(2)}|^{\alpha+1-\frac{p+1}{2}}\,\pi^{\frac{p-2}{2}}\,\Gamma(\alpha-(p-2)/2),\ \Re(\alpha)>\frac{p-2}{2}.$$

Successive evaluations carried out by employing the same procedure yield the exponent of $\pi$ as $\frac{p-1}{2}+\frac{p-2}{2}+\dots+\frac{1}{2}=\frac{p(p-1)}{4}$ and the gamma product $\Gamma(\alpha-\frac{p-1}{2})\Gamma(\alpha-\frac{p-2}{2})\cdots\Gamma(\alpha)$, the final result being $\Gamma\_p(\alpha)$. The result is thus verified.
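The successive-extraction argument above amounts to the recursion $\Gamma\_p(\alpha)=\pi^{\frac{p-1}{2}}\,\Gamma(\alpha-\frac{p-1}{2})\,\Gamma\_{p-1}(\alpha)$, which can be confirmed against the closed-form product; the values of $\alpha$ and $p$ below are arbitrary:

```python
import math

def multigamma(alpha, p):
    # Gamma_p(alpha) = pi^{p(p-1)/4} * prod_{j=1}^p Gamma(alpha - (j-1)/2)
    return math.pi ** (p * (p - 1) / 4) * \
        math.prod(math.gamma(alpha - (j - 1) / 2) for j in range(1, p + 1))

alpha, p = 4.3, 4  # requires Re(alpha) > (p-1)/2
lhs = multigamma(alpha, p)
rhs = math.pi ** ((p - 1) / 2) * math.gamma(alpha - (p - 1) / 2) * multigamma(alpha, p - 1)
print(lhs, rhs)  # identical up to rounding
```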

#### **5.5a.1. Evaluation of matrix-variate gamma integrals in the complex case**

The matrices and gamma functions belonging to the complex domain will be denoted with a tilde. As well, in the complex case, all matrices appearing in the integrals will be $p\times p$ Hermitian positive definite unless otherwise stated; for such a matrix $\tilde{X}$, this will be denoted by $\tilde{X}>O$. The integral of interest is

$$\tilde{I}\_p(\alpha) = \int\_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}\,\mathrm{e}^{-\mathrm{tr}(\tilde{X})}\,\mathrm{d}\tilde{X}.\tag{5.5a.2}$$

A standard procedure for evaluating the integral in (5.5*a*.2) consists of expressing the positive definite Hermitian matrix as $\tilde{X}=\tilde{T}\tilde{T}^{*}$ where $\tilde{T}$ is a lower triangular matrix with real and positive diagonal elements $t\_{jj}>0,\ j=1,\dots,p$, an asterisk indicating the conjugate transpose. Then, referring to Mathai (1997, Theorem 3.7) or Theorem 1.6.7 of Chap. 1, the Jacobian is seen to be as follows:

$$\mathrm{d}\tilde{X} = 2^p\left\{\prod\_{j=1}^p t\_{jj}^{2(p-j)+1}\right\}\mathrm{d}\tilde{T}\tag{5.5a.3}$$

and then

$$\begin{aligned} \text{tr}(\tilde{X}) &= \text{tr}(\tilde{T}\tilde{T}^\*) \\ &= t\_{11}^2 + \dots + t\_{pp}^2 + |\tilde{t}\_{21}|^2 + \dots + |\tilde{t}\_{p1}|^2 + \dots + |\tilde{t}\_{pp-1}|^2 \end{aligned}$$

and

$$|\det(\tilde{X})|^{\alpha-p}\,\mathrm{d}\tilde{X} = 2^p\left\{\prod\_{j=1}^p t\_{jj}^{2\alpha-2j+1}\right\}\mathrm{d}\tilde{T}.$$

Now, integrating out over $\tilde{t}\_{jk}$ for $j>k$,

$$\int\_{\tilde{t}\_{jk}} \mathbf{e}^{-|\tilde{t}\_{jk}|^2} \mathbf{d}\tilde{t}\_{jk} = \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} \mathbf{e}^{-(t\_{jk1}^2 + t\_{jk2}^2)} \mathbf{d}t\_{jk1} \wedge \mathbf{d}t\_{jk2} = \pi$$

and

$$\prod\_{j>k} \pi = \pi^{\frac{p(p-1)}{2}}.$$

As well,

$$2\int\_0^\infty t\_{jj}^{2\alpha-2j+1} \mathbf{e}^{-t\_{jj}^2} \, \mathrm{d}\, t\_{jj} = \Gamma(\alpha - j + 1), \,\,\mathfrak{R}(\alpha) > j - 1,$$

for *j* = 1*,... , p.* Taking the product of all these factors then gives

$$
\pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) \cdots \Gamma(\alpha - p + 1) = \tilde{\Gamma}\_p(\alpha), \; \Re(\alpha) > p - 1,
$$

and hence the result is verified.

#### **An alternative method based on partitioned matrix, complex case**

The approach discussed in this section relies on the successive extraction of the diagonal elements of *X*˜ , a *p* × *p* positive definite Hermitian matrix, all of these elements being necessarily real and positive, that is, *xjj >* 0*, j* = 1*,...,p*. Let

$$
\tilde{X} = \begin{bmatrix}
\tilde{X\_{11}} & \tilde{X\_{12}} \\
\tilde{X\_{21}} & x\_{pp}
\end{bmatrix}
$$

where *X*˜ <sup>11</sup> is *(p* − 1*)* × *(p* − 1*)* and

$$|\det(\tilde{X})|^{\alpha-p} = |\det(\tilde{X}\_{11})|^{\alpha-p}|\mathbf{x}\_{pp} - \tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}|^{\alpha-p}$$

and

$$\text{tr}(\tilde{X}) = \text{tr}(\tilde{X}\_{11}) + \mathfrak{x}\_{pp}.$$

Then,

$$|\boldsymbol{x}\_{pp} - \tilde{X}\_{21}\tilde{X}\_{11}^{-1}\tilde{X}\_{12}|^{\alpha - p} = \boldsymbol{x}\_{pp}^{\alpha - p}|1 - \boldsymbol{x}\_{pp}^{-\frac{1}{2}}\tilde{X}\_{21}\tilde{X}\_{11}^{-\frac{1}{2}}\tilde{X}\_{11}^{-\frac{1}{2}}\tilde{X}\_{12}\boldsymbol{x}\_{pp}^{-\frac{1}{2}}|^{\alpha - p}.$$

Let

$$\tilde{Y} = x\_{pp}^{-\frac{1}{2}}\tilde{X}\_{21}\tilde{X}\_{11}^{-\frac{1}{2}}\ \Rightarrow\ \mathrm{d}\tilde{Y} = x\_{pp}^{-(p-1)}\,|\det(\tilde{X}\_{11})|^{-1}\,\mathrm{d}\tilde{X}\_{21},$$

referring to Theorem 1.6a.4 or Mathai (1997, Theorem 3.2(c)) for fixed *xpp* and *X*11. Now, the integral over *xpp* gives

$$\int\_0^\infty \mathbf{x}\_{pp}^{\alpha - p + (p - 1)} \mathbf{e}^{-\mathbf{x}\_{pp}} \mathbf{d} \mathbf{x}\_{pp} = \Gamma(\alpha), \ \Re(\alpha) > 0.$$

Letting $u=\tilde{Y}\tilde{Y}^{*}$, $\mathrm{d}\tilde{Y}=\frac{\pi^{p-1}}{\Gamma(p-1)}u^{p-2}\,\mathrm{d}u$ by applying Theorem 4.2a.3 or Corollaries 4.5.2 and 4.5.3 of Mathai (1997); noting that $u$ is real and positive, the integral over $u$ gives

$$\int\_0^{1} u^{(p-1)-1}(1-u)^{\alpha-(p-1)-1}\,\mathrm{d}u = \frac{\Gamma(p-1)\Gamma(\alpha-(p-1))}{\Gamma(\alpha)},\ \Re(\alpha)>p-1.$$

Taking the product, we obtain

$$\begin{aligned} &|\det(\tilde{X}\_{11}^{(1)})|^{\alpha+1-p} \, \varGamma(\alpha) \frac{\pi^{p-1}}{\varGamma(p-1)} \frac{\Gamma(p-1)\varGamma(\alpha-(p-1))}{\varGamma(\alpha)} \\ &= \pi^{p-1} \, \varGamma(\alpha-(p-1)) |\det(\tilde{X}\_{11}^{(1)})|^{\alpha+1-p} \end{aligned}$$

where $\tilde{X}\_{11}^{(1)}$ stands for $\tilde{X}\_{11}$ after the first set of integrations has been completed. In the second stage, we extract $x\_{p-1,p-1}$, the leading $(p-2)\times(p-2)$ submatrix being denoted by $\tilde{X}\_{11}^{(2)}$, and we continue as previously explained to obtain $|\det(\tilde{X}\_{11}^{(2)})|^{\alpha+2-p}\pi^{p-2}\Gamma(\alpha-(p-2))$. Proceeding successively in this manner, the exponent of $\pi$ is $(p-1)+(p-2)+\dots+1=p(p-1)/2$ and the gamma product is $\Gamma(\alpha-(p-1))\Gamma(\alpha-(p-2))\cdots\Gamma(\alpha)$ for $\Re(\alpha)>p-1$. That is,

$$
\pi^{\frac{p(p-1)}{2}}\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-(p-1)) = \tilde{\Gamma}\_p(\alpha).
$$

#### **5.5.2. Triangularization of the Wishart matrix in the real case**

Let $W\sim W\_p(m,\Sigma),\ \Sigma>O$, be a $p\times p$ matrix having a Wishart distribution with $m$ degrees of freedom and parameter matrix $\Sigma>O$, that is, let $W$ have a density of the following form for $\Sigma=I$:

$$f\_w(W) = \frac{|W|^{\frac{m}{2} - \frac{p+1}{2}} e^{-\frac{1}{2} \text{tr}(W)}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2})}, \ W > O, \ m \ge p,\tag{5.5.4}$$

and $f\_w(W)=0$ elsewhere. Let us consider the transformation $W=TT'$ where $T$ is a lower triangular matrix with positive diagonal elements. Since $W>O$, the transformation $W=TT'$ with the diagonal elements of $T$ being positive is one-to-one. We have already evaluated the associated Jacobian in Theorem 1.6.7, namely,

$$\mathrm{d}W = 2^p \left\{ \prod\_{j=1}^p t\_{jj}^{p+1-j} \right\} \mathrm{d}T. \tag{5.5.5}$$

Under this transformation,

$$f(W)\mathrm{d}W = \frac{1}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})}\left\{\prod\_{j=1}^p (t\_{jj}^2)^{\frac{m}{2}-\frac{p+1}{2}}\right\}\mathrm{e}^{-\frac{1}{2}\sum\_{j=1}^p t\_{jj}^2-\frac{1}{2}\sum\_{i>j}t\_{ij}^2}\,2^p\left\{\prod\_{j=1}^p t\_{jj}^{p+1-j}\right\}\mathrm{d}T$$

$$= \frac{1}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} 2^p \left\{ \prod\_{j=1}^p (t\_{jj}^2)^{\frac{m}{2} - \frac{j}{2}} \right\} \mathrm{e}^{-\frac{1}{2}\sum\_{j=1}^p t\_{jj}^2 - \frac{1}{2}\sum\_{i>j}t\_{ij}^2} \mathrm{d}\,T. \tag{5.5.6}$$

In view of (5.5.6), it is evident that $t\_{jj},\ j=1,\dots,p$, and the $t\_{ij}$'s, $i>j$, are mutually independently distributed. The function containing $t\_{ij},\ i>j$, is of the form $\mathrm{e}^{-\frac{1}{2}t\_{ij}^2}$, and hence the $t\_{ij}$'s for $i>j$ are mutually independently distributed real standard normal variables. It is also seen from (5.5.6) that the density of $t\_{jj}^2$ is of the form

$$c\_j\,y\_j^{\frac{m-(j-1)}{2}-1}\,\mathrm{e}^{-\frac{1}{2}y\_j},\ y\_j = t\_{jj}^2,$$

which is the density of a real chisquare variable having *m* − *(j* − 1*)* degrees of freedom for *j* = 1*,... , p,* where *cj* is the normalizing constant. Hence, the following result:

**Theorem 5.5.1.** *Let the real $p\times p$ positive definite matrix $W$ have a real Wishart density as specified in (5.5.4) and let $W=TT'$ where $T=(t\_{ij})$ is a lower triangular matrix whose diagonal elements are positive. Then, the non-diagonal elements $t\_{ij},\ i>j$, are mutually independently distributed as real standard normal variables, the diagonal elements $t\_{jj}^2,\ j=1,\dots,p$, are independently distributed as real chisquare variables having $m-(j-1)$ degrees of freedom for $j=1,\dots,p$, and the $t\_{jj}^2$'s and $t\_{ij}$'s are mutually independently distributed.*

**Corollary 5.5.1.** *Let $W\sim W\_p(m,\sigma^2 I)$, where $\sigma^2>0$ is a real scalar quantity. Let $W=TT'$ where $T=(t\_{ij})$ is a lower triangular matrix whose diagonal elements are positive. Then, the $t\_{jj}$'s are independently distributed for $j=1,\dots,p$, the $t\_{ij}$'s, $i>j$, are independently distributed, and all the $t\_{jj}$'s and $t\_{ij}$'s are mutually independently distributed, where $t\_{jj}^2/\sigma^2$ has a real chisquare distribution with $m-(j-1)$ degrees of freedom for $j=1,\dots,p$, and $t\_{ij},\ i>j$, has a real scalar Gaussian distribution with mean value zero and variance $\sigma^2$, that is, $t\_{ij}\overset{iid}{\sim}N(0,\sigma^2)$ for all $i>j$.*
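Theorem 5.5.1 is the basis of the Bartlett decomposition used to simulate Wishart matrices: generate $T$ with $t\_{jj}^2\sim\chi^2\_{m-(j-1)}$ and $t\_{ij}\sim N(0,1)$ for $i>j$, and set $W=TT'$. A minimal sketch (sample size and dimensions are illustrative) checks that the sample mean of $W$ approaches $E[W]=mI$ when $\Sigma=I$:

```python
import math
import random

random.seed(0)
p, m, n = 3, 7, 50_000

mean = [[0.0] * p for _ in range(p)]
for _ in range(n):
    T = [[0.0] * p for _ in range(p)]
    for j in range(p):
        # t_jj^2 ~ chi-square with m - j degrees of freedom (j = 0, ..., p-1 here),
        # i.e., gamma with shape (m - j)/2 and scale 2
        T[j][j] = math.sqrt(random.gammavariate((m - j) / 2, 2.0))
        for k in range(j):
            T[j][k] = random.gauss(0.0, 1.0)  # subdiagonal entries ~ N(0, 1)
    for i in range(p):
        for k in range(p):
            w_ik = sum(T[i][r] * T[k][r] for r in range(p))  # (T T')_{ik}
            mean[i][k] += w_ik / n

print([[round(v, 2) for v in row] for row in mean])  # ~ m on the diagonal, ~0 off it
```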

#### **5.5a.2. Triangularization of the Wishart matrix in the complex domain**

Let *W*˜ have the following Wishart density in the complex domain:

$$\tilde{f}\_w(\tilde{W}) = \frac{1}{\tilde{\Gamma}\_p(m)} |\text{det}(\tilde{W})|^{m-p} \text{e}^{-\text{tr}(\tilde{W})}, \ \tilde{W} > O, \ m \ge p,\tag{5.5a.4}$$

and $\tilde f_w(\tilde W) = 0$ elsewhere, which is denoted $\tilde W \sim \tilde W_p(m, I)$. Consider the transformation $\tilde W = \tilde T \tilde T^{*}$ where $\tilde T$ is lower triangular with real and positive diagonal elements. The transformation $\tilde W = \tilde T \tilde T^{*}$ is then one-to-one and its associated Jacobian, as given in Theorem 1.6a.7, is the following:

$$\mathrm{d}\tilde{W} = 2^p \left\{ \prod\_{j=1}^p t\_{jj}^{2(p-j)+1} \right\} \mathrm{d}\tilde{T}.\tag{5.5a.5}$$

Then we have

$$\begin{split} \tilde{f}(\tilde{W}) \mathrm{d}\tilde{W} &= \frac{1}{\tilde{\Gamma}\_{p}(m)} \Big\{ \prod\_{j=1}^{p} (t\_{jj}^{2})^{m-p} \Big\} \mathrm{e}^{-\sum\_{i \ge j} |\tilde{t}\_{ij}|^{2}} 2^{p} \Big\{ \prod\_{j=1}^{p} t\_{jj}^{2(p-j)+1} \Big\} \mathrm{d}\tilde{T} \\ &= \frac{1}{\tilde{\Gamma}\_{p}(m)} 2^{p} \Big\{ \prod\_{j=1}^{p} (t\_{jj}^{2})^{m-j+\frac{1}{2}} \Big\} \mathrm{e}^{-\sum\_{j=1}^{p} t\_{jj}^{2} - \sum\_{i \ge j} |\tilde{t}\_{ij}|^{2}} \mathrm{d}\tilde{T}. \end{split} \tag{5.5a.6}$$

In light of (5.5*a*.6), it is clear that all the $t_{jj}$'s and $\tilde t_{ij}$'s are mutually independently distributed, where $\tilde t_{ij}$, $i > j$, has a complex standard Gaussian density and $t_{jj}^2$ has a chisquare density in the complex domain with $m - (j-1)$ degrees of freedom, that is, a real gamma density with the parameters $(\alpha = m - (j-1),\ \beta = 1)$, for $j = 1,\ldots,p$. Hence, we have the following result:

**Theorem 5.5a.1.** *Let the complex Wishart density be as specified in (5.5a.4), that is, $\tilde W \sim \tilde W_p(m, I)$. Consider the transformation $\tilde W = \tilde T \tilde T^{*}$ where $\tilde T = (\tilde t_{ij})$ is a lower triangular matrix in the complex domain whose diagonal elements are real and positive. Then, for $i > j$, the $\tilde t_{ij}$'s are standard Gaussian distributed in the complex domain, that is, $\tilde t_{ij} \sim \tilde N_1(0, 1)$, $i > j$; $t_{jj}^2$ is real gamma distributed with the parameters $(\alpha = m - (j-1),\ \beta = 1)$ for $j = 1,\ldots,p$; and all the $t_{jj}$'s and $\tilde t_{ij}$'s, $i > j$, are mutually independently distributed.*

**Corollary 5.5a.1.** *Let $\tilde W \sim \tilde W_p(m, \sigma^2 I)$ where $\sigma^2 > 0$ is a real positive scalar. Let $\tilde T$, $t_{jj}$, $\tilde t_{ij}$, $i > j$, be as defined in Theorem 5.5a.1. Then, $t_{jj}^2/\sigma^2$ is a real gamma variable with the parameters $(\alpha = m - (j-1),\ \beta = 1)$ for $j = 1,\ldots,p$; $\tilde t_{ij} \sim \tilde N_1(0, \sigma^2)$ for all $i > j$; and the $t_{jj}$'s and $\tilde t_{ij}$'s are mutually independently distributed.*

#### **5.5.3. Samples from a** *p***-variate Gaussian population and the Wishart density**

Let the $p\times 1$ real vector $X_j$ be normally distributed, $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$. Let $X_1,\ldots,X_n$ be a simple random sample of size $n$ from this normal population and the $p\times n$ sample matrix be denoted in bold face lettering as $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ where $X_j' = (x_{1j}, x_{2j}, \ldots, x_{pj})$. Let the sample mean be $\bar X = \frac{1}{n}(X_1 + \cdots + X_n)$ and the matrix of sample means be denoted by the bold face $\bar{\mathbf{X}} = (\bar X, \ldots, \bar X)$. Then, the $p\times p$ sample sum of products matrix $S$ is given by

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = (s\_{ij}), \ s\_{ij} = \sum\_{k=1}^{n} (x\_{ik} - \bar{x}\_{i})(x\_{jk} - \bar{x}\_{j}),$$

where $\bar x_r = \sum_{k=1}^{n} x_{rk}/n$, $r = 1,\ldots,p$, are the averages of the components. It has already been shown in Sect. 3.5, for instance, that the joint density of the sample values $X_1,\ldots,X_n$, denoted by $L$, can be written as

$$L = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}\mathbf{S}) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}.\tag{5.5.7}$$

But $(\mathbf{X} - \bar{\mathbf{X}})J = O$, $J' = (1,\ldots,1)$, which implies that the columns of $(\mathbf{X} - \bar{\mathbf{X}})$ are linearly related, and hence the elements in $(\mathbf{X} - \bar{\mathbf{X}})$ are not distinct. In light of equation (4.5.17), one can write the sample sum of products matrix $S$ in terms of a $p\times(n-1)$ matrix $Z_{n-1}$ of distinct elements so that $S = Z_{n-1}Z_{n-1}'$. As well, according to Theorem 3.5.3 of Chap. 3, $S$ and $\bar X$ are independently distributed. The $p\times n$ matrix $Z$ is obtained through the orthonormal transformation $\mathbf{X}P = Z$, $PP' = I$, $P'P = I$, where $P$ is $n\times n$. Then $\mathrm{d}\mathbf{X} = \mathrm{d}Z$, ignoring the sign. Let the last column of $P$ be $p_n$. We can specify $p_n$ to be $\frac{1}{\sqrt{n}}J$ so that $\mathbf{X}p_n = \sqrt{n}\,\bar X$. Note that in light of (4.5.17), the deleted column in $Z$ corresponds to $\sqrt{n}\,\bar X$. The following considerations will be helpful to those who might need further confirmation of the validity of the above statement. Observe that $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}(I - B)$, with $B = \frac{1}{n}JJ'$ where $J$ is an $n\times 1$ vector of unities. Since $I - B$ is idempotent and of rank $n-1$, the eigenvalues are 1 repeated $n-1$ times and a zero. An eigenvector corresponding to the eigenvalue zero is $J$ normalized, or $\frac{1}{\sqrt{n}}J$. Taking this as the last column $p_n$ of $P$, we have $\mathbf{X}p_n = \sqrt{n}\,\bar X$. Note that the other columns of $P$, namely $p_1,\ldots,p_{n-1}$, correspond to the $n-1$ orthonormal solutions of the equation $BY = O$ where $Y$ is an $n\times 1$ non-null vector. Hence we can write $\mathrm{d}Z = \mathrm{d}Z_{n-1} \wedge \mathrm{d}\bar X$. Now, integrating out $\bar X$ from (5.5.7), we have

$$L\,\mathrm{d}Z\_{n-1} = c\,\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}Z\_{n-1}Z\_{n-1}')}\mathrm{d}Z\_{n-1},\,\,\mathrm{S} = Z\_{n-1}Z\_{n-1}',\,\tag{5.5.8}$$

where *c* is a constant. Since *Zn*−<sup>1</sup> contains *p(n* − 1*)* distinct real variables, we may apply Theorems 4.2.1, 4.2.2 and 4.2.3, and write d*Zn*−<sup>1</sup> in terms of d*S* as

$$\mathrm{d}Z\_{n-1} = \frac{\pi^{\frac{p(n-1)}{2}}}{\Gamma\_p(\frac{n-1}{2})} |S|^{\frac{n-1}{2} - \frac{p+1}{2}} \mathrm{d}S, \ n - 1 \ge p. \tag{5.5.9}$$

Then, if the density of *S* is denoted by *f (S)*,

$$f(S) \mathrm{d}S = c\_1 \frac{|S|^{\frac{n-1}{2} - \frac{p+1}{2}}}{\Gamma\_p(\frac{n-1}{2})} \mathrm{e}^{-\frac{1}{2} \mathrm{tr}(\Sigma^{-1} S)} \mathrm{d}S$$

where $c_1$ is a constant. The value of $c_1$ is available from the normalizing constant of a real matrix-variate gamma density. Hence

$$f(\mathcal{S})\mathcal{dS} = \frac{|\mathcal{S}|^{\frac{n-1}{2} - \frac{p+1}{2}}}{2^{\frac{(n-1)p}{2}}|\Sigma|^{\frac{n-1}{2}}\Gamma\_p(\frac{n-1}{2})} \mathbf{e}^{-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}\mathcal{S})} \mathbf{d}\mathcal{S} \tag{5.5.10}$$

for $S > O$, $\Sigma > O$, $n - 1 \ge p$, and $f(S) = 0$ elsewhere, $\Gamma_p(\cdot)$ being the real matrix-variate gamma function given by

$$\Gamma\_p(\alpha) = \pi^{\frac{p(p-1)}{4}} \Gamma(\alpha) \Gamma(\alpha - 1/2) \cdots \Gamma(\alpha - (p-1)/2), \; \Re(\alpha) > (p-1)/2.$$

Usually the sample size is taken as $N$, so that $N - 1 = n$ is the number of degrees of freedom associated with the Wishart density in (5.5.10). Since we have taken the sample size as $n$, the number of degrees of freedom is $n - 1$ and the parameter matrix is $\Sigma > O$. Then $S$ in (5.5.10) is written as $S \sim W_p(m, \Sigma)$, with $m = n - 1 \ge p$. Thus, the following result:

**Theorem 5.5.2.** *Let $X_1,\ldots,X_n$ be a simple random sample of size $n$ from a $N_p(\mu, \Sigma)$, $\Sigma > O$. Let $X_j$, $\mathbf{X}$, $\bar X$, $\bar{\mathbf{X}}$, $S$ be as defined in Sect. 5.5.3. Then, the density of $S$ is a real Wishart density with $m = n - 1$ degrees of freedom and parameter matrix $\Sigma > O$, as given in (5.5.10).*
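As a numerical cross-check (a sketch assuming SciPy; the identification `df = n - 1`, `scale = Sigma` is our mapping onto `scipy.stats.wishart`, not the book's notation), the density (5.5.10) can be evaluated directly and compared with a library implementation:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

# Sketch: evaluate the Wishart density (5.5.10) directly and compare with
# scipy.stats.wishart, which parametrizes the same family by df and scale.
p, n = 2, 8
m = n - 1                                   # degrees of freedom
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
S = np.array([[9.0, 1.0], [1.0, 6.0]])      # an arbitrary S > O

def wishart_pdf(S, m, Sigma, p):
    # log of the normalizing constant in (5.5.10), using log Gamma_p
    logc = -(m * p / 2) * np.log(2) - (m / 2) * np.log(np.linalg.det(Sigma)) \
           - multigammaln(m / 2, p)
    logk = (m / 2 - (p + 1) / 2) * np.log(np.linalg.det(S)) \
           - 0.5 * np.trace(np.linalg.inv(Sigma) @ S)
    return np.exp(logc + logk)

print(wishart_pdf(S, m, Sigma, p), wishart.pdf(S, df=m, scale=Sigma))
```

The two printed values agree, confirming that (5.5.10) and the library's parametrization describe the same density.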

#### **5.5a.3. Sample from a complex Gaussian population and the Wishart density**

Let $\tilde X_j \sim \tilde N_p(\tilde\mu, \tilde\Sigma)$, $\tilde\Sigma > O$, $j = 1,\ldots,n$, be independently distributed. Let $\tilde{\mathbf{X}} = (\tilde X_1, \ldots, \tilde X_n)$, $\bar{\tilde X} = \frac{1}{n}(\tilde X_1 + \cdots + \tilde X_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde X}, \ldots, \bar{\tilde X})$, and let $\tilde S = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^{*}$ where an asterisk indicates the conjugate transpose. We have already shown in Sect. 3.5a that the joint density of $\tilde X_1, \ldots, \tilde X_n$, denoted by $\tilde L$, can be written as

$$\tilde{L} = \frac{1}{\pi^{np} |\det(\Sigma)|^n} \mathbf{e}^{-\text{tr}(\Sigma^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^\* \Sigma^{-1} (\bar{\tilde{X}} - \tilde{\mu})}. \tag{5.5a.7}$$

Then, following steps parallel to those leading from (5.5.7) to (5.5.10), we obtain the density of $\tilde S$, denoted by $\tilde f(\tilde S)$, as follows:

$$\tilde{f}(\tilde{S})\mathrm{d}\tilde{S} = \frac{|\det(\tilde{S})|^{m-p}}{|\det(\tilde{\Sigma})|^m \tilde{\Gamma}\_p(m)} \mathrm{e}^{-\text{tr}(\tilde{\Sigma}^{-1}\tilde{S})} \,\mathrm{d}\tilde{S}, \, m = n - 1 \ge p,\tag{5.5a.8}$$

for $\tilde S > O$, $\tilde\Sigma > O$, $n - 1 \ge p$, and $\tilde f(\tilde S) = 0$ elsewhere, the complex matrix-variate gamma function being given by

$$\tilde{\Gamma}\_p(\alpha) = \pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) \cdots \Gamma(\alpha - p + 1), \; \Re(\alpha) > p - 1.$$

Hence, we have the following result:

**Theorem 5.5a.2.** *Let $\tilde X_j \sim \tilde N_p(\tilde\mu, \tilde\Sigma)$, $\tilde\Sigma > O$, $j = 1,\ldots,n$, be independently and identically distributed. Let $\tilde{\mathbf{X}}$, $\bar{\tilde X}$, $\bar{\tilde{\mathbf{X}}}$, $\tilde S$ be as previously defined. Then, $\tilde S$ has a complex matrix-variate Wishart density with $m = n - 1$ degrees of freedom and parameter matrix $\tilde\Sigma > O$, as given in (5.5a.8).*

#### **5.5.4. Some properties of the Wishart distribution, real case**

If we have statistically independently distributed Wishart matrices with the same parameter matrix *Σ*, then it is easy to see that the sum is again a Wishart matrix. This can be noted by considering the Laplace transform of matrix-variate random variables discussed in Sect. 5.2. If *Sj* ∼ *Wp(mj , Σ), j* = 1*,...,k*, with the same parameter matrix *Σ>O* and the *Sj* 's are statistically independently distributed, then from equation (5.2.6), the Laplace transform of the density of *Sj* is

$$L\_{S\_j}(\ast T) = |I + 2\Sigma\_\ast T|^{-\frac{m\_j}{2}},\ I + 2\Sigma\_\ast T > O,\ j = 1,\ldots,k,\tag{5.5.11}$$

where ${}_{*}T$ is a symmetric parameter matrix $T = (t_{ij}) = T' > O$ whose off-diagonal elements are weighted by $\frac{1}{2}$. When the $S_j$'s are independently distributed, the Laplace transform of the sum $S = S_1 + \cdots + S_k$ is the product of the Laplace transforms:

$$\prod\_{j=1}^{k} |I + 2\,\Sigma\_\* T|^{-\frac{m\_j}{2}} = |I + 2\,\Sigma\_\* T|^{-\frac{1}{2}(m\_1 + \dots + m\_k)} \Rightarrow S \sim W\_p(m\_1 + \dots + m\_k, \,\Sigma). \tag{5.5.12}$$

Hence, the following result:

**Theorems 5.5.3, 5.5a.3.** *Let $S_j \sim W_p(m_j, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,k$, be statistically independently distributed real Wishart matrices with $m_1,\ldots,m_k$ degrees of freedom and the same parameter matrix $\Sigma > O$. Then the sum $S = S_1 + \cdots + S_k$ is real Wishart distributed with $m_1 + \cdots + m_k$ degrees of freedom and the same parameter matrix $\Sigma > O$, that is, $S \sim W_p(m_1 + \cdots + m_k, \Sigma)$, $\Sigma > O$. In the complex case, let $\tilde S_j \sim \tilde W_p(m_j, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$, $j = 1,\ldots,k$, be independently distributed with the same $\tilde\Sigma$. Then, the sum $\tilde S = \tilde S_1 + \cdots + \tilde S_k \sim \tilde W_p(m_1 + \cdots + m_k, \tilde\Sigma)$.*
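The additivity can also be seen directly from the Gaussian construction. The following sketch (assuming NumPy, with $\Sigma = I$ for simplicity) pools the Gaussian columns generating $S_1$ and $S_2$:

```python
import numpy as np

# Sketch: if S1 = X1 X1' with m1 Gaussian columns and S2 = X2 X2' with m2
# Gaussian columns, then S1 + S2 equals the sum-of-products matrix of the
# pooled m1 + m2 columns, hence is Wishart with m1 + m2 degrees of freedom.
rng = np.random.default_rng(1)
p, m1, m2 = 3, 5, 7
X1 = rng.standard_normal((p, m1))
X2 = rng.standard_normal((p, m2))
S1, S2 = X1 @ X1.T, X2 @ X2.T          # W_p(m1, I) and W_p(m2, I)
X = np.hstack([X1, X2])                # pooled p x (m1 + m2) matrix
print(np.allclose(S1 + S2, X @ X.T))   # True: S1 + S2 ~ W_p(m1 + m2, I)
```

The identity is exact for every realization, which is why the Laplace transforms multiply.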

We now consider linear functions of independent Wishart matrices. Let $S_j \sim W_p(m_j, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,k$, be independently distributed and let $S_a = a_1 S_1 + \cdots + a_k S_k$ where $a_1,\ldots,a_k$ are real scalar constants. Then the Laplace transform of the density of $S_a$ is

$$L\_{S\_a}(\*T) = \prod\_{j=1}^k |I + 2a\_j \Sigma\_\* T|^{-\frac{m\_j}{2}}, \ I + 2a\_j \Sigma\_\* T > O, \ j = 1, \ldots, k. \tag{i}$$

The inverse is quite complicated and the corresponding density cannot be easily determined; moreover, the density is not a Wishart density unless *a*<sup>1</sup> = ··· = *ak*. The types of complications occurring can be apprehended from the real scalar case *p* = 1 which is discussed in Mathai and Provost (1992). Instead of real scalars, we can also consider *p*×*p* constant matrices as coefficients, in which case the inversion of the Laplace transform will be more complicated. We can also consider Wishart matrices with different parameter matrices. Let *Uj* ∼ *Wp(mj , Σj ), Σj > O, j* = 1*,... , k,* be independently distributed and *U* = *U*<sup>1</sup> +···+*Uk*. Then, the Laplace transform of the density of *U*, denoted by *LU (*∗*T )*, is the following:

$$L\_U(\ast T) = \prod\_{j=1}^k |I + 2\Sigma\_{j\ast}T|^{-\frac{m\_j}{2}}, I + 2\Sigma\_{j\ast}T > O, \ j = 1, \ldots, k. \tag{ii}$$

This case does not yield a Wishart density as an inverse Laplace transform either, unless *Σ*<sup>1</sup> = ··· = *Σk*. In both *(i)* and *(ii),* we have linear functions of independent Wishart matrices; however, these linear functions do not have Wishart distributions.

Let us consider a symmetric transformation of a Wishart matrix $S$. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $U = ASA'$ where $A$ is a $p\times p$ nonsingular constant matrix. Let us take the Laplace transform of the density of $U$:


$$\begin{split} L\_{U}(\_{\*}T) &= E[\mathbf{e}^{-\text{tr}(\_{\*}T'U)}] = E[\mathbf{e}^{-\text{tr}(\_{\*}T'ASA')}] = E[\mathbf{e}^{-\text{tr}[(A'\_{\*}TA)S]}] \\ &= L\_{S}(A'\_{\*}TA) = |I + 2\Sigma(A'\_{\*}TA)|^{-\frac{m}{2}} \\ &= |I + 2(A\Sigma A')\_{\*}T|^{-\frac{m}{2}} \\ &\Rightarrow U \sim W\_{p}(m, A\Sigma A'), \ \Sigma > O, \ |A| \neq 0. \end{split}$$

Hence we have the following result:

**Theorems 5.5.4, 5.5a.4.** *Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $U = ASA'$, $|A| \ne 0$. Then, $U \sim W_p(m, A\Sigma A')$, $\Sigma > O$, $|A| \ne 0$; that is, when $U = ASA'$ where $A$ is a nonsingular $p\times p$ constant matrix, then $U$ is Wishart distributed with $m$ degrees of freedom and parameter matrix $A\Sigma A'$. In the complex case, the constant $p\times p$ nonsingular matrix $A$ can be real or in the complex domain. Let $\tilde S \sim \tilde W_p(m, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$. Then $\tilde U = A\tilde S A^{*} \sim \tilde W_p(m, A\tilde\Sigma A^{*})$.*
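A quick numerical sketch (assuming NumPy; the matrices $\Sigma$ and $A$ below are arbitrary choices) makes the mechanism transparent: if $S = XX'$ with the columns of $X$ iid $N_p(O, \Sigma)$, then $ASA' = (AX)(AX)'$, and the columns of $AX$ are iid $N_p(O, A\Sigma A')$.

```python
import numpy as np

# Sketch: with S = X X', X having iid N_p(0, Sigma) columns,
# A S A' = (A X)(A X)', and the columns of A X are iid N_p(0, A Sigma A'),
# which is the content of Theorems 5.5.4, 5.5a.4.
rng = np.random.default_rng(2)
p, m = 3, 6
Sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]])  # nonsingular
L = np.linalg.cholesky(Sigma)
X = L @ rng.standard_normal((p, m))         # columns iid N_p(0, Sigma)
S = X @ X.T                                 # S ~ W_p(m, Sigma)
U = A @ S @ A.T                             # U ~ W_p(m, A Sigma A')
print(np.allclose(U, (A @ X) @ (A @ X).T))  # True: same Gaussian construction
```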

If $A$ is not a nonsingular matrix, is there a corresponding result? Let $B$ be a constant $q\times p$ matrix, $q \le p$, which is of full rank $q$. Let $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be iid so that we have a simple random sample of size $n$ from a real $p$-variate Gaussian population. Let the $q\times 1$ vectors $Y_j = BX_j$, $j = 1,\ldots,n$, which are iid. Then $E[Y_j] = B\mu$ and $\mathrm{Cov}(Y_j) = E[(Y_j - E(Y_j))(Y_j - E(Y_j))'] = B\,E[(X_j - E(X_j))(X_j - E(X_j))']\,B' = B\Sigma B'$, which is $q\times q$. As well, $Y_j \sim N_q(B\mu, B\Sigma B')$, $B\Sigma B' > O$. Consider the sample matrix formed from the $Y_j$'s, namely the $q\times n$ matrix $\mathbf{Y} = (Y_1,\ldots,Y_n) = (BX_1,\ldots,BX_n) = B(X_1,\ldots,X_n) = B\mathbf{X}$ where $\mathbf{X}$ is the $p\times n$ sample matrix from $X_j$. Then, the sample sum of products matrix in $\mathbf{Y}$ is $(\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})' = S_y$, say, where the usual notation is utilized, namely $\bar Y = \frac{1}{n}(Y_1 + \cdots + Y_n)$ and $\bar{\mathbf{Y}} = (\bar Y, \ldots, \bar Y)$. Now, the problem is equivalent to taking a simple random sample of size $n$ from a $q$-variate real Gaussian population with mean value vector $B\mu$ and positive definite covariance matrix $B\Sigma B' > O$. Hence, the following result:

**Theorems 5.5.5, 5.5a.5.** *Let $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be iid, and let $S$ be the sample sum of products matrix in this $p$-variate real Gaussian population. Let $B$ be a $q\times p$ constant matrix, $q \le p$, of full rank $q$. Then $BSB'$ is real Wishart distributed with $m = n - 1$ degrees of freedom, $n$ being the sample size, and parameter matrix $B\Sigma B' > O$, that is, $BSB' \sim W_q(m, B\Sigma B')$. Similarly, in the complex case, let $B$ be a $q\times p$, $q \le p$, constant matrix of full rank $q$, where $B$ may be in the real or complex domain. Then, $B\tilde S B^{*}$ is Wishart distributed with $m$ degrees of freedom and parameter matrix $B\tilde\Sigma B^{*}$, that is, $B\tilde S B^{*} \sim \tilde W_q(m, B\tilde\Sigma B^{*})$.*

#### **5.5.5. The generalized variance**

Let $X_j$, $X_j' = (x_{1j},\ldots,x_{pj})$, be a real $p\times 1$ vector random variable for $j = 1,\ldots,n$, the $X_j$'s being iid (independently and identically distributed). Let the covariance matrix associated with $X_j$ be $\mathrm{Cov}(X_j) = E[(X_j - E(X_j))(X_j - E(X_j))'] = \Sigma$, $\Sigma \ge O$, for $j = 1,\ldots,n$, in the real case, and $\tilde\Sigma = E[(\tilde X_j - E(\tilde X_j))(\tilde X_j - E(\tilde X_j))^{*}]$ in the complex case, where an asterisk indicates the conjugate transpose. Then, the diagonal elements of $\Sigma$ represent the squares of a measure of scatter, or variances, associated with the elements $x_{1j},\ldots,x_{pj}$, and the off-diagonal elements of $\Sigma$ provide the corresponding measure of joint dispersion or joint scatter in the pair $(x_{rj}, x_{sj})$ for all $r \ne s$. Thus, $\Sigma$ gives a configuration of individual and joint squared scatter in all the elements $x_{1j},\ldots,x_{pj}$. If we wish to have a single scalar quantity representing this configuration of individual and joint scatter in the elements $x_{1j},\ldots,x_{pj}$, what should that measure be? Wilks took the determinant of $\Sigma$, $|\Sigma|$, as that measure and called it the *generalized variance*, or square of the scatter representing the whole configuration of scatter in all the elements $x_{1j},\ldots,x_{pj}$. If there is no scatter in one or a few of the elements but there is scatter or dispersion in all the other elements, then the determinant is zero. More generally, if the matrix is singular, then the determinant is zero, but this does not mean that there is no scatter in these elements. Thus the determinant, as a measure of scatter or dispersion, violates a very basic requirement: if the proposed measure is zero, then there should not be any scatter in any of the elements, that is, $\Sigma$ should be a null matrix.

Hence, the first author suggested taking a norm of $\Sigma$, $\|\Sigma\|$, as a single measure of scatter in the whole configuration, such as $\|\Sigma\|_1 = \max_i \sum_j |\sigma_{ij}|$ or $\|\Sigma\|_2 = $ the largest eigenvalue of $\Sigma$, since $\Sigma$ is at least positive semi-definite. Note that normality is not assumed in the above discussion.
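The following small sketch (assuming NumPy; the covariance matrix is a hypothetical example) exhibits the point: a singular $\Sigma$ has generalized variance $|\Sigma| = 0$ even though every component still has positive scatter, whereas both suggested norms remain positive.

```python
import numpy as np

# Sketch: a singular covariance matrix (x2 = x1 almost surely) has
# |Sigma| = 0 although each component has variance 1, while the norms
# suggested above remain positive.
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])   # singular, but nonzero scatter
det = np.linalg.det(Sigma)                   # Wilks' generalized variance
norm1 = np.abs(Sigma).sum(axis=1).max()      # max_i sum_j |sigma_ij|
norm2 = np.linalg.eigvalsh(Sigma).max()      # largest eigenvalue
print(det, norm1, norm2)                     # det ~ 0, both norms equal 2
```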

If *S* ∼ *Wp(m, Σ), Σ > O,* what is then the distribution of Wilks' generalized variance in *S*, namely |*S*|, which can be referred to as the sample generalized variance? Let us determine the *h*-th moment of the sample generalized variance |*S*| for an arbitrary *h*. This has already been discussed for real and complex matrix-variate gamma distributions in Sect. 5.4.1 and can be obtained from the normalizing constant in the Wishart density:

$$\begin{split} E[|S|^{h}] &= \int\_{S>O} \frac{|S|^{\frac{m}{2}+h-\frac{p+1}{2}}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2}) |\Sigma|^{\frac{m}{2}}}\, \mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1}S)}\, \mathrm{d}S \\ &= 2^{ph} |\Sigma|^{h} \frac{\Gamma\_p(\frac{m}{2}+h)}{\Gamma\_p(\frac{m}{2})}, \; \Re\Big(\frac{m}{2}+h\Big) > \frac{p-1}{2}. \end{split} \tag{5.5.13}$$

Then

$$E[|(2\boldsymbol{\Sigma})^{-1}\boldsymbol{S}|^{h}] = \frac{\Gamma\_{p}(\frac{m}{2} + h)}{\Gamma\_{p}(\frac{m}{2})} = \prod\_{j=1}^{p} \frac{\Gamma(\frac{m}{2} + h - \frac{j-1}{2})}{\Gamma(\frac{m}{2} - \frac{j-1}{2})}$$

$$= E[\mathbf{y}\_{1}^{h}]E[\mathbf{y}\_{2}^{h}]\cdots E[\mathbf{y}\_{p}^{h}] \tag{5.5.14}$$

where $y_1,\ldots,y_p$ are independently distributed real scalar gamma random variables with the parameters $(\frac{m}{2} - \frac{j-1}{2},\ 1)$, $j = 1,\ldots,p$. In the complex case,


$$\begin{split} E[|\det(\tilde{\Sigma}^{-1}\tilde{S})|^h] &= \frac{\tilde{\Gamma}\_p(m+h)}{\tilde{\Gamma}\_p(m)} = \prod\_{j=1}^p \frac{\Gamma(m - (j-1) + h)}{\Gamma(m - (j-1))} \\ &= E[\tilde{y}\_1^h] \cdots E[\tilde{y}\_p^h] \end{split} \tag{5.5a.9}$$

where $\tilde y_1,\ldots,\tilde y_p$ are independently distributed real scalar gamma random variables with the parameters $(m - (j-1),\ 1)$, $j = 1,\ldots,p$. Note that if we consider $E[|\Sigma^{-1}S|^h]$ instead of $E[|(2\Sigma)^{-1}S|^h]$ in (5.5.14), then the $y_j$'s are independently distributed real chisquare random variables having $m - (j-1)$ degrees of freedom for $j = 1,\ldots,p$. This can be stated as a result.

**Theorems 5.5.6, 5.5a.6.** *Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $|S|$ be the generalized variance associated with this Wishart matrix, or the sample generalized variance in the corresponding $p$-variate real Gaussian population. Then, $E[|(2\Sigma)^{-1}S|^h] = E[y_1^h]\cdots E[y_p^h]$, so that $|(2\Sigma)^{-1}S|$ has the structural representation $|(2\Sigma)^{-1}S| = y_1\cdots y_p$ where the $y_j$'s are independently distributed real gamma random variables with the parameters $(\frac{m}{2} - \frac{j-1}{2},\ 1)$, $j = 1,\ldots,p$. Equivalently, $E[|\Sigma^{-1}S|^h] = E[z_1^h]\cdots E[z_p^h]$ where the $z_j$'s are independently distributed real chisquare random variables having $m - (j-1)$, $j = 1,\ldots,p$, degrees of freedom. In the complex case, if we let $\tilde S \sim \tilde W_p(m, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$, and $|\det(\tilde S)|$ be the generalized variance, then $|\det(\tilde\Sigma^{-1}\tilde S)|$ has the structural representation $|\det(\tilde\Sigma^{-1}\tilde S)| = \tilde y_1 \cdots \tilde y_p$ where the $\tilde y_j$'s are independently distributed real scalar gamma random variables with the parameters $(m - (j-1),\ 1)$, $j = 1,\ldots,p$, or chisquare random variables in the complex domain having $m - (j-1)$, $j = 1,\ldots,p$, degrees of freedom.*
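For $h = 1$, the structural representation can be checked numerically (a sketch assuming SciPy's `multigammaln` for $\ln\Gamma_p(\cdot)$): the moment formula reduces to $E[|\Sigma^{-1}S|] = m(m-1)\cdots(m-p+1)$, the product of the chisquare means.

```python
import numpy as np
from scipy.special import multigammaln

# Sketch: for h = 1, (5.5.13)-(5.5.14) give
# E[|Sigma^{-1} S|] = 2^p Gamma_p(m/2 + 1)/Gamma_p(m/2)
#                   = m (m-1) ... (m-p+1),
# the product of the chisquare means z_j in Theorems 5.5.6, 5.5a.6.
p, m, h = 4, 9, 1
lhs = 2 ** (p * h) * np.exp(multigammaln(m / 2 + h, p) - multigammaln(m / 2, p))
rhs = np.prod([m - j for j in range(p)])   # chisquare df's: m, m-1, ..., m-p+1
print(lhs, rhs)                            # both are 3024 (up to floating point)
```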

#### **5.5.6. Inverse Wishart distribution**

When $S \sim W_p(m, \Sigma)$, $\Sigma > O$, what is then the distribution of $S^{-1}$? Since $S$ has a real matrix-variate gamma distribution, that of its inverse is directly available from the transformation $U = S^{-1}$. In light of Theorem 1.6.6, we have $\mathrm{d}S = |U|^{-(p+1)}\mathrm{d}U$ in the real case and $\mathrm{d}\tilde S = |\det(\tilde U \tilde U^{*})|^{-p}\mathrm{d}\tilde U$ in the complex domain. Thus, denoting the density of $U$ by $g(U)$, we have the following result:

**Theorems 5.5.7, 5.5a.7.** *Let the real Wishart matrix $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and the Wishart matrix in the complex domain $\tilde S \sim \tilde W_p(m, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$. Let $U = S^{-1}$ and $\tilde U = \tilde S^{-1}$. Letting the density of $U$ be denoted by $g(U)$ and that of $\tilde U$ by $\tilde g(\tilde U)$,*

$$g(U) = \frac{|U|^{-\frac{m}{2} - \frac{p+1}{2}}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2}) |\Sigma|^{\frac{m}{2}}} e^{-\frac{1}{2}tr(\Sigma^{-1}U^{-1})}, \ U > O, \ \Sigma > O,\tag{5.5.15}$$

*and zero elsewhere, and*

$$\tilde{\mathbf{g}}(\tilde{U}) = \frac{|\det(\tilde{U})|^{-m-p}}{\tilde{\Gamma}\_p(m)|\det(\tilde{\Sigma})|^m} e^{-tr(\tilde{\Sigma}^{-1}\tilde{U}^{-1})}, \ \tilde{U} = \tilde{U}^\* > O, \ \tilde{\Sigma} = \tilde{\Sigma}^\* > O,\tag{5.5a.10}$$

*and zero elsewhere.*
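For readers who wish to match (5.5.15) with a library implementation, the following sketch (assuming SciPy; the identification `df = m`, `scale = Sigma^{-1}` is our mapping onto `scipy.stats.invwishart`, whose exponent is $-\frac{1}{2}\mathrm{tr}(\Psi U^{-1})$ with scale matrix $\Psi$) evaluates $g(U)$ directly:

```python
import numpy as np
from scipy.stats import invwishart
from scipy.special import multigammaln

# Sketch: (5.5.15) is an inverse-Wishart density; in scipy's parametrization
# it corresponds to df = m and scale = Sigma^{-1}.
p, m = 2, 6
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])
U = np.array([[0.30, 0.05], [0.05, 0.20]])   # an arbitrary U > O

def inv_wishart_pdf(U, m, Sigma, p):
    # direct evaluation of g(U) in (5.5.15)
    logc = -(m * p / 2) * np.log(2) - multigammaln(m / 2, p) \
           - (m / 2) * np.log(np.linalg.det(Sigma))
    logk = -(m / 2 + (p + 1) / 2) * np.log(np.linalg.det(U)) \
           - 0.5 * np.trace(np.linalg.inv(Sigma) @ np.linalg.inv(U))
    return np.exp(logc + logk)

print(inv_wishart_pdf(U, m, Sigma, p),
      invwishart.pdf(U, df=m, scale=np.linalg.inv(Sigma)))
```

The two printed values coincide, confirming the stated correspondence between the parametrizations.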

#### **5.5.7. Marginal distributions of a Wishart matrix**

At the beginning of this chapter, we had explicitly evaluated real and complex matrixvariate gamma integrals and determined that the diagonal blocks are again real and complex matrix-variate gamma integrals. Hence, the following results are already available from the discussion on the matrix-variate gamma distribution. We will now establish the results via Laplace transforms. Let *S* be Wishart distributed with degrees of freedom *m* and parameter matrix *Σ>O*, that is, *S* ∼ *Wp(m, Σ), Σ > O, m* ≥ *p*. Let us partition *S* and *Σ* as follows:

$$S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix} \text{ and } \begin{array}{l} \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix},\tag{i}$$

(referred to as a 2×2 partitioning) *S*11*, Σ*<sup>11</sup> being *r*×*r* and *S*22*, Σ*<sup>22</sup> being *(p*−*r)*×*(p*−*r)* – refer to Sect. 1.3 for results on partitioned matrices. Let <sup>∗</sup>*T* be a similarly partitioned *p* × *p* parameter matrix with <sup>∗</sup>*T* <sup>11</sup> being *r* × *r* where

$$\,\_{\*}T = \begin{bmatrix} \,\_{\*}T\_{11} & O \\ O & O \end{bmatrix}, \ \,\_{\*}T\_{11} = \,\_{\*}T\_{11}' > O. \tag{ii}$$

Observe that ${}_{*}T$ is a slightly modified parameter matrix $T = (t_{ij}) = T'$ in which the $t_{ij}$'s are weighted by $\frac{1}{2}$ for $i \ne j$ to obtain ${}_{*}T$. Noting that $\mathrm{tr}({}_{*}T' S) = \mathrm{tr}({}_{*}T_{11}' S_{11})$, the Laplace transform of the Wishart density $W_p(m, \Sigma)$, $\Sigma > O$, with ${}_{*}T$ as defined above, is given by

$$|I + 2\boldsymbol{\Sigma}\_{\*}\boldsymbol{T}|^{-\frac{m}{2}} = \begin{vmatrix} I\_r + 2\boldsymbol{\Sigma}\_{11\*}\boldsymbol{T}\_{11} & O \\ 2\boldsymbol{\Sigma}\_{21\*}\boldsymbol{T}\_{11} & I\_{p-r} \end{vmatrix}^{-\frac{m}{2}} = |I\_r + 2\boldsymbol{\Sigma}\_{11\*}\boldsymbol{T}\_{11}|^{-\frac{m}{2}}.\tag{5.5.16}$$

Thus, *S*<sup>11</sup> has a Wishart distribution with *m* degrees of freedom and parameter matrix *Σ*11. It can be similarly established that *S*<sup>22</sup> is Wishart distributed with degrees of freedom *m* and parameter matrix *Σ*22. Hence, the following result:

**Theorems 5.5.8, 5.5a.8.** *Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Let $S$ and $\Sigma$ be subjected to the $2\times 2$ partitioning given above. Then, the sub-matrices $S_{11} \sim W_r(m, \Sigma_{11})$, $\Sigma_{11} > O$, and $S_{22} \sim W_{p-r}(m, \Sigma_{22})$, $\Sigma_{22} > O$. In the complex case, let $\tilde S \sim \tilde W_p(m, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$. Letting $\tilde S$ be partitioned as in the real case, $\tilde S_{11} \sim \tilde W_r(m, \tilde\Sigma_{11})$ and $\tilde S_{22} \sim \tilde W_{p-r}(m, \tilde\Sigma_{22})$.*
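As a numerical sketch of the marginal result (assuming NumPy; $\Sigma$ below is an arbitrary example), note that the leading $r\times r$ block of $S = XX'$ is the sum-of-products matrix of the first $r$ components alone, so $S_{11}$ is Wishart on $r$ components with the same $m$ degrees of freedom; in particular $E[S_{11}] = m\,\Sigma_{11}$, which can be checked empirically:

```python
import numpy as np

# Sketch: the leading r x r block of S = X X' involves only the first r
# components of each Gaussian column, so S11 ~ W_r(m, Sigma11); here we
# check E[S11] = m * Sigma11 by averaging over repeated draws.
rng = np.random.default_rng(3)
p, r, m, reps = 3, 2, 5, 4000
Sigma = np.array([[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.2]])
L = np.linalg.cholesky(Sigma)
acc = np.zeros((r, r))
for _ in range(reps):
    X = L @ rng.standard_normal((p, m))   # columns iid N_p(0, Sigma)
    S = X @ X.T                           # S ~ W_p(m, Sigma)
    acc += S[:r, :r]                      # S11, the leading r x r block
print(acc / reps)                         # close to m * Sigma11
```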

**Corollaries 5.5.2, 5.5a.2.** *Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Suppose that $\Sigma_{12} = O$ in the $2\times 2$ partitioning of $\Sigma$. Then $S_{11}$ and $S_{22}$ are independently distributed with $S_{11} \sim W_r(m, \Sigma_{11})$ and $S_{22} \sim W_{p-r}(m, \Sigma_{22})$. Consider a $k\times k$ partitioning of $S$ and $\Sigma$, the order of the diagonal blocks $S_{jj}$ and $\Sigma_{jj}$ being $p_j \times p_j$, $p_1 + \cdots + p_k = p$. If $\Sigma_{ij} = O$ for all $i \ne j$, then the $S_{jj}$'s are independently distributed as Wishart matrices on $p_j$ components, with $m$ degrees of freedom and parameter matrices $\Sigma_{jj} > O$, $j = 1,\ldots,k$. In the complex case, consider the same type of partitioning as in the real case. Then, if $\tilde\Sigma_{ij} = O$ for all $i \ne j$, the $\tilde S_{jj}$'s, $j = 1,\ldots,k$, are independently distributed as $\tilde S_{jj} \sim \tilde W_{p_j}(m, \tilde\Sigma_{jj})$, $j = 1,\ldots,k$, $p_1 + \cdots + p_k = p$.*

Let $S$ be a $p\times p$ real Wishart matrix with $m$ degrees of freedom and parameter matrix $\Sigma > O$. Consider the following $2\times 2$ partitioning of $S$ and $\Sigma^{-1}$:

$$S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix}, \ S\_{11} \text{ being } r \times r, \ \Sigma^{-1} = \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix}.$$

Then, the density, denoted by *f (S)*, can be written as

$$\begin{split} f(S) &= \frac{|S|^{\frac{m}{2}-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})|\Sigma|^{\frac{m}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S)} \\ &= \frac{|S\_{11}|^{\frac{m}{2}-\frac{p+1}{2}}|S\_{22}-S\_{21}S\_{11}^{-1}S\_{12}|^{\frac{m}{2}-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})|\Sigma|^{\frac{m}{2}}} \\ &\times \mathbf{e}^{-\frac{1}{2}[\text{tr}(\Sigma^{11}S\_{11})+\text{tr}(\Sigma^{22}S\_{22})+\text{tr}(\Sigma^{12}S\_{21})+\text{tr}(\Sigma^{21}S\_{12})]}. \end{split}$$

In this case, $\mathrm{d}S = \mathrm{d}S_{11} \wedge \mathrm{d}S_{22} \wedge \mathrm{d}S_{12}$. Referring to Sect. 1.3, the coefficient of $S_{11}$ in the exponent is $\Sigma^{11} = (\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}$. Let $U_2 = S_{22} - S_{21}S_{11}^{-1}S_{12}$, so that $S_{22} = U_2 + S_{21}S_{11}^{-1}S_{12}$ and $\mathrm{d}S_{22} = \mathrm{d}U_2$ for fixed $S_{11}$ and $S_{12}$. Then, the function of $U_2$ is of the form

$$|U\_2|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{22} U\_2)}.$$

However, $U_2$ is $(p-r)\times(p-r)$ and we can write $\frac{m}{2} - \frac{p+1}{2} = \frac{m-r}{2} - \frac{p-r+1}{2}$. Therefore $U_2 \sim W_{p-r}(m-r,\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$ since $\Sigma^{22} = (\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}$. From symmetry, $U_1 = S_{11} - S_{12}S_{22}^{-1}S_{21} \sim W_r(m-(p-r),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$. After replacing $S_{22}$ in the exponent by $U_2 + S_{21}S_{11}^{-1}S_{12}$, the exponent, excluding $-\frac{1}{2}$ and apart from the term $\mathrm{tr}(\Sigma^{22}U_2)$ already accounted for, can be written as $\mathrm{tr}[\Sigma^{11}S_{11}] + \mathrm{tr}[\Sigma^{22}S_{21}S_{11}^{-1}S_{12}] + \mathrm{tr}[\Sigma^{12}S_{21}] + \mathrm{tr}[\Sigma^{21}S_{12}]$. Let us try to integrate out $S_{12}$. To this end, let $V = S_{11}^{-\frac{1}{2}}S_{12} \Rightarrow \mathrm{d}S_{12} = |S_{11}|^{\frac{p-r}{2}}\mathrm{d}V$ for fixed $S_{11}$. Then the power of the determinant of $S_{11}$ in $f(S)$ becomes $|S_{11}|^{\frac{m}{2}-\frac{p+1}{2}} \times |S_{11}|^{\frac{p-r}{2}} = |S_{11}|^{\frac{m}{2}-\frac{r+1}{2}}$. The exponent, excluding $-\frac{1}{2}$, becomes the following, denoting it by $\rho$:

$$\rho = \operatorname{tr}(\Sigma^{12}V'S\_{11}^{\frac{1}{2}}) + \operatorname{tr}(\Sigma^{21}S\_{11}^{\frac{1}{2}}V) + \operatorname{tr}(\Sigma^{22}V'V).\tag{i}$$

Note that $\operatorname{tr}(\Sigma^{22}V'V) = \operatorname{tr}(V\Sigma^{22}V')$ and

$$(V+C)\Sigma^{22}(V+C)' = V\Sigma^{22}V' + V\Sigma^{22}C' + C\Sigma^{22}V' + C\Sigma^{22}C'.\tag{ii}$$

On comparing *(i)* and *(ii)*, we have $C' = (\Sigma^{22})^{-1}\Sigma^{21}S\_{11}^{\frac{1}{2}}$. Substituting for $C$ and $C'$ in $\rho$, the term containing $S\_{11}$ in the exponent becomes $-\frac{1}{2}\operatorname{tr}(S\_{11}(\Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21})) = -\frac{1}{2}\operatorname{tr}(S\_{11}\Sigma\_{11}^{-1})$. Collecting the factors containing $S\_{11}$, we have $S\_{11} \sim W\_r(m,\ \Sigma\_{11})$ and, from symmetry, $S\_{22} \sim W\_{p-r}(m,\ \Sigma\_{22})$. Since the density $f(S)$ splits into a function of $U\_2$, a function of $S\_{11}$ and a function of $S\_{11}^{-\frac{1}{2}}S\_{12}$, these quantities are independently distributed. Similarly, $U\_1$, $S\_{22}$ and $S\_{22}^{-\frac{1}{2}}S\_{21}$ are independently distributed. The exponent of $|U\_1|$ is $\frac{m}{2}-\frac{p+1}{2} = (\frac{m}{2}-\frac{p-r}{2})-\frac{r+1}{2}$. Observing that $U\_1$ is $r\times r$, the density of $U\_1 = S\_{11}-S\_{12}S\_{22}^{-1}S\_{21}$ is a real Wishart density on $r$ components, with degrees of freedom $m-(p-r)$ and parameter matrix $\Sigma\_{11}-\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$, whose density, denoted by $f\_1(U\_1)$, is the following:

$$f\_1(U\_1) = \frac{|U\_1|^{\frac{m-(p-r)}{2} - \frac{r+1}{2}}}{2^{\frac{r(m-(p-r))}{2}} \Gamma\_r(\frac{m-(p-r)}{2}) |\Sigma\_{11} - \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21}|^{\frac{m-(p-r)}{2}}} \mathrm{e}^{-\frac{1}{2} \operatorname{tr}[U\_1(\Sigma\_{11} - \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21})^{-1}]}.\tag{iii}$$

A similar expression can be obtained for the density of *U*2. Thus, the following result:

**Theorems 5.5.9, 5.5a.9.** *Let $S \sim W\_p(m, \Sigma)$, $\Sigma > O$, $m \ge p$. Consider the $2\times 2$ partitioning of $S$ as specified above, $S\_{11}$ being $r\times r$. Let $U\_1 = S\_{11} - S\_{12}S\_{22}^{-1}S\_{21}$. Then,*

$$U\_1 \sim W\_r(m - (p - r), \ \Sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}).\tag{5.5.17}$$

*In the complex case, let $\tilde S \sim \tilde W\_p(m, \tilde\Sigma)$, $\tilde\Sigma = \tilde\Sigma^{*} > O$. Consider the same partitioning as in the real case and let $\tilde S\_{11}$ be $r\times r$. Then, letting $\tilde U\_1 = \tilde S\_{11} - \tilde S\_{12}\tilde S\_{22}^{-1}\tilde S\_{21}$, $\tilde U\_1$ is Wishart distributed as*

$$
\tilde{U}\_1 \sim \tilde{W}\_r(m - (p - r), \,\,\tilde{\Sigma}\_{11} - \tilde{\Sigma}\_{12}\tilde{\Sigma}\_{22}^{-1}\tilde{\Sigma}\_{21}).\tag{5.5a.11}
$$

*A similar density is obtained for $\tilde U\_2 = \tilde S\_{22} - \tilde S\_{21}\tilde S\_{11}^{-1}\tilde S\_{12}$.*

**Example 5.5.1.** Let the $3\times 3$ matrix $S \sim W\_3(5, \Sigma)$, $\Sigma > O$. Determine the distributions of $Y\_1 = S\_{22} - S\_{21}S\_{11}^{-1}S\_{12}$, $Y\_2 = S\_{22}$ and $Y\_3 = S\_{11}$ where

$$
\Sigma = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ \Sigma\_{11} = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}, \ S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix}
$$

with *S*<sup>11</sup> being 2 × 2.

**Solution 5.5.1.** Let the densities of $Y\_j$ be denoted by $f\_j(Y\_j)$, $j = 1, 2, 3$. We need the following matrix, denoted by $B$:

$$B = \Sigma\_{22} - \Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12} = 3 - [0,\ 1]\left(\frac{1}{5}\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}\right)\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 3 - \frac{2}{5} = \frac{13}{5}.$$

From our usual notations, $Y\_1 \sim W\_{p-r}(m-r,\ B)$. Observing that $Y\_1$ is a real scalar, we denote it by $y\_1$, its density being given by

$$\begin{split} f\_{1}(y\_{1}) &= \frac{y\_{1}^{\frac{m-r}{2} - \frac{(p-r)+1}{2}}}{2^{\frac{(m-r)(p-r)}{2}} |B|^{\frac{m-r}{2}} \Gamma(\frac{m-r}{2})} \mathrm{e}^{-\frac{1}{2} \operatorname{tr}(B^{-1} y\_{1})} \\ &= \frac{y\_{1}^{\frac{3}{2} - 1} \mathrm{e}^{-\frac{5}{26} y\_{1}}}{2^{\frac{3}{2}} (13/5)^{\frac{3}{2}} \Gamma(\frac{3}{2})},\ 0 \le y\_{1} < \infty, \end{split}$$

and zero elsewhere. Now, consider $Y\_2$, which is also a real scalar that will be denoted by $y\_2$. As per our notation, $Y\_2 = S\_{22} \sim W\_{p-r}(m,\ \Sigma\_{22})$. Its density is then as follows, observing that $\Sigma\_{22} = (3)$, $|\Sigma\_{22}| = 3$ and $\Sigma\_{22}^{-1} = (\frac{1}{3})$:

$$\begin{split} f\_2(y\_2) &= \frac{y\_2^{\frac{m}{2} - \frac{(p-r)+1}{2}} \mathrm{e}^{-\frac{1}{2} \operatorname{tr}(\Sigma\_{22}^{-1} y\_2)}}{2^{\frac{m(p-r)}{2}} \Gamma\_{p-r}(\frac{m}{2}) |\Sigma\_{22}|^{\frac{m}{2}}} \\ &= \frac{y\_2^{\frac{5}{2} - 1} \mathrm{e}^{-\frac{1}{6} y\_2}}{2^{\frac{5}{2}} \Gamma(\frac{5}{2}) 3^{\frac{5}{2}}},\ 0 \le y\_2 < \infty, \end{split}$$

and zero elsewhere. Note that $Y\_3 = S\_{11}$ is $2\times 2$. With our usual notations, $p = 3$, $r = 2$, $m = 5$ and $|\Sigma\_{11}| = 5$; as well,

$$S\_{11} = \begin{bmatrix} s\_{11} & s\_{12} \\ s\_{12} & s\_{22} \end{bmatrix}, \ \Sigma\_{11}^{-1} = \frac{1}{5} \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}, \ \text{tr}(\Sigma\_{11}^{-1} S\_{11}) = \frac{1}{5} [3s\_{11} + 2s\_{12} + 2s\_{22}].$$

Thus, the density of *Y*<sup>3</sup> is

$$\begin{split} f\_{3}(Y\_{3}) &= \frac{|S\_{11}|^{\frac{m}{2} - \frac{r+1}{2}} \mathrm{e}^{-\frac{1}{2} \operatorname{tr}(\Sigma\_{11}^{-1} S\_{11})}}{2^{\frac{mr}{2}} \Gamma\_{r}(\frac{m}{2}) |\Sigma\_{11}|^{\frac{m}{2}}} \\ &= \frac{[s\_{11}s\_{22} - s\_{12}^{2}]\, \mathrm{e}^{-\frac{1}{10}(3s\_{11} + 2s\_{12} + 2s\_{22})}}{(3)(2^{3})(5)^{\frac{5}{2}}\pi},\ Y\_{3} > O, \end{split}$$

and zero elsewhere. This completes the calculations.
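As a quick numerical sketch of the solution (plain Python, no external libraries), one can verify the Schur complement computed above, $B = \Sigma\_{22} - \Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12} = 13/5$:

```python
# Check of Example 5.5.1: B = Sigma22 - Sigma21 Sigma11^{-1} Sigma12 = 13/5
Sigma11 = [[2.0, -1.0], [-1.0, 3.0]]   # upper-left 2x2 block of Sigma
Sigma12 = [0.0, 1.0]                   # 2x1 block, stored as a flat pair
Sigma22 = 3.0

det11 = Sigma11[0][0]*Sigma11[1][1] - Sigma11[0][1]*Sigma11[1][0]   # = 5
inv11 = [[ Sigma11[1][1]/det11, -Sigma11[0][1]/det11],
         [-Sigma11[1][0]/det11,  Sigma11[0][0]/det11]]              # (1/5)[[3,1],[1,2]]

# quadratic form Sigma21 Sigma11^{-1} Sigma12, with Sigma21 = Sigma12'
q = sum(Sigma12[i]*inv11[i][j]*Sigma12[j] for i in range(2) for j in range(2))
B = Sigma22 - q
print(det11, round(B, 6))   # 5.0 2.6  (i.e. B = 13/5)
```

The same few lines apply to any $2\times 2$ partitioning with a scalar lower-right block.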

**Example 5.5a.1.** Let the $3\times 3$ Hermitian positive definite matrix $\tilde S$ have a complex Wishart density with degrees of freedom $m = 5$ and parameter matrix $\Sigma > O$. Determine the densities of $\tilde Y\_1 = \tilde S\_{22} - \tilde S\_{21}\tilde S\_{11}^{-1}\tilde S\_{12}$, $\tilde Y\_2 = \tilde S\_{22}$ and $\tilde Y\_3 = \tilde S\_{11}$ where

$$
\tilde{S} = \begin{bmatrix} \tilde{S}\_{11} & \tilde{S}\_{12} \\ \tilde{S}\_{21} & \tilde{S}\_{22} \end{bmatrix}, \ \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix} = \begin{bmatrix} 3 & -i & 0 \\ i & 2 & i \\ 0 & -i & 2 \end{bmatrix}
$$

with $\tilde S\_{11}$ and $\Sigma\_{11}$ being $2\times 2$.

**Solution 5.5a.1.** Observe that *Σ* is Hermitian positive definite. We need the following numerical results:

$$B \equiv \Sigma\_{22} - \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} = 2 - [0,\ -i]\left(\frac{1}{5}\begin{bmatrix} 2 & i \\ -i & 3 \end{bmatrix}\right)\begin{bmatrix} 0 \\ i \end{bmatrix} = 2 - \frac{3}{5} = \frac{7}{5};$$

$$B^{-1} = \frac{5}{7}, \ |B| = \frac{7}{5}, \ \Sigma\_{11}^{-1} = \frac{1}{5} \begin{bmatrix} 2 & i \\ -i & 3 \end{bmatrix}.$$

Note that $\tilde Y\_1$ and $\tilde Y\_2$ are real scalar quantities which will be denoted by $y\_1$ and $y\_2$, respectively. Let the densities of $y\_1$ and $y\_2$ be $f\_j(y\_j)$, $j = 1, 2$. Then, with our usual notations, $f\_1(y\_1)$ is

$$\begin{split} f\_{1}(y\_{1}) &= \frac{|\det(\tilde y\_{1})|^{(m-r)-(p-r)} \mathrm{e}^{-\operatorname{tr}(B^{-1}\tilde y\_{1})}}{|\det(B)|^{m-r} \tilde\Gamma\_{p-r}(m-r)} \\ &= \frac{y\_{1}^{2}\, \mathrm{e}^{-\frac{5}{7} y\_{1}}}{(\frac{7}{5})^{3} \Gamma(3)},\ 0 \le y\_{1} < \infty, \end{split}$$

and zero elsewhere, and the density of $y\_2$ is

$$\begin{split} f\_2(\mathbf{y}\_2) &= \frac{|\det(\tilde{\mathbf{y}}\_2)|^{m - (p - r)} \mathbf{e}^{-\text{tr}(\Sigma\_{22}^{-1} \tilde{\mathbf{y}}\_2)}}{|\det(\Sigma\_{22})|^m \tilde{\Gamma}\_{p - r}(m)} \\ &= \frac{\mathbf{y}\_2^4 \mathbf{e}^{-\frac{1}{2} \mathbf{y}\_2}}{2^5 \Gamma(5)}, \ 0 \le \mathbf{y}\_2 < \infty, \end{split}$$

and zero elsewhere. Note that $\tilde Y\_3 = \tilde S\_{11}$ is $2\times 2$. Letting

$$\tilde{S}\_{11} = \begin{bmatrix} s\_{11} & \tilde{s}\_{12} \\ \tilde{s}\_{12}^\* & s\_{22} \end{bmatrix}, \text{ tr}(\Sigma\_{11}^{-1}\tilde{S}\_{11}) = \frac{1}{5} [2s\_{11} + 3s\_{22} + i\tilde{s}\_{12}^\* - i\tilde{s}\_{12}]$$

and $|\det(\tilde Y\_3)| = [s\_{11}s\_{22} - \tilde s\_{12}^{*}\tilde s\_{12}]$. With our usual notations, the density of $\tilde Y\_3$, denoted by $\tilde f\_3(\tilde Y\_3)$, is the following:

$$\begin{split} \tilde f\_{3}(\tilde Y\_{3}) &= \frac{|\det(\tilde Y\_{3})|^{m-r} \mathrm{e}^{-\operatorname{tr}(\Sigma\_{11}^{-1}\tilde Y\_{3})}}{|\det(\Sigma\_{11})|^{m}\tilde\Gamma\_{r}(m)} \\ &= \frac{[s\_{11}s\_{22} - \tilde s\_{12}\tilde s\_{12}^{*}]^{3}\mathrm{e}^{-\frac{1}{5}[2s\_{11} + 3s\_{22} + i\tilde s\_{12}^{*} - i\tilde s\_{12}]}}{5^{5}\tilde\Gamma\_{2}(5)},\ \tilde Y\_{3} > O, \end{split}$$

and zero elsewhere, where $5^{5}\tilde\Gamma\_{2}(5) = 3125(144)\pi$. This completes the computations.
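A brief numerical check of this solution with Python complex arithmetic, verifying $B = 7/5$ and the normalizing constant $5^5\tilde\Gamma\_2(5) = 3125(144)\pi$, where the complex matrix-variate gamma evaluates as $\tilde\Gamma\_2(5) = \pi\,\Gamma(5)\Gamma(4)$:

```python
# Check of Example 5.5a.1 using Python complex numbers
import math

Sigma11 = [[3+0j, -1j], [1j, 2+0j]]    # upper-left 2x2 block of Sigma
Sigma12 = [0+0j, 1j]                   # column block (0, i)'
Sigma21 = [0+0j, -1j]                  # row block (0, -i), the conjugate transpose
Sigma22 = 2+0j

det11 = Sigma11[0][0]*Sigma11[1][1] - Sigma11[0][1]*Sigma11[1][0]   # = 5
inv11 = [[ Sigma11[1][1]/det11, -Sigma11[0][1]/det11],
         [-Sigma11[1][0]/det11,  Sigma11[0][0]/det11]]              # (1/5)[[2,i],[-i,3]]

q = sum(Sigma21[i]*inv11[i][j]*Sigma12[j] for i in range(2) for j in range(2))
B = Sigma22 - q                        # should be 7/5

# complex matrix-variate gamma: Gamma~_2(5) = pi * Gamma(5) * Gamma(4)
norm = 5**5 * math.pi * math.gamma(5) * math.gamma(4)
print(round(B.real, 6), abs(norm - 3125*144*math.pi) < 1e-6)   # 1.4 True
```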

#### **5.5.8. Connections to geometrical probability problems**

Consider the representation of the Wishart matrix $S = Z\_{n-1}Z\_{n-1}'$ given in (5.5.8) where the $p$ rows are linearly independent $1\times(n-1)$ vectors. Then, these $p$ linearly independent rows, taken in order, form a convex hull and determine a $p$-parallelotope in that hull, which is determined by the $p$ points in the $(n-1)$-dimensional Euclidean space, $n-1 \ge p$. Then, as explained in Mathai (1999), the volume content of this parallelotope is $v = |Z\_{n-1}Z\_{n-1}'|^{\frac{1}{2}} = |S|^{\frac{1}{2}}$, where $S \sim W\_p(n-1, \Sigma)$, $\Sigma > O$. Thus, the volume content of this parallelotope is the positive square root of the generalized variance $|S|$. The distributions of this random volume when the $p$ random points are uniformly, type-1 beta, type-2 beta and gamma distributed are provided in Chap. 4 of Mathai (1999).
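For instance, with $p = 2$ rows in $\mathbb{R}^3$, the parallelotope is a parallelogram and $|ZZ'|^{1/2}$ can be checked against the cross-product norm; a small sketch with arbitrary vectors:

```python
# For p = 2 row vectors in R^3, the Gram-determinant volume |Z Z'|^(1/2)
# equals the cross-product norm, illustrating v = |S|^(1/2).
import math

z1 = [1.0, 2.0, 0.0]
z2 = [0.0, 1.0, 3.0]

def dot(a, b): return sum(x*y for x, y in zip(a, b))

S = [[dot(z1, z1), dot(z1, z2)],
     [dot(z2, z1), dot(z2, z2)]]            # S = Z Z', the Gram matrix
v = math.sqrt(S[0][0]*S[1][1] - S[0][1]*S[1][0])

cross = [z1[1]*z2[2] - z1[2]*z2[1],
         z1[2]*z2[0] - z1[0]*z2[2],
         z1[0]*z2[1] - z1[1]*z2[0]]
area = math.sqrt(dot(cross, cross))
print(abs(v - area) < 1e-12)   # True
```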

#### **5.6. The Distribution of the Sample Correlation Coefficient**

Consider the real Wishart density, or matrix-variate gamma density, in (5.5.10) for $p = 2$. For convenience, let us take the degrees of freedom parameter to be $m = n-1$. Then for $p = 2$, $f(S)$ in (5.5.10), denoted by $f\_2(S)$, is the following, observing that $|S| = s\_{11}s\_{22}(1-r^2)$ where $r$ is the sample correlation coefficient:


$$f\_2(\mathbf{S}) = f\_2(s\_{11}, \ s\_{22}, r) = \frac{[s\_{11}s\_{22}(1-r^2)]^{\frac{m}{2} - \frac{3}{2}} \mathbf{e}^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} \mathbf{S})}}{2^m [\sigma\_{11}\sigma\_{22}(1-\rho^2)]^{\frac{m}{2}} \varGamma\_2(\frac{m}{2})} \tag{5.6.1}$$

where $\rho$ is the population correlation coefficient, $|\Sigma| = \sigma\_{11}\sigma\_{22} - \sigma\_{12}^{2} = \sigma\_{11}\sigma\_{22}(1-\rho^2)$, $\Gamma\_2(\frac{m}{2}) = \pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})$, $-1 < \rho < 1$,

$$\begin{split} \Sigma^{-1} &= \frac{1}{|\Sigma|}\mathrm{Cof}(\Sigma) = \frac{1}{\sigma\_{11}\sigma\_{22}(1-\rho^{2})}\begin{bmatrix} \sigma\_{22} & -\sigma\_{12} \\ -\sigma\_{12} & \sigma\_{11} \end{bmatrix} \\ &= \frac{1}{1-\rho^{2}}\begin{bmatrix} \frac{1}{\sigma\_{11}} & -\frac{\rho}{\sqrt{\sigma\_{11}\sigma\_{22}}} \\ -\frac{\rho}{\sqrt{\sigma\_{11}\sigma\_{22}}} & \frac{1}{\sigma\_{22}} \end{bmatrix},\quad \sigma\_{12} = \rho\sqrt{\sigma\_{11}\sigma\_{22}}, \\ \operatorname{tr}(\Sigma^{-1}S) &= \frac{1}{1-\rho^{2}}\Big\{\frac{s\_{11}}{\sigma\_{11}} - 2\rho\frac{s\_{12}}{\sqrt{\sigma\_{11}\sigma\_{22}}} + \frac{s\_{22}}{\sigma\_{22}}\Big\} \\ &= \frac{1}{1-\rho^{2}}\Big\{\frac{s\_{11}}{\sigma\_{11}} - 2\rho r\frac{\sqrt{s\_{11}s\_{22}}}{\sqrt{\sigma\_{11}\sigma\_{22}}} + \frac{s\_{22}}{\sigma\_{22}}\Big\}. \end{split}\tag{ii}$$

Let us make the substitution $x\_1 = \frac{s\_{11}}{\sigma\_{11}}$, $x\_2 = \frac{s\_{22}}{\sigma\_{22}}$. Note that $\mathrm{d}S = \mathrm{d}s\_{11}\wedge\mathrm{d}s\_{22}\wedge\mathrm{d}s\_{12}$, and that $\mathrm{d}s\_{12} = \sqrt{s\_{11}s\_{22}}\,\mathrm{d}r$ for fixed $s\_{11}$ and $s\_{22}$. In order to obtain the density of $r$, we must integrate out $x\_1$ and $x\_2$, observing that the factor $\sqrt{s\_{11}s\_{22}}$ comes from $\mathrm{d}s\_{12}$:

$$\begin{split} \int\_{s\_{11}, s\_{22}} f\_2(S)\,\mathrm{d}s\_{11}\wedge\mathrm{d}s\_{22} = \int\_{x\_1>0}\int\_{x\_2>0} &\frac{1}{2^m (\sigma\_{11}\sigma\_{22})^{\frac{m}{2}} (1-\rho^2)^{\frac{m}{2}} \pi^{\frac{1}{2}} \Gamma(\frac{m}{2}) \Gamma(\frac{m-1}{2})} \\ &\times (1-r^2)^{\frac{m-3}{2}} (\sigma\_{11}\sigma\_{22}x\_1x\_2)^{\frac{m}{2}-1} \mathrm{e}^{-\frac{1}{2(1-\rho^2)}\{x\_1 - 2r\rho\sqrt{x\_1x\_2} + x\_2\}}\,\sigma\_{11}\sigma\_{22}\,\mathrm{d}x\_1\wedge\mathrm{d}x\_2. \end{split}\tag{iii}$$

For convenience, let us expand

$$\mathrm{e}^{-\frac{1}{2(1-\rho^2)}(-2r\rho\sqrt{x\_1x\_2})} = \sum\_{k=0}^{\infty} \left(\frac{r\rho}{1-\rho^2}\right)^{k} \frac{x\_1^{\frac{k}{2}}x\_2^{\frac{k}{2}}}{k!}.\tag{iv}$$

Then the part containing *x*<sup>1</sup> gives the integral

$$\int\_{x\_1=0}^{\infty} x\_1^{\frac{m}{2}-1+\frac{k}{2}} \mathrm{e}^{-\frac{x\_1}{2(1-\rho^2)}}\,\mathrm{d}x\_1 = [2(1-\rho^2)]^{\frac{m}{2}+\frac{k}{2}} \Gamma(\tfrac{m}{2}+\tfrac{k}{2}),\ m \ge 2.\tag{v}$$
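The integral in *(v)* is a standard gamma integral; a quick numerical check by midpoint quadrature, with $m$, $k$ and $\rho$ chosen arbitrarily for illustration:

```python
# Midpoint-quadrature check of the gamma integral (v)
import math

m, k, rho = 5, 3, 0.6
a = m/2 + k/2                  # the integrand is x^(a-1) e^(-x/b)
b = 2*(1 - rho**2)

n, upper = 60000, 60.0         # the integrand is negligible beyond x = 60 here
h = upper / n
num = sum(((i + 0.5)*h)**(a - 1) * math.exp(-((i + 0.5)*h)/b)
          for i in range(n)) * h
exact = b**a * math.gamma(a)   # [2(1-rho^2)]^(m/2+k/2) Gamma(m/2+k/2)
print(abs(num - exact)/exact < 1e-4)   # True
```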

By symmetry, the integral over $x\_2$ gives $[2(1-\rho^2)]^{\frac{m}{2}+\frac{k}{2}}\Gamma(\frac{m+k}{2})$, $m \ge 2$. Collecting all the constants, we have

$$\frac{(\sigma\_{11}\sigma\_{22})^{\frac{m}{2}}2^{m+k}(1-\rho^2)^{m+k}\Gamma^2(\frac{m+k}{2})}{2^m(\sigma\_{11}\sigma\_{22})^{\frac{m}{2}}(1-\rho^2)^{\frac{m}{2}}\pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})} = \frac{(1-\rho^2)^{\frac{m}{2}+k}2^k\Gamma^2(\frac{m+k}{2})}{\pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})}.\tag{vi}$$

We can simplify $\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})$ by using the duplication formula for gamma functions, namely

$$
\Gamma(2z) = \pi^{-\frac{1}{2}} 2^{2z-1} \Gamma(z) \Gamma\left(z + \frac{1}{2}\right), \ z = \frac{m-1}{2}, \tag{5.6.2}
$$

Then,

$$
\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{m-1}{2}\right) = \frac{\Gamma(m-1)\pi^{\frac{1}{2}}}{2^{m-2}}.\tag{\text{vii}}
$$
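The identity *(vii)* is easily checked numerically for a few values of $m$:

```python
# Numerical check of (vii): Gamma(m/2)Gamma((m-1)/2) = Gamma(m-1) sqrt(pi) / 2^(m-2)
import math

for m in [3, 5, 8, 12]:
    lhs = math.gamma(m/2) * math.gamma((m - 1)/2)
    rhs = math.gamma(m - 1) * math.sqrt(math.pi) / 2**(m - 2)
    assert abs(lhs - rhs) < 1e-10 * rhs
print("duplication formula verified")
```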

Hence the density of *r*, denoted by *fr(r)*, is the following:

$$f\_r(r) = \frac{2^{m-2}(1-\rho^2)^{\frac{m}{2}}}{\Gamma(m-1)\pi}(1-r^2)^{\frac{1}{2}(m-3)}\sum\_{k=0}^{\infty} \frac{(2r\rho)^k}{k!} \Gamma^2(\frac{m+k}{2}), \ -1 \le r \le 1,\tag{5.6.3}$$

and zero elsewhere, *m* = *n* − 1*, n* being the sample size.
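As a sanity check, one may verify numerically that (5.6.3) integrates to 1; the sketch below truncates the series at $k = 60$ and uses a midpoint rule, with arbitrary values of $m$ and $\rho$:

```python
# Numerical check that the non-null density (5.6.3) integrates to 1
import math

m, rho = 6, 0.4
c = 2**(m - 2) * (1 - rho**2)**(m/2) / (math.gamma(m - 1) * math.pi)

def f_r(r):
    # series sum_{k} (2 r rho)^k Gamma^2((m+k)/2) / k!, truncated at k = 60
    s = sum((2*r*rho)**k / math.factorial(k) * math.gamma((m + k)/2)**2
            for k in range(60))
    return c * (1 - r**2)**((m - 3)/2) * s

n = 4000
h = 2.0 / n
total = sum(f_r(-1 + (i + 0.5)*h) for i in range(n)) * h   # midpoint rule on [-1, 1]
print(round(total, 4))   # 1.0
```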

#### **5.6.1. The special case** *ρ* = 0

In this case, (5.6.3) becomes

$$f\_r(r) = \frac{2^{m-2}\Gamma^2(\frac{m}{2})}{\Gamma(m-1)\pi} (1-r^2)^{\frac{m-1}{2}-1}, \ -1 \le r \le 1, \ m = n-1 \tag{5.6.4}$$

$$= \frac{\Gamma(\frac{m}{2})}{\sqrt{\pi}\,\Gamma(\frac{m-1}{2})}(1-r^2)^{\frac{m-1}{2}-1},\ -1 \le r \le 1,\tag{5.6.5}$$

zero elsewhere, $m = n-1 \ge 2$, $n$ being the sample size. The simplification is made by using the duplication formula and writing $\Gamma(m-1) = \pi^{-\frac{1}{2}}2^{m-2}\Gamma(\frac{m-1}{2})\Gamma(\frac{m}{2})$. For testing the hypothesis $H\_o: \rho = 0$, the test statistic is $r$ and the null distribution, that is, the distribution under the null hypothesis $H\_o$, is given in (5.6.5). Numerical tables of percentage points obtained from (5.6.5) are available. If $\rho \ne 0$, the non-null distribution is available from (5.6.3); so, if we wish to test the hypothesis $H\_o: \rho = \rho\_o$ where $\rho\_o$ is a given quantity, we can compute the percentage points from (5.6.3). It can be shown from (5.6.5) that, for $\rho = 0$, $t\_{m-1} = \sqrt{m-1}\,\frac{r}{\sqrt{1-r^2}}$ is distributed as a Student-$t$ with $m-1$ degrees of freedom, and hence, for testing $H\_o: \rho = 0$ against $H\_1: \rho \ne 0$, the null hypothesis can be rejected if $|t\_{m-1}| = \sqrt{m-1}\,\frac{|r|}{\sqrt{1-r^2}} \ge t\_{m-1,\frac{\alpha}{2}}$ where $Pr\{|t\_{m-1}| \ge t\_{m-1,\frac{\alpha}{2}}\} = \alpha$. For tests that make use of the Student-$t$ statistic, refer to Mathai and Haubold (2017b). Since the density given in (5.6.5) is an even function of $r$, when $\rho = 0$ all odd order moments are equal to zero, and the even order moments can easily be evaluated from type-1 beta integrals.
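A small numerical sketch of the null case: the density (5.6.5) integrates to 1, and since $r^2$ is then a type-1 beta variable with parameters $(\frac{1}{2}, \frac{m-1}{2})$, its second moment works out to $E[r^2] = 1/m$ (the value of $m$ is arbitrary):

```python
# Checks on the null density (5.6.5): normalization and E[r^2] = 1/m
import math

m = 7
c = math.gamma(m/2) / (math.sqrt(math.pi) * math.gamma((m - 1)/2))

n = 4000
h = 2.0 / n
pts = [-1 + (i + 0.5)*h for i in range(n)]          # midpoint grid on [-1, 1]
total = sum(c * (1 - r**2)**((m - 1)/2 - 1) for r in pts) * h
second = sum(r**2 * c * (1 - r**2)**((m - 1)/2 - 1) for r in pts) * h
print(round(total, 4), round(second, 4))   # 1.0 0.1429  (1/7 = 0.1429)
```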

#### **5.6.2. The multiple and partial correlation coefficients**

Let the $p\times 1$ real vector $X\_j$ with $X\_j' = (x\_{1j},\ldots,x\_{pj})$ have a $p$-variate distribution whose mean value is $E(X\_j) = \mu$ and covariance matrix is $\mathrm{Cov}(X\_j) = \Sigma$, $\Sigma > O$, where $\mu$ is $p\times 1$ and $\Sigma$ is $p\times p$, for $j = 1,\ldots,n$, the $X\_j$'s being iid (independently and identically distributed). Consider the following partitioning of $\Sigma$:

$$
\Sigma = \begin{bmatrix}
\sigma\_{11} & \Sigma\_{12} \\
\Sigma\_{21} & \Sigma\_{22}
\end{bmatrix}, \ \sigma\_{11} > 0 \text{ is } 1 \times 1, \ \Sigma\_{22} > O \text{ is } (p-1) \times (p-1), \ \Sigma\_{12}' = \Sigma\_{21}.
$$

Let

$$
\rho\_{1.(2\dots p)}^2 = \frac{\Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21}}{\sigma\_{11}}.\tag{5.6.6}
$$

Then, $\rho\_{1.(2\ldots p)}$ is called the *multiple correlation coefficient of* $x\_{1j}$ *on* $x\_{2j},\ldots,x\_{pj}$. The sample value corresponding to $\rho\_{1.(2\ldots p)}^2$, which is denoted by $r\_{1.(2\ldots p)}^2$ and referred to as the square of the sample multiple correlation coefficient, is given by

$$r\_{1.(2...p)}^2 = \frac{S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}}\text{ with }S = \begin{bmatrix} s\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix} \tag{5.6.7}$$

where $s\_{11}$ is $1\times 1$, $S\_{22}$ is $(p-1)\times(p-1)$, $S = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$, $\mathbf{X} = (X\_1,\ldots,X\_n)$ is the $p\times n$ sample matrix, $n$ being the sample size, $\bar X = \frac{1}{n}(X\_1+\cdots+X\_n)$, $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$ is $p\times n$, the $X\_j$'s, $j = 1,\ldots,n$, being iid according to a given $p$-variate population having mean value vector $\mu$ and covariance matrix $\Sigma > O$, which need not be Gaussian.
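As an illustration, $r\_{1.(2\ldots p)}^2$ can be computed directly from (5.6.7); the $3\times 3$ matrix $S$ below is artificial, chosen only to be positive definite, and the result is cross-checked against the determinant factorization $|S| = |S\_{22}|(s\_{11} - S\_{12}S\_{22}^{-1}S\_{21})$ from Sect. 1.3:

```python
# r^2_{1.(23)} from (5.6.7) for a fixed 3x3 S (artificial, positive definite)
S = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 1.0],
     [0.5, 1.0, 2.0]]

s11 = S[0][0]
S12 = [S[0][1], S[0][2]]                         # 1 x (p-1) block
S22 = [[S[1][1], S[1][2]], [S[2][1], S[2][2]]]   # (p-1) x (p-1) block
det22 = S22[0][0]*S22[1][1] - S22[0][1]*S22[1][0]
inv22 = [[ S22[1][1]/det22, -S22[0][1]/det22],
         [-S22[1][0]/det22,  S22[0][0]/det22]]
q = sum(S12[i]*inv22[i][j]*S12[j] for i in range(2) for j in range(2))
r2 = q / s11                                     # r^2_{1.(23)}

# cross-check: 1 - r^2 = |S| / (|S22| s11), via the 3x3 determinant
detS = (S[0][0]*(S[1][1]*S[2][2] - S[1][2]*S[2][1])
        - S[0][1]*(S[1][0]*S[2][2] - S[1][2]*S[2][0])
        + S[0][2]*(S[1][0]*S[2][1] - S[1][1]*S[2][0]))
print(round(r2, 4), abs((1 - r2) - detS/(det22*s11)) < 1e-12)
```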

#### **5.6.3. Different derivations of** $\rho\_{1.(2\ldots p)}$

Consider a prediction problem involving real scalar variables where $x\_1$ is predicted by making use of $x\_2,\ldots,x\_p$ or linear functions thereof. Let $A\_2' = (a\_2,\ldots,a\_p)$ be a constant vector where the $a\_j$, $j = 2,\ldots,p$, are real scalar constants. Letting $X\_{(2)}' = (x\_2,\ldots,x\_p)$, a linear function of $X\_{(2)}$ is $u = A\_2'X\_{(2)} = a\_2x\_2+\cdots+a\_px\_p$. Then, the mean value and variance of this linear function are $E[u] = E[A\_2'X\_{(2)}] = A\_2'\mu\_{(2)}$ and $\mathrm{Var}(u) = \mathrm{Var}(A\_2'X\_{(2)}) = A\_2'\Sigma\_{22}A\_2$, where $\mu\_{(2)}' = (\mu\_2,\ldots,\mu\_p) = E[X\_{(2)}']$ and $\Sigma\_{22}$ is the covariance matrix associated with $X\_{(2)}$, which is available from the partitioning of $\Sigma$ specified in the previous subsection. Let us determine the correlation between $x\_1$, the variable being predicted, and $u$, a linear function of the variables being utilized to predict $x\_1$; denoting it by $\rho\_{1,u}$, we have

$$\rho\_{1,u} = \frac{\mathrm{Cov}(x\_1, u)}{\sqrt{\mathrm{Var}(x\_1)\mathrm{Var}(u)}},$$

where $\mathrm{Cov}(x\_1, u) = E[(x\_1-E(x\_1))(u-E(u))] = E[(x\_1-E(x\_1))(X\_{(2)}-E(X\_{(2)}))'A\_2] = \mathrm{Cov}(x\_1, X\_{(2)})A\_2 = \Sigma\_{12}A\_2$, $\mathrm{Var}(x\_1) = \sigma\_{11}$, and $\mathrm{Var}(u) = A\_2'\Sigma\_{22}A\_2 > 0$. Letting $\Sigma\_{22}^{\frac{1}{2}}$ denote the positive definite square root of $\Sigma\_{22}$, we can write $\Sigma\_{12}A\_2 = (\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}})(\Sigma\_{22}^{\frac{1}{2}}A\_2)$. Then, on applying the Cauchy-Schwarz inequality, we may write $(\Sigma\_{12}A\_2)^2 = [(\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}})(\Sigma\_{22}^{\frac{1}{2}}A\_2)]^2 \le (\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21})(A\_2'\Sigma\_{22}A\_2)$. Thus,

$$\rho\_{1,u} \le \frac{\sqrt{(\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21})(A\_2'\Sigma\_{22}A\_2)}}{\sqrt{(\sigma\_{11})(A\_2'\Sigma\_{22}A\_2)}} = \frac{\sqrt{\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}}{\sqrt{\sigma\_{11}}}, \text{ that is,}$$

$$\rho\_{1,u}^2 \le \frac{\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}{\sigma\_{11}} = \rho\_{1,(2\dots p)}^2. \tag{5.6.8}$$

This establishes the following result:

**Theorem 5.6.1.** *The multiple correlation coefficient $\rho\_{1.(2\ldots p)}$ of $x\_1$ on $x\_2,\ldots,x\_p$ represents the maximum correlation between $x\_1$ and an arbitrary linear function of $x\_2,\ldots,x\_p$.*

This shows that if we consider the joint variation of $x\_1$ and $(x\_2,\ldots,x\_p)$, this scale-free joint variation, namely the correlation, attains its maximum value, the multiple correlation coefficient, over all linear functions of $(x\_2,\ldots,x\_p)$. Correlation measures a scale-free joint scatter in the variables involved, in this case $x\_1$ and $(x\_2,\ldots,x\_p)$. Correlation does not measure general relationships between the variables; counterexamples are provided in Mathai and Haubold (2017b). Hence "maximum correlation" should be interpreted as maximum joint scale-free variation or joint scatter in the variables.
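Theorem 5.6.1 can also be illustrated numerically: for randomly chosen coefficient vectors $A\_2$, the squared correlation between $x\_1$ and $u = A\_2'X\_{(2)}$ never exceeds $\rho\_{1.(2\ldots p)}^2$. The covariance matrix below is an arbitrary positive definite example with $p = 3$:

```python
# Numerical illustration of Theorem 5.6.1 (arbitrary positive definite Sigma)
import random

sigma11 = 2.0
Sigma12 = [0.6, -0.4]                       # Cov(x1, X(2)) as a row
Sigma22 = [[1.0, 0.3], [0.3, 1.5]]

det22 = Sigma22[0][0]*Sigma22[1][1] - Sigma22[0][1]*Sigma22[1][0]
inv22 = [[ Sigma22[1][1]/det22, -Sigma22[0][1]/det22],
         [-Sigma22[1][0]/det22,  Sigma22[0][0]/det22]]
rho2_max = sum(Sigma12[i]*inv22[i][j]*Sigma12[j]
               for i in range(2) for j in range(2)) / sigma11   # (5.6.6)

random.seed(7)
ok = True
for _ in range(1000):
    A2 = [random.uniform(-1, 1), random.uniform(-1, 1)]
    cov = sum(Sigma12[j]*A2[j] for j in range(2))               # Cov(x1, u)
    var_u = sum(A2[i]*Sigma22[i][j]*A2[j]
                for i in range(2) for j in range(2))            # Var(u)
    if var_u > 1e-12:
        ok = ok and cov**2 / (sigma11*var_u) <= rho2_max + 1e-12
print(ok, round(rho2_max, 4))   # True 0.2993
```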

For the next property, we will use the following two basic results on conditional expectations, referring also to Mathai and Haubold (2017b). Let *x* and *y* be two real scalar random variables having a joint distribution. Then,

$$E\{\mathbf{y}\} = E\{E(\mathbf{y}|\mathbf{x})\} \tag{i}$$

whenever the expected values exist, where the inside expectation is taken in the conditional space of $y$ given $x$, for all $x$, that is, $E\_{y|x}[y|x]$, and the outside expectation is taken in the marginal space of $x$, that is, $E\_x[\,\cdot\,]$. The other result states that

$$\text{Var}(\mathbf{y}) = \text{Var}(E[\mathbf{y}|\mathbf{x}]) + E[\text{Var}(\mathbf{y}|\mathbf{x})] \tag{ii}$$

where it is assumed that the expected value of the conditional variance and the variance of the conditional expectation exist. Situations where the results stated in *(i)* and *(ii)* are applicable or not applicable are described and illustrated in Mathai and Haubold (2017b). In result *(i)*, $x$ can be a scalar, vector or matrix variable. Now, let us examine the problem of predicting $x\_1$ on the basis of $x\_2,\ldots,x\_p$. What is the "best" predictor function of $x\_2,\ldots,x\_p$ for predicting $x\_1$, "best" being construed in the minimum mean square sense? If $\varphi(x\_2,\ldots,x\_p)$ is an arbitrary predictor, then at given values of $x\_2,\ldots,x\_p$, $\varphi$ is a constant. Consider the squared distance $(x\_1 - b)^2$ between $x\_1$ and $b = \varphi(x\_2,\ldots,x\_p)$ evaluated at the given values of $x\_2,\ldots,x\_p$. Then, "minimum in the mean square sense" means minimizing the expected value of $(x\_1 - b)^2$ over all $b$, that is, $\min\_b E(x\_1 - b)^2$. We have already established in Mathai and Haubold (2017b) that the minimizing value of $b$ is the conditional expectation of $x\_1$ at given $x\_2,\ldots,x\_p$, that is, $b = E[x\_1|x\_2,\ldots,x\_p]$. Hence, this "best" predictor is also called *the regression of* $x\_1$ *on* $(x\_2,\ldots,x\_p)$: $E[x\_1|x\_2,\ldots,x\_p]$ is the regression of $x\_1$ on $x\_2,\ldots,x\_p$, or the best predictor of $x\_1$ based on $x\_2,\ldots,x\_p$. Note that, in general, for any scalar variable $y$ and a constant $a$,

$$\begin{split} E[y - a]^2 = E[y - E(y) + E(y) - a]^2 &= E[(y - E(y))^2] + 2E[y - E(y)]\,(E(y) - a) \\ &\quad + [E(y) - a]^2 = \mathrm{Var}(y) + 0 + [E(y) - a]^2. \end{split}\tag{iii}$$

As the only term on the right-hand side containing $a$ is $[E(y) - a]^2$, which is a non-negative constant, the minimum is attained when this term is zero, that is, when $a = E[y]$. Thus, $E[y - a]^2$ is minimized when $a = E[y]$. If $a = \varphi(X\_{(2)})$ at a given value of $X\_{(2)}$, then the best predictor of $x\_1$ based on $X\_{(2)}$ is $E[x\_1|X\_{(2)}]$, the regression of $x\_1$ on $X\_{(2)}$. Let us determine what happens when $E[x\_1|X\_{(2)}]$ is a linear function of $X\_{(2)}$. Let the linear function be $b\_0 + b\_2x\_2+\cdots+b\_px\_p = b\_0 + B\_2'X\_{(2)}$, $B\_2' = (b\_2,\ldots,b\_p)$, where $b\_0, b\_2,\ldots,b\_p$ are real constants [note that only real variables and real constants are considered in this section]. That is, for some constant $b\_0$,

$$E[x\_1|X\_{(2)}] = b\_0 + b\_2x\_2 + \cdots + b\_px\_p.\tag{iv}$$

Taking expectations with respect to $x\_1, x\_2,\ldots,x\_p$ in *(iv)*, it follows from *(i)* that the left-hand side becomes $E[x\_1]$, the right-hand side being $b\_0 + b\_2E[x\_2]+\cdots+b\_pE[x\_p]$; subtracting this from *(iv)*, we have

$$E[x\_1|X\_{(2)}] - E[x\_1] = b\_2(x\_2 - E[x\_2]) + \cdots + b\_p(x\_p - E[x\_p]).\tag{v}$$

Multiplying both sides of *(v)* by $x\_j - E[x\_j]$ and taking expectations throughout, the right-hand side becomes $b\_2\sigma\_{2j}+\cdots+b\_p\sigma\_{pj}$ where $\sigma\_{ij} = \mathrm{Cov}(x\_i, x\_j)$, $i \ne j$, and $\sigma\_{jj}$ is the variance of $x\_j$ when $i = j$. The left-hand side is $E[(x\_j - E(x\_j))(E[x\_1|X\_{(2)}] - E(x\_1))] = E[E(x\_1x\_j|X\_{(2)})] - E(x\_j)E(x\_1) = E[x\_1x\_j] - E(x\_1)E(x\_j) = \mathrm{Cov}(x\_1, x\_j)$. Three properties were utilized in this derivation, namely *(i)*, the fact that $\mathrm{Cov}(u, v) = E[(u - E(u))(v - E(v))] = E[u(v - E(v))] = E[v(u - E(u))]$, and $\mathrm{Cov}(u, v) = E(uv) - E(u)E(v)$. As well, $\mathrm{Var}(u) = E[u - E(u)]^2 = E[u(u - E(u))]$ as long as the second order moments exist. Thus, we have the following by combining all the linear equations for $j = 2,\ldots,p$:

$$
\Sigma\_{21} = \Sigma\_{22} b \implies b = \Sigma\_{22}^{-1} \Sigma\_{21} \text{ or } b' = \Sigma\_{12} \Sigma\_{22}^{-1} \tag{5.6.9}
$$

when *Σ*<sup>22</sup> is nonsingular, which is the case as it was assumed that *Σ*<sup>22</sup> *> O*. Now, the best predictor of *x*<sup>1</sup> based on a linear function of *X(*2*)* or the best predictor in the class of all linear functions of *X(*2*)* is

$$E[\mathbf{x}\_1|X\_{(2)}] = b'X\_{(2)} = \Sigma\_{12}\Sigma\_{22}^{-1}X\_{(2)}.\tag{5.6.10}$$

Let us consider the correlation between $x\_1$ and its best linear predictor based on $X\_{(2)}$, that is, the correlation between $x\_1$ and the linear regression of $x\_1$ on $X\_{(2)}$. Observe that $\mathrm{Cov}(x\_1, \Sigma\_{12}\Sigma\_{22}^{-1}X\_{(2)}) = \Sigma\_{12}\Sigma\_{22}^{-1}\mathrm{Cov}(X\_{(2)}, x\_1) = \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$, $\Sigma\_{21} = \Sigma\_{12}'$. Consider the variance of the best linear predictor: $\mathrm{Var}(b'X\_{(2)}) = b'\mathrm{Cov}(X\_{(2)})b = \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{22}\Sigma\_{22}^{-1}\Sigma\_{21} = \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$. Thus, the square of the correlation between $x\_1$ and its best linear predictor, or its linear regression on $X\_{(2)}$, denoted by $\rho\_{x\_1, b'X\_{(2)}}^2$, is the following:

$$\rho^2\_{\mathbf{x}\_{\mathrm{1}},b'X\_{(2)}} = \frac{[\mathrm{Cov}(\mathbf{x}\_{\mathrm{1}}, b'X\_{(2)})]^2}{\mathrm{Var}(\mathbf{x}\_{\mathrm{1}})\,\mathrm{Var}(b'X\_{(2)})} = \frac{\Sigma\_{\mathrm{12}}\Sigma\_{22}^{-1}\Sigma\_{21}}{\sigma\_{\mathrm{11}}} = \rho^2\_{\mathrm{1,(2...p)}}.\tag{5.6.11}$$

Hence, the following result:

**Theorem 5.6.2.** *The multiple correlation $\rho\_{1.(2\ldots p)}$ between $x\_1$ and $x\_2,\ldots,x\_p$ is also the correlation between $x\_1$ and its best linear predictor, that is, between $x\_1$ and its linear regression on $x\_2,\ldots,x\_p$.*

Observe that normality has not been assumed in obtaining all of the above properties. Thus, the results hold for any population for which moments of order two exist. However, in the case of a nonsingular normal population, that is, $X\_j \sim N\_p(\mu, \Sigma)$, $\Sigma > O$, it follows from equation (3.3.5) that, for $r = 1$, $E[x\_1|X\_{(2)}] = \Sigma\_{12}\Sigma\_{22}^{-1}X\_{(2)}$ when $E[X\_{(2)}] = \mu\_{(2)} = O$ and $E(x\_1) = \mu\_1 = 0$; otherwise, $E[x\_1|X\_{(2)}] = \mu\_1 + \Sigma\_{12}\Sigma\_{22}^{-1}(X\_{(2)} - \mu\_{(2)})$.
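A Monte Carlo sketch of Theorem 5.6.2, with arbitrary coefficients and sample size: the sample correlation between $x\_1$ and its best linear predictor $b'X\_{(2)}$, where $b = \Sigma\_{22}^{-1}\Sigma\_{21}$, should be close to the population multiple correlation $\rho\_{1.(2\ldots p)}$.

```python
# Monte Carlo sketch of Theorem 5.6.2 (p = 3; coefficients are arbitrary)
import math, random

random.seed(3)
n = 20000
rows = []
for _ in range(n):
    x2, x3 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = 0.7*x2 + 0.2*x3 + random.gauss(0, 1)    # x1 partly explained by x2, x3
    rows.append((x1, x2, x3))

# population values implied by the construction above:
# Sigma22 = I, Sigma12 = (0.7, 0.2), sigma11 = 1 + 0.7^2 + 0.2^2
sigma11 = 1.53
b = [0.7, 0.2]                                   # Sigma22^{-1} Sigma21 = Sigma21 here
rho = math.sqrt((0.7**2 + 0.2**2) / sigma11)     # multiple correlation

def corr(u, v):
    k = len(u)
    mu, mv = sum(u)/k, sum(v)/k
    suv = sum((a - mu)*(c - mv) for a, c in zip(u, v))
    return suv / math.sqrt(sum((a - mu)**2 for a in u)
                           * sum((c - mv)**2 for c in v))

x1s = [row[0] for row in rows]
preds = [b[0]*row[1] + b[1]*row[2] for row in rows]
rhat = corr(x1s, preds)
print(round(rho, 3))   # 0.589; rhat should be close to this value
```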

#### **5.6.4. Distributional aspects of the sample multiple correlation coefficient**

From (5.6.7), we have

$$1 - r\_{1.(2...p)}^2 = 1 - \frac{S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}} = \frac{s\_{11} - S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}} = \frac{|S|}{|S\_{22}|s\_{11}},\tag{5.6.12}$$

which can be established from the expansion of the determinant $|S| = |S\_{22}|\,|S\_{11} - S\_{12}S\_{22}^{-1}S\_{21}|$, which is available from Sect. 1.3. In our case, $S\_{11}$ is $1\times 1$; hence we denote it by $s\_{11}$, and $|S\_{11} - S\_{12}S\_{22}^{-1}S\_{21}|$ is then $s\_{11} - S\_{12}S\_{22}^{-1}S\_{21}$, which is $1\times 1$. Let $u = 1 - r\_{1.(2\ldots p)}^2 = \frac{|S|}{|S\_{22}|s\_{11}}$. We can compute arbitrary moments of $u$ by integrating over the density of $S$, namely the Wishart density with $m = n-1$ degrees of freedom when the population is Gaussian, $n$ being the sample size. That is, for arbitrary $h$,

$$E[u^h] = \frac{1}{2^{\frac{mp}{2}} |\Sigma|^{\frac{m}{2}} \Gamma\_p(\frac{m}{2})} \int\_{S > O} u^h |S|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} \mathbf{S})} \mathrm{d}S. \tag{i}$$

Note that $u^h = |S|^h\,|S_{22}|^{-h}\,s_{11}^{-h}$. Among the three factors $|S|^h$, $|S_{22}|^{-h}$ and $s_{11}^{-h}$, the factors $|S_{22}|^{-h}$ and $s_{11}^{-h}$ are problematic. We will replace them by equivalent integral representations so that the problematic part is shifted to the exponent. Consider the identities

$$s_{11}^{-h} = \frac{1}{\Gamma(h)}\int_{x=0}^{\infty} x^{h-1}\,\mathrm{e}^{-s_{11}x}\,\mathrm{d}x,\ \ x>0,\ s_{11}>0,\ \Re(h)>0 \tag{ii}$$

$$|S\_{22}|^{-h} = \frac{1}{\Gamma\_{p\_2}(h)} \int\_{X\_2 > O} |X\_2|^{h - \frac{p\_2 + 1}{2}} \mathbf{e}^{-\text{tr}(S\_{22}X\_2)} \mathbf{d}X\_2,\tag{iii}$$

for $X_2 > O$, $S_{22} > O$, $\Re(h) > \frac{p_2-1}{2}$, where $X_2 > O$ is a $p_2\times p_2$ real positive definite matrix, $p_2 = p-1$, $p_1 = 1$, $p_1 + p_2 = p$. Then, excluding the factor $-\frac{1}{2}$, the exponent in *(i)* becomes the following:

$$\mathrm{tr}(\Sigma^{-1}S) + 2\,s_{11}x + 2\,\mathrm{tr}(S_{22}X_2) = \mathrm{tr}[S(\Sigma^{-1} + 2Z)],\quad Z = \begin{bmatrix} x & O \\ O & X_2 \end{bmatrix}.\tag{iv}$$

Noting that $\Sigma^{-1} + 2Z = \Sigma^{-1}(I + 2\Sigma Z)$, we are now in a position to integrate out $S$ from *(i)* by using a real matrix-variate gamma integral. Denoting the constant part in *(i)* by $c_1$, we have

$$\begin{split} E[u^h] &= \frac{c_1}{\Gamma(h)\Gamma_{p_2}(h)}\int_{x=0}^{\infty} x^{h-1}\int_{X_2>O}|X_2|^{h-\frac{p_2+1}{2}}\\ &\qquad\times\left[\int_{S>O}|S|^{\frac{m}{2}+h-\frac{p+1}{2}}\,\mathrm{e}^{-\frac{1}{2}\mathrm{tr}[S(\Sigma^{-1}+2Z)]}\,\mathrm{d}S\right]\mathrm{d}x\wedge\mathrm{d}X_2\\ &= \frac{c_1\,2^{p(\frac{m}{2}+h)}}{\Gamma(h)\Gamma_{p_2}(h)}\,\Gamma_p(m/2+h)\int_{x=0}^{\infty}\int_{X_2>O} x^{h-1}|X_2|^{h-\frac{p_2+1}{2}}\,|\Sigma^{-1}+2Z|^{-(\frac{m}{2}+h)}\,\mathrm{d}x\wedge\mathrm{d}X_2\\ &= \frac{c_1\,2^{p(\frac{m}{2}+h)}}{\Gamma(h)\Gamma_{p_2}(h)}\,\Gamma_p(m/2+h)\,|\Sigma|^{\frac{m}{2}+h}\int_{x=0}^{\infty}\int_{X_2>O} x^{h-1}|X_2|^{h-\frac{p_2+1}{2}}\\ &\qquad\times|I+2\Sigma Z|^{-(\frac{m}{2}+h)}\,\mathrm{d}x\wedge\mathrm{d}X_2. \end{split}\tag{5.6.13}$$
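The scalar gamma identity (ii) that drives this device can be checked numerically. The following Python sketch (with arbitrarily chosen $s_{11}$ and $h$) approximates the integral by the midpoint rule after mapping $(0,\infty)$ onto $(0,1)$:

```python
# Numerical check of identity (ii):
#   s11^(-h) = (1/Gamma(h)) * Integral_0^inf x^(h-1) e^(-s11 x) dx,
# using the substitution x = t/(1-t) to map (0, inf) onto (0, 1) and
# the composite midpoint rule.  s11 and h are arbitrary choices.
from math import gamma, exp

s11, h = 2.5, 1.7
n = 200000
total = 0.0
for k in range(n):
    t = (k + 0.5) / n               # midpoint in (0, 1)
    x = t / (1.0 - t)               # mapped abscissa in (0, inf)
    jac = 1.0 / (1.0 - t)**2        # dx/dt
    total += x**(h - 1.0) * exp(-s11 * x) * jac / n

lhs = s11**(-h)
rhs = total / gamma(h)
assert abs(lhs - rhs) < 1e-4
```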

#### Matrix-Variate Gamma and Beta Distributions

The integral in (5.6.13) can be evaluated for a general $\Sigma$, which will produce the non-null density of $1 - r_{1.(2...p)}^2$, non-null in the sense that the population multiple correlation $\rho_{1.(2...p)} \ne 0$. However, if $\rho_{1.(2...p)} = 0$, which we call the null case, the determinant part in (5.6.13) splits into two factors, one depending only on $x$ and the other only involving $X_2$. So, letting $H_o: \rho_{1.(2...p)} = 0$,

$$\begin{split} E[u^h|H_o] &= \frac{c_1\,2^{p(\frac{m}{2}+h)}}{\Gamma(h)\Gamma_{p_2}(h)}\,\Gamma_p(m/2+h)\,|\Sigma|^{\frac{m}{2}+h}\int_0^{\infty} x^{h-1}[1+2\sigma_{11}x]^{-(\frac{m}{2}+h)}\,\mathrm{d}x\\ &\qquad\times\int_{X_2>O}|X_2|^{h-\frac{p_2+1}{2}}\,|I+2\Sigma_{22}X_2|^{-(\frac{m}{2}+h)}\,\mathrm{d}X_2.\end{split}\tag{v}$$

But the $x$-integral gives $\frac{\Gamma(h)\,\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)}(2\sigma_{11})^{-h}$ for $\Re(h)>0$, and the $X_2$-integral gives $\frac{\Gamma_{p_2}(h)\,\Gamma_{p_2}(\frac{m}{2})}{\Gamma_{p_2}(\frac{m}{2}+h)}|2\Sigma_{22}|^{-h}$ for $\Re(h)>\frac{p_2-1}{2}$. Substituting all these in *(v)*, we note that all the factors containing 2 and $\Sigma,\ \sigma_{11},\ \Sigma_{22}$ cancel out, and then, by using the fact that

$$\frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma(\frac{m}{2} + h)\Gamma\_{p-1}(\frac{m}{2} + h)} = \pi^{\frac{p-1}{2}} \frac{\Gamma(\frac{m}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{m}{2} + h)},$$

we have the following expression for the *h*-th null moment of *u*:

$$E[u^h|H\_o] = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2} - \frac{p-1}{2})} \frac{\Gamma(\frac{m}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{m}{2} + h)}, \ \Re(h) > -\frac{m}{2} + \frac{p-1}{2}, \tag{5.6.14}$$

which happens to be the $h$-th moment of a real scalar type-1 beta random variable with the parameters $(\frac{m}{2}-\frac{p-1}{2},\ \frac{p-1}{2})$. Since $h$ is arbitrary, this $h$-th moment uniquely determines the distribution, whence the following result:

**Theorem 5.6.3.** *When the population has a $p$-variate Gaussian distribution with the parameters $\mu$ and $\Sigma>O$, and the population multiple correlation coefficient $\rho_{1.(2...p)}=0$, the sample multiple correlation coefficient $r_{1.(2...p)}$ is such that $u = 1-r_{1.(2...p)}^2$ is distributed as a real scalar type-1 beta random variable with the parameters $(\frac{m}{2}-\frac{p-1}{2},\ \frac{p-1}{2})$. Thereby, $v = \frac{u}{1-u} = \frac{1-r_{1.(2...p)}^2}{r_{1.(2...p)}^2}$ is distributed as a real scalar type-2 beta random variable with the parameters $(\frac{m}{2}-\frac{p-1}{2},\ \frac{p-1}{2})$, and $w = \frac{1-u}{u} = \frac{r_{1.(2...p)}^2}{1-r_{1.(2...p)}^2}$ is distributed as a real scalar type-2 beta random variable with the parameters $(\frac{p-1}{2},\ \frac{m}{2}-\frac{p-1}{2})$ whose density is*

$$f\_w(w) = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2} - \frac{p-1}{2})\Gamma(\frac{p-1}{2})} w^{\frac{p-1}{2}-1} (1+w)^{-(\frac{m}{2})}, \ 0 \le w < \infty,\tag{5.6.15}$$

*and zero elsewhere.*
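As an empirical check of Theorem 5.6.3, the following Monte Carlo sketch in Python (with the arbitrary choices $p=3$, $n=20$, and $\Sigma=I$, so that $\rho_{1.(2\,3)}=0$) simulates $u=|S|/(|S_{22}|\,s_{11})$ and compares its sample mean with the type-1 beta mean $\frac{m/2-(p-1)/2}{m/2} = \frac{m-p+1}{m}$:

```python
# Monte Carlo check of Theorem 5.6.3: under H_o (Sigma = I), the statistic
# u = 1 - r^2 = |S| / (|S22| s11) from (5.6.12) should behave as a type-1
# beta with parameters (m/2 - (p-1)/2, (p-1)/2), whose mean is (m-p+1)/m.
# p = 3, n = 20 and N = 2000 replications are arbitrary choices.
import random

random.seed(7)
p, n = 3, 20
m = n - 1
N = 2000

def det3(A):
    return (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
            - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
            + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))

u_sum = 0.0
for _ in range(N):
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    # sample sum-of-products matrix S, with m = n - 1 degrees of freedom
    S = [[sum((row[i] - xbar[i])*(row[j] - xbar[j]) for row in X)
          for j in range(p)] for i in range(p)]
    det_S22 = S[1][1]*S[2][2] - S[1][2]*S[2][1]
    u_sum += det3(S) / (det_S22 * S[0][0])

mean_u = u_sum / N
expected = (m - p + 1) / m           # beta mean, = 17/19 here
assert abs(mean_u - expected) < 0.02
```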

As $F$-tables are available, we may conveniently express the above real scalar type-2 beta density in terms of an $F$-density. It suffices to make the substitution $w = \frac{p-1}{m-p+1}F$, where $F$ is a real $F$ random variable having $p-1$ and $m-p+1$ degrees of freedom, that is, an $F_{p-1,\,m-p+1}$ random variable, with $m = n-1$, $n$ being the sample size. The density of this $F$ random variable, denoted by $f_F(F)$, is the following:

$$f\_F(F) = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{p-1}{2})\Gamma(\frac{m}{2} - \frac{p-1}{2})} \left(\frac{p-1}{m-p+1}\right)^{\frac{p-1}{2}} F^{\frac{p-1}{2}-1} \left(1 + \frac{p-1}{m-p+1} F\right)^{-\frac{m}{2}},\tag{5.6.16}$$

whenever $0\le F<\infty$, and zero elsewhere. In the above simplification, observe that $\frac{(p-1)/2}{\frac{m}{2}-\frac{p-1}{2}} = \frac{p-1}{m-p+1}$. Then, to reach a decision when testing the hypothesis $H_o:\rho_{1.(2...p)}=0$, first compute $F_{p-1,\,m-p+1} = \frac{m-p+1}{p-1}\,w$, $w = \frac{r_{1.(2...p)}^2}{1-r_{1.(2...p)}^2}$. Then, reject $H_o$ if the observed $F_{p-1,\,m-p+1} \ge F_{p-1,\,m-p+1,\,\alpha}$ for a given $\alpha$. This will be a test at significance level $\alpha$ or, equivalently, a test whose critical region's size is $\alpha$. The non-null distribution for evaluating the power of this likelihood ratio test can be determined by evaluating the integral in (5.6.13) and identifying the distribution through the uniqueness property of arbitrary moments.
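The following Python sketch (with an arbitrary numerical $S$ and sample size) carries out these steps: it computes $r^2$ via the determinant identity (5.6.12), cross-checks it against the quadratic-form expression, and forms the observed $F_{p-1,\,m-p+1}$ statistic. The critical value $F_{p-1,\,m-p+1,\,\alpha}$ would then be taken from $F$-tables.

```python
# From a 3-variable sample sum-of-products matrix S (arbitrary example),
# compute r^2 = S12 S22^{-1} S21 / s11 via the determinant identity
# (5.6.12), then w = r^2/(1 - r^2) and F_{p-1, m-p+1} = ((m-p+1)/(p-1)) w.
p, n = 3, 25
m = n - 1

S = [[10.0, 4.0, 3.0],
     [ 4.0, 8.0, 2.0],
     [ 3.0, 2.0, 6.0]]

def det3(A):
    return (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
            - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
            + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))

det_S = det3(S)
det_S22 = S[1][1]*S[2][2] - S[1][2]*S[2][1]
u = det_S / (det_S22 * S[0][0])      # u = 1 - r^2, by (5.6.12)
r2 = 1.0 - u

# cross-check r^2 directly as S12 S22^{-1} S21 / s11
inv22 = [[ S[2][2]/det_S22, -S[1][2]/det_S22],
         [-S[2][1]/det_S22,  S[1][1]/det_S22]]
S12 = [S[0][1], S[0][2]]
quad = sum(S12[i]*inv22[i][j]*S12[j] for i in range(2) for j in range(2))
assert abs(r2 - quad/S[0][0]) < 1e-12

w = r2 / (1.0 - r2)
F = (m - p + 1) / (p - 1) * w        # observed F_{2, 22} statistic
# reject H_o at level alpha if F exceeds the F_{2,22} upper alpha quantile
```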

**Note 5.6.1.** By making use of Theorem 5.6.3 as a starting point and exploiting various results connecting real scalar type-1 beta, type-2 beta, *F* and gamma variables, one can obtain numerous results on the distributional aspects of certain functions involving the sample multiple correlation coefficient.

#### **5.6.5. The partial correlation coefficient**

Partial correlation is a concept associated with the correlation between residuals in two variables after removing the effects of linear regression on a set of other variables. Consider the real vector $X' = (x_1, x_2, x_3,\ldots,x_p) = (x_1, x_2, X_{(3)}')$, $X_{(3)}' = (x_3,\ldots,x_p)$, where $x_1,\ldots,x_p$ are all real scalar variables. Let the covariance matrix of $X$ be $\Sigma>O$ and let it be partitioned as follows:

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ X\_{(3)} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma\_{11} & \sigma\_{12} & \Sigma\_{13} \\ \sigma\_{21} & \sigma\_{22} & \Sigma\_{23} \\ \Sigma\_{31} & \Sigma\_{32} & \Sigma\_{33} \end{bmatrix}, \ X\_{(3)} \text{ being } (p-2) \times 1,$$

where $\sigma_{11}, \sigma_{12}, \sigma_{21}, \sigma_{22}$ are $1\times1$, $\Sigma_{13}$ and $\Sigma_{23}$ are $1\times(p-2)$, $\Sigma_{31}=\Sigma_{13}'$, $\Sigma_{32}=\Sigma_{23}'$, and $\Sigma_{33}$ is $(p-2)\times(p-2)$. Let $E[X]=O$ without any loss of generality. Consider the problem of predicting $x_1$ by using a linear function of $X_{(3)}$. Then, the regression of $x_1$ on $X_{(3)}$ is $E[x_1|X_{(3)}] = \Sigma_{13}\Sigma_{33}^{-1}X_{(3)}$ from (5.6.10), and the residual part, after removing


this regression from $x_1$, is $e_1 = x_1 - \Sigma_{13}\Sigma_{33}^{-1}X_{(3)}$. Similarly, the linear regression of $x_2$ on $X_{(3)}$ is $E[x_2|X_{(3)}] = \Sigma_{23}\Sigma_{33}^{-1}X_{(3)}$, and the residual in $x_2$ after removing the effect of $X_{(3)}$ is $e_2 = x_2 - \Sigma_{23}\Sigma_{33}^{-1}X_{(3)}$. What then are the variances of $e_1$ and $e_2$, the covariance between $e_1$ and $e_2$, and the scale-free covariance, namely the correlation, between $e_1$ and $e_2$? Since $e_1$ and $e_2$ are linear functions of the variables involved, we can utilize the expressions for variances of linear functions and covariances between linear functions, a basic discussion of such results being given in Mathai and Haubold (2017b). Thus,

$$\begin{split} \mathrm{Var}(e_1) &= \mathrm{Var}(x_1) + \mathrm{Var}(\Sigma_{13}\Sigma_{33}^{-1}X_{(3)}) - 2\,\mathrm{Cov}(x_1,\ \Sigma_{13}\Sigma_{33}^{-1}X_{(3)})\\ &= \sigma_{11} + \Sigma_{13}\Sigma_{33}^{-1}\mathrm{Cov}(X_{(3)})\Sigma_{33}^{-1}\Sigma_{31} - 2\,\Sigma_{13}\Sigma_{33}^{-1}\mathrm{Cov}(X_{(3)},x_1)\\ &= \sigma_{11} + \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31} - 2\,\Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31} = \sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31}. \end{split}\tag{i}$$

It can be similarly shown that

$$\text{Var}(e\_2) = \sigma\_{22} - \Sigma\_{23} \Sigma\_{33}^{-1} \Sigma\_{32} \tag{ii}$$

$$\mathrm{Cov}(e_1, e_2) = \sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32}.\tag{iii}$$

Then, the correlation between the residuals $e_1$ and $e_2$, which is called the partial correlation between $x_1$ and $x_2$ after removing the effects of linear regression on $X_{(3)}$ and is denoted by $\rho_{12.(3...p)}$, is such that

$$\rho\_{12.(3\dots p)}^2 = \frac{[\sigma\_{12} - \Sigma\_{13}\Sigma\_{33}^{-1}\Sigma\_{32}]^2}{[\sigma\_{11} - \Sigma\_{13}\Sigma\_{33}^{-1}\Sigma\_{31}][\sigma\_{22} - \Sigma\_{23}\Sigma\_{33}^{-1}\Sigma\_{32}]}.\tag{5.6.17}$$

In the above simplifications, we have for instance used the fact that $\Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32} = \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{31}$, since both are real $1\times1$ quantities and one is the transpose of the other.
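For $p=3$, where $X_{(3)}$ reduces to the single variable $x_3$, all blocks in (5.6.17) are scalars. The following Python sketch (with an arbitrary positive definite $\Sigma$) evaluates (5.6.17) and compares it with the familiar correlation-based form $\rho_{12.3} = (\rho_{12}-\rho_{13}\rho_{23})/\sqrt{(1-\rho_{13}^2)(1-\rho_{23}^2)}$:

```python
# Partial correlation rho_{12.3} for p = 3, computed from the residual
# variances and covariance of (5.6.17) and cross-checked against the
# classical correlation-based formula.  Sigma is an arbitrary SPD example.
from math import sqrt, isclose

Sigma = [[4.0, 1.2, 0.8],
         [1.2, 3.0, 0.9],
         [0.8, 0.9, 2.0]]
s11, s12, s13 = Sigma[0]
_,   s22, s23 = Sigma[1]
s33 = Sigma[2][2]

# (5.6.17): residual moments after regressing x1 and x2 on x3
cov_e1e2 = s12 - s13*s23/s33
var_e1   = s11 - s13*s13/s33
var_e2   = s22 - s23*s23/s33
rho_partial = cov_e1e2 / sqrt(var_e1*var_e2)

# classical form in terms of the pairwise correlations
r12 = s12/sqrt(s11*s22); r13 = s13/sqrt(s11*s33); r23 = s23/sqrt(s22*s33)
rho_classic = (r12 - r13*r23)/sqrt((1 - r13**2)*(1 - r23**2))

assert isclose(rho_partial, rho_classic)
```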

The corresponding sample partial correlation coefficient between $x_1$ and $x_2$ after removing the effects of linear regression on $X_{(3)}$, denoted by $r_{12.(3...p)}$, is such that

$$r\_{12.(3\dots p)}^2 = \frac{[s\_{12} - S\_{13}S\_{33}^{-1}S\_{32}]^2}{[s\_{11} - S\_{13}S\_{33}^{-1}S\_{31}][s\_{22} - S\_{23}S\_{33}^{-1}S\_{32}]} \tag{5.6.18}$$

where the sample sum of products matrix *S* is partitioned correspondingly, that is,

$$S = \begin{bmatrix} s_{11} & s_{12} & S_{13}\\ s_{21} & s_{22} & S_{23}\\ S_{31} & S_{32} & S_{33}\end{bmatrix},\ S_{33} \text{ being } (p-2)\times(p-2),\tag{5.6.19}$$

and $s_{11}, s_{12}, s_{21}, s_{22}$ being $1\times1$. In all the above derivations, we did not use any assumption of an underlying Gaussian population. The results hold for any general population as long as product moments up to the second order exist. However, if we assume a $p$-variate nonsingular Gaussian population, then we can obtain some interesting results on the distributional aspects of the sample partial correlation, as was done in the case of the sample multiple correlation. Such results will not be considered herein.

#### **Exercises 5.6**

**5.6.1.** Let the $p\times p$ real positive definite matrix $W$ be distributed as $W\sim W_p(m,\Sigma)$ with $\Sigma=I$. Consider the partitioning $W = \begin{bmatrix} W_{11} & W_{12}\\ W_{21} & W_{22}\end{bmatrix}$ where $W_{11}$ is $r\times r$, $r<p$. Evaluate explicitly the normalizing constant in the density of $W$ by first integrating out (1): $W_{11}$, (2): $W_{22}$, (3): $W_{12}$.

**5.6.2.** Repeat Exercise 5.6.1 for the complex case.

**5.6.3.** Let the $p\times p$ real positive definite matrix $W$ have a real Wishart density with degrees of freedom $m\ge p$ and parameter matrix $\Sigma>O$. Consider the transformation $W = TT'$ where $T$ is lower triangular with positive diagonal elements. Evaluate the densities of the $t_{jj}$'s and the $t_{ij}$'s, $i>j$, if (1): $\Sigma = \mathrm{diag}(\sigma_{11},\ldots,\sigma_{pp})$, (2): $\Sigma>O$ is a general matrix.

**5.6.4.** Repeat Exercise 5.6.3 for the complex case. In the complex case, the diagonal elements in *T* are real and positive.

**5.6.5.** Let $S\sim W_p(m,\Sigma)$, $\Sigma>O$. Compute the density of $S^{-1}$ in the real case, and repeat for the complex case.

#### **5.7. Distributions of Products and Ratios of Matrix-variate Random Variables**

In the real scalar case, one can easily interpret products and ratios of real scalar variables, whether these are random or mathematical variables. However, when it comes to matrices, products and ratios must be carefully defined. Let $X_1$ and $X_2$ be independently distributed $p\times p$ real symmetric positive definite matrix-variate random variables with density functions $f_1(X_1)$ and $f_2(X_2)$, respectively. By definition, $f_1$ and $f_2$ are real-valued scalar functions of the matrices $X_1$ and $X_2$. Owing to the statistical independence of $X_1$ and $X_2$, their joint density, denoted by $f(X_1,X_2)$, is the product of the marginal densities, that is, $f(X_1,X_2)=f_1(X_1)f_2(X_2)$. Let us define a ratio and a product of matrices. Let $U_2 = X_2^{\frac12}X_1X_2^{\frac12}$ and $U_1 = X_2^{\frac12}X_1^{-1}X_2^{\frac12}$ be called the symmetric product and symmetric ratio of the matrices $X_1$ and $X_2$, where $X_2^{\frac12}$ denotes the positive definite square root of the positive definite matrix $X_2$. Let us consider the product $U_2$ first. We could also have defined a product by interchanging $X_1$ and $X_2$. When it comes to ratios, we could have considered the ratio of $X_1$ to $X_2$ as well as that of $X_2$ to $X_1$. Nonetheless, we will start with $U_1$ and $U_2$ as defined above.
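The following Python sketch illustrates the symmetric product for $p=2$ with two arbitrarily chosen positive definite matrices. The $2\times2$ square root is obtained from the closed form $A^{1/2} = (A+\sqrt{|A|}\,I)/\sqrt{\mathrm{tr}(A)+2\sqrt{|A|}}$, a consequence of the Cayley-Hamilton theorem. Note that $U_2$ is symmetric and $|U_2|=|X_1|\,|X_2|$, even though $U_2 \ne X_1X_2$ in general.

```python
# Symmetric product U2 = X2^(1/2) X1 X2^(1/2) of two arbitrary 2x2 SPD
# matrices.  For a 2x2 SPD matrix A, sqrt(A) = (A + sqrt(det A) I)/t with
# t = sqrt(tr A + 2 sqrt(det A)), by the Cayley-Hamilton theorem.
from math import sqrt, isclose

def mat2_sqrt(A):
    s = sqrt(A[0][0]*A[1][1] - A[0][1]*A[1][0])
    t = sqrt(A[0][0] + A[1][1] + 2.0*s)
    return [[(A[0][0] + s)/t, A[0][1]/t],
            [A[1][0]/t, (A[1][1] + s)/t]]

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(A):
    return A[0][0]*A[1][1] - A[0][1]*A[1][0]

X1 = [[3.0, 1.0], [1.0, 2.0]]
X2 = [[2.0, 0.5], [0.5, 4.0]]

R = mat2_sqrt(X2)                        # X2^(1/2)
RR = matmul(R, R)                        # sanity check: recovers X2
assert all(isclose(RR[i][j], X2[i][j]) for i in range(2) for j in range(2))

U2 = matmul(matmul(R, X1), R)            # symmetric product

# U2 is symmetric and |U2| = |X1||X2|
assert isclose(U2[0][1], U2[1][0])
assert isclose(det2(U2), det2(X1)*det2(X2))
```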

#### **5.7.1. The density of a product of real matrices**

Consider the transformation $U_2 = X_2^{\frac12}X_1X_2^{\frac12}$, $V = X_2$. Then, it follows from Theorem 1.6.5 that

$$\mathrm{d}X\_1 \wedge \mathrm{d}X\_2 = |V|^{-\frac{p+1}{2}} \mathrm{d}U\_2 \wedge \mathrm{d}V. \tag{5.7.1}$$

Letting the joint density of *U*<sup>2</sup> and *V* be denoted by *g(U*2*,V)* and the marginal density of *U*2, by *g*2*(U*2*)*, we have

$$f\_1(X\_1)f\_2(X\_2)\,\mathrm{d}X\_1 \wedge \mathrm{d}X\_2 = |V|^{-\frac{p+1}{2}} f\_1(V^{-\frac{1}{2}}U\_2V^{-\frac{1}{2}})f\_2(V)\,\mathrm{d}U\_2 \wedge \mathrm{d}V$$

$$g\_2(U\_2) = \int\_V |V|^{-\frac{p+1}{2}} f\_1(V^{-\frac{1}{2}}U\_2V^{-\frac{1}{2}})f\_2(V)\,\mathrm{d}V,\tag{5.7.2}$$

$g_2(U_2)$ being referred to as the density of the symmetric product $U_2$ of the matrices $X_1$ and $X_2$. For example, letting $X_1$ and $X_2$ be independently distributed two-parameter matrix-variate gamma random variables with the densities

$$f_j(X_j) = \frac{|B_j|^{\alpha_j}}{\Gamma_p(\alpha_j)}\,|X_j|^{\alpha_j-\frac{p+1}{2}}\,\mathrm{e}^{-\mathrm{tr}(B_jX_j)},\ j=1,2,\tag{i}$$

for $B_j>O$, $X_j>O$, $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,2$, and zero elsewhere, we have

$$g_2(U_2) = c\,|U_2|^{\alpha_1-\frac{p+1}{2}}\int_{V>O}|V|^{\alpha_2-\alpha_1-\frac{p+1}{2}}\,\mathrm{e}^{-\mathrm{tr}(B_2V + B_1V^{-\frac12}U_2V^{-\frac12})}\,\mathrm{d}V,\tag{5.7.3}$$

where $c$ is the product of the normalizing constants of the densities specified in *(i)*. On comparing (5.7.3) with the Krätzel integral defined in the real scalar case in Chap. 2, as well as in Mathai (2012) and Mathai and Haubold (1988, 2011a, 2017a), it is seen that (5.7.3) can be regarded as a real matrix-variate analogue of Krätzel's integral. One could also obtain the real matrix-variate version of the inverse Gaussian density from the integrand.

As another example, let $f_1(X_1)$ be a real matrix-variate type-1 beta density as previously defined in this chapter, whose parameters are $(\gamma+\frac{p+1}{2},\ \alpha)$ with $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\gamma)>-1$, its density being given by

$$f_1(X_1) = \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})\,\Gamma_p(\alpha)}\,|X_1|^{\gamma}\,|I-X_1|^{\alpha-\frac{p+1}{2}}\tag{ii}$$

for $O<X_1<I$, $\Re(\gamma)>-1$, $\Re(\alpha)>\frac{p-1}{2}$, and zero elsewhere. Letting $f_2(X_2)=f(X_2)$ be any other density, the density of $U_2$ is then

$$\begin{split} g_2(U_2) &= \int_V |V|^{-\frac{p+1}{2}}\,f_1(V^{-\frac12}U_2V^{-\frac12})\,f_2(V)\,\mathrm{d}V\\ &= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})\,\Gamma_p(\alpha)}\int_V |V|^{-\frac{p+1}{2}}\,|V^{-\frac12}U_2V^{-\frac12}|^{\gamma}\,|I-V^{-\frac12}U_2V^{-\frac12}|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V\\ &= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})}\,\frac{|U_2|^{\gamma}}{\Gamma_p(\alpha)}\int_{V>U_2>O}|V|^{-\alpha-\gamma}\,|V-U_2|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V\\ &= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})}\,K_{2,U_2,\gamma}^{-\alpha}f\end{split}\tag{5.7.4}$$

where

$$K_{2,U_2,\gamma}^{-\alpha}f = \frac{|U_2|^{\gamma}}{\Gamma_p(\alpha)}\int_{V>U_2>O}|V|^{-\alpha-\gamma}\,|V-U_2|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V,\ \Re(\alpha)>\frac{p-1}{2},\tag{5.7.5}$$

is called the real matrix-variate Erdélyi-Kober right-sided, or second kind, fractional integral of order $\alpha$ and parameter $\gamma$, since for $p=1$, that is, in the real scalar case, (5.7.5) corresponds to the Erdélyi-Kober fractional integral of the second kind of order $\alpha$ and parameter $\gamma$. This connection between the density of a symmetric product of matrices and a fractional integral of the second kind was established by Mathai (2009, 2010) and in subsequent papers.

#### **5.7.2. M-convolution and fractional integral of the second kind**

Mathai (1997) referred to the structure in (5.7.2) as the M-convolution of a product, where $f_1$ and $f_2$ need not be statistical densities. Actually, they could be any functions provided the integral exists. However, if $f_1$ and $f_2$ are statistical densities, this M-convolution of a product can be interpreted as the density of a symmetric product. Thus, a physical interpretation of an M-convolution of a product is provided in terms of statistical densities. We have seen that (5.7.2) is connected to a fractional integral when $f_1$ is a real matrix-variate type-1 beta density and $f_2$ is an arbitrary density. From this observation, one can introduce a general definition for a fractional integral of the second kind in the real matrix-variate case. Let

$$f\_1(X\_1) = \phi\_1(X\_1) \frac{|I - X\_1|^{\alpha - \frac{p+1}{2}}}{\Gamma\_p(\alpha)}, \ \Re(\alpha) > \frac{p-1}{2},\tag{iii}$$

and *f*2*(X*2*)* = *φ*2*(X*2*)f (X*2*)* where *φ*<sup>1</sup> and *φ*<sup>2</sup> are specified functions and *f* is an arbitrary function. Then, consider the M-convolution of a product, again denoted by *g*2*(U*2*)*:

$$\begin{split} g_2(U_2) &= \int_V |V|^{-\frac{p+1}{2}}\,\phi_1(V^{-\frac12}U_2V^{-\frac12})\,\frac{|I-V^{-\frac12}U_2V^{-\frac12}|^{\alpha-\frac{p+1}{2}}}{\Gamma_p(\alpha)}\\ &\qquad\times\phi_2(V)\,f(V)\,\mathrm{d}V,\ \Re(\alpha)>\frac{p-1}{2}.\end{split}\tag{5.7.6}$$

The right-hand side of (5.7.6) will be called a fractional integral of the second kind of order $\alpha$ in the real matrix-variate case. By letting $p=1$ and specifying $\phi_1$ and $\phi_2$, one can obtain all the fractional integrals of the second kind of order $\alpha$ that have previously been defined by various authors. Hence, for a general $p$, one has the corresponding real matrix-variate cases. For example, on letting $\phi_1(X_1)=|X_1|^{\gamma}$ and $\phi_2(X_2)=1$, one has the Erdélyi-Kober fractional integral of the second kind of (5.7.5) in the real matrix-variate case since, for $p=1$, it is the Erdélyi-Kober fractional integral of the second kind of order $\alpha$. Letting $\phi_1(X_1)=1$ and $\phi_2(X_2)=|X_2|^{\alpha}$, (5.7.6) simplifies to the following integral, again denoted by $g_2(U_2)$:

$$g_2(U_2) = \frac{1}{\Gamma_p(\alpha)}\int_{V>U_2>O}|V-U_2|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V.\tag{5.7.7}$$

For $p=1$, (5.7.7) is the Weyl fractional integral of the second kind of order $\alpha$. Accordingly, (5.7.7) is the Weyl fractional integral of the second kind in the real matrix-variate case. For $p=1$, (5.7.7) is also the Riemann-Liouville fractional integral of the second kind of order $\alpha$ in the real scalar case, if there exists a finite upper bound for $V$. If $V$ is bounded above by a real positive definite constant matrix $B>O$ in the integral in (5.7.7), then (5.7.7) is the Riemann-Liouville fractional integral of the second kind of order $\alpha$ for the real matrix-variate case. Connections to other fractional integrals of the second kind can be established by referring to Mathai and Haubold (2017).
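For $p=1$ and $f(v)=\mathrm{e}^{-v}$, the Weyl fractional integral in (5.7.7) has the closed-form value $\mathrm{e}^{-u}$ for every order $\alpha>0$, since $\int_u^{\infty}(v-u)^{\alpha-1}\mathrm{e}^{-v}\,\mathrm{d}v = \Gamma(\alpha)\,\mathrm{e}^{-u}$. The following Python sketch checks this numerically for the arbitrary choice $\alpha=\frac12$, removing the endpoint singularity with the substitution $v = u+t^2$:

```python
# Scalar (p = 1) Weyl fractional integral of the second kind, (5.7.7),
# applied to f(v) = e^(-v):
#   (1/Gamma(alpha)) Integral_u^inf (v - u)^(alpha - 1) e^(-v) dv = e^(-u).
# The substitution v = u + t^2 makes the integrand smooth at t = 0.
from math import gamma, exp

alpha, u = 0.5, 1.0
n, T = 200000, 10.0                  # midpoint rule on t in (0, T)
total = 0.0
for k in range(n):
    t = (k + 0.5) * T / n
    # (v - u)^(alpha-1) dv = t^(2 alpha - 2) * 2 t dt = 2 t^(2 alpha - 1) dt
    total += 2.0 * t**(2.0*alpha - 1.0) * exp(-(u + t*t)) * (T / n)

weyl = total / gamma(alpha)
assert abs(weyl - exp(-u)) < 1e-5
```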

The appeal of fractional integrals of the second kind resides in the fact that they can be given physical interpretations as the density of a symmetric product when $f_1$ and $f_2$ are densities, or as an M-convolution of products, whether in the scalar or matrix-variate case, and that in both the real and complex domains.

#### **5.7.3. A pathway extension of fractional integrals**

Consider the following modification to the general definition of a fractional integral of the second kind of order *α* in the real matrix-variate case given in (5.7.6). Let

$$f\_1(X\_1) = \phi\_1(X\_1) \frac{1}{\Gamma\_p(a)} |I - a(1 - q)X\_1|^{\frac{\eta}{1 - q} - \frac{p + 1}{2}}, \ f\_2(X\_2) = \phi\_2(X\_2) f(X\_2), \qquad (\text{iv})$$

where $\Re(\alpha)>\frac{p-1}{2}$ and $q<1$, and $a>0$, $\eta>0$ are real scalar constants. For all $q<1$, the $g_2(U_2)$ corresponding to the $f_1(X_1)$ and $f_2(X_2)$ of *(iv)* will define a family of fractional integrals of the second kind. Observe that when $X_1>O$ and $I-a(1-q)X_1>O$, then $O<X_1<\frac{1}{a(1-q)}I$. However, by writing $(1-q)=-(q-1)$ for $q>1$, one can switch to a type-2 beta form, namely $I+a(q-1)X_1>O$ for $q>1$, which merely implies that $X_1>O$, and the fractional nature is lost. As well, when $q\to1$,

$$|I + a(q - 1)X\_1|^{-\frac{\eta}{q - 1}} \to \mathbf{e}^{-a\,\eta \,\mathrm{tr}(X\_1)}$$

which is the exponential form or gamma density form. In this case too, the fractional nature is lost. Thus, through $q$, one can obtain matrix-variate type-1 beta and type-2 beta families and a gamma family of functions from *(iv)*; $q$ is then called the pathway parameter, which generates the three families of functions. However, the fractional nature of the integrals is lost for the cases $q>1$ and $q\to1$. In the real scalar case, $x_1$ may have an exponent, and making use of $[1-(1-q)x_1^{\delta}]^{\alpha-1}$ can lead to interesting fractional integrals for $q<1$. However, raising $X_1$ to an exponent $\delta$ in the matrix-variate case will fail to produce results of interest, as Jacobians will then take inconvenient forms that cannot be expressed in terms of the original matrices; this is, for example, explained in detail in Mathai (1997) for the case of the square of a real symmetric matrix.
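The limit invoked above can be observed numerically in the scalar case. The following Python sketch (with arbitrary $a$, $\eta$, $x$) evaluates $[1+a(q-1)x]^{-\eta/(q-1)}$ for $q$ approaching 1 and compares it with $\mathrm{e}^{-a\eta x}$:

```python
# Pathway limit in the scalar case:
#   [1 + a(q - 1) x]^(-eta/(q - 1)) -> e^(-a eta x) as q -> 1.
# The constants a, eta and the point x are arbitrary choices.
from math import exp

a, eta, x = 2.0, 1.5, 0.7
target = exp(-a*eta*x)

q = 1.0001                           # close to the pathway limit q -> 1
approx = (1.0 + a*(q - 1.0)*x)**(-eta/(q - 1.0))
assert abs(approx - target) < 1e-3
```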

#### **5.7.4. The density of a ratio of real matrices**

One can define a symmetric ratio in four different ways: $X_2^{\frac12}X_1^{-1}X_2^{\frac12}$ with $V=X_2$ or $V=X_1$, and $X_1^{\frac12}X_2^{-1}X_1^{\frac12}$ with $V=X_2$ or $V=X_1$. All four forms will produce different structures on $f_1(X_1)f_2(X_2)$. Since the form $U_1$ that was specified in Sect. 5.7 in terms of $X_2^{\frac12}X_1^{-1}X_2^{\frac12}$ with $V=X_2$ provides connections to fractional integrals of the first kind, we will consider this one, whose density, denoted by $g_1(U_1)$, is the following, observing that $\mathrm{d}X_1\wedge\mathrm{d}X_2 = |V|^{\frac{p+1}{2}}\,|U_1|^{-(p+1)}\,\mathrm{d}U_1\wedge\mathrm{d}V$:

$$g_1(U_1) = \int_V |V|^{\frac{p+1}{2}}\,|U_1|^{-(p+1)}\,f_1(V^{\frac12}U_1^{-1}V^{\frac12})\,f_2(V)\,\mathrm{d}V\tag{5.7.8}$$

provided the integral exists. As in the case of the fractional integral of the second kind in the real matrix-variate case, we can give a general definition for a fractional integral of the first kind in the real matrix-variate case as follows. Let $f_1(X_1)$ and $f_2(X_2)$ be taken as in the case of the fractional integral of the second kind, with $\phi_1$ and $\phi_2$ as preassigned functions. Then,

$$\begin{split} g\_1(U\_1) &= \int\_V |V|^{\frac{p+1}{2}} |U\_1|^{-(p+1)} \phi\_1(V^{\frac{1}{2}} U\_1^{-1} V^{\frac{1}{2}}) \\ &\times \frac{1}{\Gamma\_p(\alpha)} |I - V^{\frac{1}{2}} U\_1^{-1} V^{\frac{1}{2}}|^{\alpha - \frac{p+1}{2}} \phi\_2(V) f(V) \mathrm{d}V, \ \mathfrak{R}(\alpha) > \frac{p-1}{2}. \end{split} \tag{5.7.9}$$

As an example, letting

$$\phi_1(X_1) = \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,|X_1|^{\gamma-\frac{p+1}{2}}\ \text{ and }\ \phi_2(X_2)=1,\ \Re(\gamma)>\frac{p-1}{2},$$

we have

$$\begin{split} g_1(U_1) &= \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,\frac{|U_1|^{-\alpha-\gamma}}{\Gamma_p(\alpha)}\int_{V<U_1}|V|^{\gamma}\,|U_1-V|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V\\ &= \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,K_{1,U_1,\gamma}^{-\alpha}f\end{split}\tag{5.7.10}$$

for $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\gamma)>\frac{p-1}{2}$, where

$$K_{1,U_1,\gamma}^{-\alpha}f = \frac{|U_1|^{-\alpha-\gamma}}{\Gamma_p(\alpha)}\int_{V<U_1}|V|^{\gamma}\,|U_1-V|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V,\tag{5.7.11}$$

for $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\gamma)>\frac{p-1}{2}$, is the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ and parameter $\gamma$ in the real matrix-variate case. Since for $p=1$, that is, in the real scalar case, $K_{1,u_1,\gamma}^{-\alpha}f$ is the Erdélyi-Kober fractional integral of order $\alpha$ and parameter $\gamma$, the first author referred to $K_{1,U_1,\gamma}^{-\alpha}f$ in (5.7.11) as the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ in the real matrix-variate case.

By specializing $\phi_1$ and $\phi_2$ in the real scalar case, that is, for $p=1$, one can obtain all the fractional integrals of the first kind of order $\alpha$ that have been previously introduced in the literature by various authors. One can similarly derive the corresponding results on fractional integrals of the first kind in the real matrix-variate case. Before concluding this section, we will consider one more special case. Let

$$\phi\_1(X\_1) = |X\_1|^{-\alpha - \frac{p+1}{2}} \text{ and } \phi\_2(X\_2) = |X\_2|^\alpha.$$

In this case, $g_1(U_1)$ is not a statistical density but rather the M-convolution of a ratio. Under the above substitutions, $g_1(U_1)$ of (5.7.9) becomes

$$g_1(U_1) = \frac{1}{\Gamma_p(\alpha)}\int_{V<U_1}|U_1-V|^{\alpha-\frac{p+1}{2}}\,f(V)\,\mathrm{d}V,\ \Re(\alpha)>\frac{p-1}{2}.\tag{5.7.12}$$

For $p=1$, (5.7.12) is the Weyl fractional integral of the first kind of order $\alpha$; accordingly, the first author refers to (5.7.12) as the Weyl fractional integral of the first kind of order $\alpha$ in the real matrix-variate case. Since we are considering only real positive definite matrices here, there is a natural lower bound for the integral, that is, the integral is over $O<V<U_1$. When there is a specific lower bound, such as $O<V$, then for $p=1$, (5.7.12) is called the Riemann-Liouville fractional integral of the first kind of order $\alpha$. Hence, (5.7.12) will be referred to as the Riemann-Liouville fractional integral of the first kind of order $\alpha$ in the real matrix-variate case.

**Example 5.7.1.** Let $X_1$ and $X_2$ be independently distributed $p\times p$ real positive definite matrix-variate gamma random variables whose densities are

$$f_j(X_j) = \frac{1}{\Gamma_p(\alpha_j)}\,|X_j|^{\alpha_j-\frac{p+1}{2}}\,\mathrm{e}^{-\mathrm{tr}(X_j)},\ X_j>O,\ \Re(\alpha_j)>\frac{p-1}{2},\ j=1,2,$$

and zero elsewhere. Show that the densities of the symmetric ratios of matrices $U\_1 = X\_1^{-\frac{1}{2}} X\_2 X\_1^{-\frac{1}{2}}$ and $U\_2 = X\_2^{\frac{1}{2}} X\_1^{-1} X\_2^{\frac{1}{2}}$ are identical.

**Solution 5.7.1.** Observe that for $p = 1$, that is, in the real scalar case, both $U\_1$ and $U\_2$ reduce to the ratio $\frac{x\_2}{x\_1}$ of real scalar variables, whereas in the matrix-variate case $U\_1$ and $U\_2$ are different matrices. Hence, we cannot a priori expect the densities of $U\_1$ and $U\_2$ to be the same. They happen to be identical because of a property called functional symmetry of the gamma densities. Consider $U\_1$ and let $V = X\_1$. Then, $X\_2 = V^{\frac{1}{2}} U\_1 V^{\frac{1}{2}}$ and $\mathrm{d}X\_1 \wedge \mathrm{d}X\_2 = |V|^{\frac{p+1}{2}}\, \mathrm{d}V \wedge \mathrm{d}U\_1$. Due to the statistical independence of $X\_1$ and $X\_2$, their joint density is $f\_1(X\_1) f\_2(X\_2)$, and the joint density of $U\_1$ and $V$ is $|V|^{\frac{p+1}{2}} f\_1(V) f\_2(V^{\frac{1}{2}} U\_1 V^{\frac{1}{2}})$, the marginal density of $U\_1$, denoted by $g\_1(U\_1)$, being the following:

$$g\_1(U\_1) = \frac{|U\_1|^{\alpha\_2 - \frac{p+1}{2}}}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} \int\_{V>O} |V|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(V + V^{\frac{1}{2}}U\_1V^{\frac{1}{2}})}\, \mathrm{d}V.$$

The exponent of e can be written as follows:

$$-\text{tr}(V) - \text{tr}(V^{\frac{1}{2}}U\_1V^{\frac{1}{2}}) = -\text{tr}(V) - \text{tr}(VU\_1) = -\text{tr}(V(I+U\_1)) = -\text{tr}[(I+U\_1)^{\frac{1}{2}}V(I+U\_1)^{\frac{1}{2}}].$$

Letting $Y = (I + U\_1)^{\frac{1}{2}} V (I + U\_1)^{\frac{1}{2}} \Rightarrow \mathrm{d}Y = |I + U\_1|^{\frac{p+1}{2}}\, \mathrm{d}V$, and carrying out the integration in $g\_1(U\_1)$, we obtain the following density function:

$$g\_1(U\_1) = \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |U\_1|^{\alpha\_2 - \frac{p+1}{2}} |I + U\_1|^{-(\alpha\_1 + \alpha\_2)},\tag{i}$$

which is a real matrix-variate type-2 beta density with the parameters $(\alpha\_2, \alpha\_1)$. The original conditions $\Re(\alpha\_j) > \frac{p-1}{2},\ j = 1, 2,$ remain the same, no additional conditions being needed. Now, consider $U\_2$ and let $V = X\_2$, so that $X\_1 = V^{\frac{1}{2}} U\_2^{-1} V^{\frac{1}{2}} \Rightarrow \mathrm{d}X\_1 \wedge \mathrm{d}X\_2 = |V|^{\frac{p+1}{2}} |U\_2|^{-(p+1)}\, \mathrm{d}V \wedge \mathrm{d}U\_2$. The marginal density of $U\_2$ is then

$$g\_2(U\_2) = \frac{|U\_2|^{-\alpha\_1 + \frac{p+1}{2}} |U\_2|^{-(p+1)}}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} \int\_{V>O} |V|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}[V + V^{\frac{1}{2}}U\_2^{-1}V^{\frac{1}{2}}]}\, \mathrm{d}V. \tag{ii}$$

As previously explained, the exponent in $(ii)$ can be simplified to $-\mathrm{tr}[(I + U\_2^{-1})^{\frac{1}{2}} V (I + U\_2^{-1})^{\frac{1}{2}}]$, and the integral then evaluates to $\Gamma\_p(\alpha\_1 + \alpha\_2)\, |I + U\_2^{-1}|^{-(\alpha\_1+\alpha\_2)}$. Then,

$$|U\_2|^{-\alpha\_1 + \frac{p+1}{2}} |U\_2|^{-(p+1)} |I + U\_2^{-1}|^{-(\alpha\_1 + \alpha\_2)} = |U\_2|^{\alpha\_2 - \frac{p+1}{2}} |I + U\_2|^{-(\alpha\_1 + \alpha\_2)}. \tag{iii}$$

It follows from $(i)$, $(ii)$ and $(iii)$ that $g\_1(U\_1) = g\_2(U\_2)$. Thus, the densities of $U\_1$ and $U\_2$ are indeed one and the same, as was to be proved.
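For $p = 1$, the common density (i) is the scalar type-2 beta density with parameters $(\alpha_2, \alpha_1)$. As a numerical sanity check (a sketch; the parameter values below are arbitrary), one can verify by quadrature that this density integrates to one, and that $u/(1+u)$ has mean $\alpha_2/(\alpha_1+\alpha_2)$, as expected for a type-1 beta variable with parameters $(\alpha_2, \alpha_1)$:

```python
import math

def type2_beta_pdf(u, a2, a1):
    """Real scalar type-2 beta density with parameters (a2, a1),
    the p = 1 case of the density (i) of U1 in Example 5.7.1."""
    c = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return c * u ** (a2 - 1) * (1 + u) ** (-(a1 + a2))

a1, a2, n = 2.5, 1.5, 100000     # illustrative parameter values
total = mean_frac = 0.0
h = 1.0 / n
for i in range(n):
    t = (i + 0.5) * h            # map (0,1) -> (0, oo) via u = t/(1-t)
    u = t / (1 - t)
    w = type2_beta_pdf(u, a2, a1) / (1 - t) ** 2
    total += w * h
    mean_frac += (u / (1 + u)) * w * h

print(total)       # close to 1
print(mean_frac)   # close to a2/(a1+a2), since u/(1+u) is type-1 beta(a2, a1)
```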

#### **5.7.5. A pathway extension of first kind integrals, real matrix-variate case**

As in the case of fractional integrals of the second kind, we can also construct a pathway extension of the first kind integrals in the real matrix-variate case. Let

$$f\_1(X\_1) = \frac{\phi\_1(X\_1)}{\Gamma\_p(\alpha)} |I - a(1 - q)X\_1|^{\alpha - \frac{p+1}{2}},\ \alpha = \frac{\eta}{1 - q},\ \Re(\alpha) > \frac{p-1}{2}, \tag{5.7.13}$$

and $f\_2(X\_2) = \phi\_2(X\_2) f(X\_2)$ for the scalar parameters $a > 0$, $\eta > 0$, $q < 1$. When $q < 1$, (5.7.13) remains in the generalized type-1 beta family of functions. However, when $q > 1$, $f\_1$ switches to the generalized type-2 beta family of functions, and when $q \to 1$, (5.7.13) goes into a gamma family of functions. Since $X\_1 > O$ for $q > 1$ and $q \to 1$, the fractional nature is lost in those instances. Hence, only the case $q < 1$ is relevant in this subsection.

For various values of $q < 1$, one obtains from (5.7.13) a family of fractional integrals of the first kind. For details on the pathway concept, the reader may refer to Mathai (2005) and later papers. With the function $f\_1(X\_1)$ as specified in (5.7.13) and the corresponding $f\_2(X\_2) = \phi\_2(X\_2) f(X\_2)$, one can write down the M-convolution of a ratio, $g\_1(U\_1)$, corresponding to (5.7.8), thereby obtaining the pathway extended form of $g\_1(U\_1)$.

#### **5.7a. Density of a Product and Integrals of the Second Kind**

The discussion in this section parallels that in the real matrix-variate case; hence, only a summarized treatment will be provided. With respect to the density of a product when $\tilde{f}\_1$ and $\tilde{f}\_2$ are matrix-variate gamma densities in the complex domain, the results are parallel to those obtained in the real matrix-variate case. Hence, we will consider an extension of fractional integrals to the complex matrix-variate cases. Matrices in the complex domain will be denoted with a tilde. Let $\tilde{X}\_1$ and $\tilde{X}\_2$ be independently distributed Hermitian positive definite complex matrix-variate random variables whose densities are $\tilde{f}\_1(\tilde{X}\_1)$ and $\tilde{f}\_2(\tilde{X}\_2)$, respectively. Let $\tilde{U}\_2 = \tilde{X}\_2^{\frac{1}{2}} \tilde{X}\_1 \tilde{X}\_2^{\frac{1}{2}}$ and $\tilde{U}\_1 = \tilde{X}\_2^{\frac{1}{2}} \tilde{X}\_1^{-1} \tilde{X}\_2^{\frac{1}{2}}$, where $\tilde{X}\_2^{\frac{1}{2}}$ denotes the Hermitian positive definite square root of the Hermitian positive definite matrix $\tilde{X}\_2$. Statistical densities are real-valued scalar functions whether the argument matrix is in the real or complex domain.

#### **5.7a.1. Density of a product and fractional integral of the second kind, complex case**

Let us consider the transformations $(\tilde{X}\_1, \tilde{X}\_2) \to (\tilde{U}\_2, \tilde{V})$ and $(\tilde{X}\_1, \tilde{X}\_2) \to (\tilde{U}\_1, \tilde{V})$, the Jacobians being available from Chap. 1 or Mathai (1997). Then,

$$\mathrm{d}\tilde{X}\_1 \wedge \mathrm{d}\tilde{X}\_2 = \begin{cases} |\mathrm{det}(\tilde{V})|^{-p} \mathrm{d}\tilde{U}\_2 \wedge \mathrm{d}\tilde{V} \\ |\mathrm{det}(\tilde{V})|^p |\mathrm{det}(\tilde{U}\_1)|^{-2p} \mathrm{d}\tilde{U}\_1 \wedge \mathrm{d}\tilde{V} .\end{cases} \tag{5.7a.1}$$

When $\tilde{f}\_1$ and $\tilde{f}\_2$ are statistical densities, the density of the product, denoted by $\tilde{g}\_2(\tilde{U}\_2)$, is the following:

$$\tilde{g}\_2(\tilde{U}\_2) = \int\_{\tilde{V}} |\det(\tilde{V})|^{-p} \tilde{f}\_1(\tilde{V}^{-\frac{1}{2}} \tilde{U}\_2 \tilde{V}^{-\frac{1}{2}}) \tilde{f}\_2(\tilde{V})\, \mathrm{d}\tilde{V} \tag{5.7a.2}$$

where $|\det(\cdot)|$ denotes the absolute value of the determinant of $(\cdot)$. If $\tilde{f}\_1$ and $\tilde{f}\_2$ are not statistical densities, (5.7a.2) will be called the M-convolution of the product. As in the real matrix-variate case, we will give a general definition of a fractional integral of order $\alpha$ of the second kind in the complex matrix-variate case. Let

$$
\tilde{f}\_1(\tilde{X}\_1) = \phi\_1(\tilde{X}\_1) \frac{1}{\tilde{\Gamma}\_p(\alpha)} |\det(I - \tilde{X}\_1)|^{\alpha - p}, \ \Re(\alpha) > p - 1,
$$

and $\tilde{f}\_2(\tilde{X}\_2) = \phi\_2(\tilde{X}\_2)\tilde{f}(\tilde{X}\_2)$ where $\phi\_1$ and $\phi\_2$ are specified functions and $\tilde{f}$ is an arbitrary function. Then, (5.7a.2) becomes

$$\begin{split} \tilde{g}\_2(\tilde{U}\_2) &= \int\_{\tilde{V}} |\det(\tilde{V})|^{-p} \phi\_1(\tilde{V}^{-\frac{1}{2}} \tilde{U}\_2 \tilde{V}^{-\frac{1}{2}}) \\ &\quad \times \frac{1}{\tilde{\Gamma}\_p(\alpha)} |\det(I - \tilde{V}^{-\frac{1}{2}} \tilde{U}\_2 \tilde{V}^{-\frac{1}{2}})|^{\alpha - p} \phi\_2(\tilde{V}) \tilde{f}(\tilde{V})\, \mathrm{d}\tilde{V} \end{split} \tag{5.7a.3}$$

for $\Re(\alpha) > p - 1$. As an example, let

$$\phi\_1(\tilde{X}\_1) = \frac{\tilde{\Gamma}\_p(\gamma + p + \alpha)}{\tilde{\Gamma}\_p(\gamma + p)} |\det(\tilde{X}\_1)|^\gamma \text{ and } \phi\_2(\tilde{X}\_2) = 1.$$

Observe that $\tilde{f}\_1(\tilde{X}\_1)$ has now become a complex matrix-variate type-1 beta density with the parameters $(\gamma + p, \alpha)$, so that (5.7a.3) can be expressed as follows:

$$\begin{split} \tilde{g}\_2(\tilde{U}\_2) &= \frac{\tilde{\Gamma}\_p(\gamma + p + \alpha)}{\tilde{\Gamma}\_p(\gamma + p)} \frac{|\det(\tilde{U}\_2)|^{\gamma}}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{V} > \tilde{U}\_2 > O} |\det(\tilde{V})|^{-\alpha-\gamma} |\det(\tilde{V} - \tilde{U}\_2)|^{\alpha - p} \tilde{f}(\tilde{V})\, \mathrm{d}\tilde{V} \\ &= \frac{\tilde{\Gamma}\_p(\gamma + p + \alpha)}{\tilde{\Gamma}\_p(\gamma + p)}\, \tilde{K}^{-\alpha}\_{2, \tilde{U}\_2, \gamma} f \end{split} \tag{5.7a.4}$$

where

$$\tilde{K}\_{2,\tilde{U}\_2,\gamma}^{-\alpha} f = \frac{|\det(\tilde{U}\_2)|^{\gamma}}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{V} > \tilde{U}\_2 > O} |\det(\tilde{V})|^{-\alpha-\gamma} |\det(\tilde{V} - \tilde{U}\_2)|^{\alpha - p} f(\tilde{V})\, \mathrm{d}\tilde{V} \tag{5.7a.5}$$

is the Erdélyi-Kober fractional integral of the second kind of order $\alpha$ in the complex matrix-variate case, which is defined for $\Re(\alpha) > p - 1,\ \Re(\gamma) > -1$. The extension of fractional integrals to complex matrix-variate cases was introduced in Mathai (2013). As a second example, let

$$\phi\_1(\tilde{X}\_1) = 1 \text{ and } \phi\_2(\tilde{X}\_2) = |\det(\tilde{X}\_2)|^\alpha.$$

In that case, (5.7a.3) becomes

$$\tilde{g}\_2(\tilde{U}\_2) = \frac{1}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{V} > \tilde{U}\_2 > O} |\det(\tilde{V} - \tilde{U}\_2)|^{\alpha - p} f(\tilde{V})\, \mathrm{d}\tilde{V},\ \Re(\alpha) > p - 1. \tag{5.7a.6}$$

The integral (5.7a.6) is the Weyl fractional integral of the second kind of order $\alpha$ in the complex matrix-variate case. If $\tilde{V}$ is bounded above by a Hermitian positive definite constant matrix $B > O$, then (5.7a.6) is a Riemann-Liouville fractional integral of the second kind of order $\alpha$ in the complex matrix-variate case.

A pathway extension parallel to that developed in the real matrix-variate case can be similarly obtained. Accordingly, the details of the derivation are omitted.

#### **5.7a.2. Density of a ratio and fractional integrals of the first kind, complex case**

We will now derive the density of the symmetric ratio $\tilde{U}\_1$ defined in Sect. 5.7a. If $\tilde{f}\_1$ and $\tilde{f}\_2$ are statistical densities, then the density of $\tilde{U}\_1$, denoted by $\tilde{g}\_1(\tilde{U}\_1)$, is given by

$$\tilde{g}\_1(\tilde{U}\_1) = \int\_{\tilde{V}} |\det(\tilde{V})|^p |\det(\tilde{U}\_1)|^{-2p}\, \tilde{f}\_1(\tilde{V}^{\frac{1}{2}}\tilde{U}\_1^{-1}\tilde{V}^{\frac{1}{2}}) \tilde{f}\_2(\tilde{V})\, \mathrm{d}\tilde{V}, \tag{5.7a.7}$$

provided the integral is convergent. For the general definition, let us take

$$
\tilde{f}\_1(\tilde{X}\_1) = \phi\_1(\tilde{X}\_1) \frac{1}{\tilde{\Gamma}\_p(\alpha)} |\det(I - \tilde{X}\_1)|^{\alpha - p}, \ \Re(\alpha) > p - 1,
$$

and $\tilde{f}\_2(\tilde{X}\_2) = \phi\_2(\tilde{X}\_2)\tilde{f}(\tilde{X}\_2)$ where $\phi\_1$ and $\phi\_2$ are specified functions and $\tilde{f}$ is an arbitrary function. Then, $\tilde{g}\_1(\tilde{U}\_1)$ is the following:

$$
\begin{split}
\tilde{g}\_1(\tilde{U}\_1) &= \int\_{\tilde{V}} |\det(\tilde{V})|^p |\det(\tilde{U}\_1)|^{-2p} \frac{1}{\tilde{\Gamma}\_p(\alpha)} \phi\_1(\tilde{V}^{\frac{1}{2}}\tilde{U}\_1^{-1}\tilde{V}^{\frac{1}{2}}) \\ &\quad \times |\det(I - \tilde{V}^{\frac{1}{2}}\tilde{U}\_1^{-1}\tilde{V}^{\frac{1}{2}})|^{\alpha - p} \phi\_2(\tilde{V}) \tilde{f}(\tilde{V})\, \mathrm{d}\tilde{V}. \end{split} \tag{5.7a.8}
$$

As an example, let

$$\phi\_1(\tilde{X}\_1) = \frac{\tilde{\Gamma}\_p(\gamma + \alpha)}{\tilde{\Gamma}\_p(\gamma)} |\det(\tilde{X}\_1)|^{\gamma - p}$$

and $\phi\_2 = 1$. Then,

$$\begin{split} \tilde{g}\_1(\tilde{U}\_1) &= \frac{\tilde{\Gamma}\_p(\gamma + \alpha)}{\tilde{\Gamma}\_p(\gamma)} \frac{|\det(\tilde{U}\_1)|^{-\alpha-\gamma}}{\tilde{\Gamma}\_p(\alpha)} \int\_{O < \tilde{V} < \tilde{U}\_1} |\det(\tilde{V})|^{\gamma} |\det(\tilde{U}\_1 - \tilde{V})|^{\alpha - p} f(\tilde{V})\, \mathrm{d}\tilde{V},\ \Re(\alpha) > p - 1 \\ &= \frac{\tilde{\Gamma}\_p(\gamma + \alpha)}{\tilde{\Gamma}\_p(\gamma)}\, K^{-\alpha}\_{1,\tilde{U}\_1,\gamma} f \end{split} \tag{5.7a.9}$$

where

$$K\_{1,\tilde{U}\_1,\gamma}^{-\alpha} f = \frac{|\det(\tilde{U}\_1)|^{-\alpha-\gamma}}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{V} < \tilde{U}\_1} |\det(\tilde{V})|^{\gamma} |\det(\tilde{U}\_1 - \tilde{V})|^{\alpha - p} f(\tilde{V})\, \mathrm{d}\tilde{V} \tag{5.7a.10}$$

for $\Re(\alpha) > p - 1$, is the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ and parameter $\gamma$ in the complex matrix-variate case. We now consider a second example. On letting $\phi\_1(\tilde{X}\_1) = |\det(\tilde{X}\_1)|^{-\alpha-p}$ and $\phi\_2(\tilde{X}\_2) = |\det(\tilde{X}\_2)|^{\alpha}$, the density of $\tilde{U}\_1$ is

$$\tilde{g}\_1(\tilde{U}\_1) = \frac{1}{\tilde{\Gamma}\_p(\alpha)} \int\_{\tilde{V} < \tilde{U}\_1} |\det(\tilde{U}\_1 - \tilde{V})|^{\alpha - p} f(\tilde{V}) d\tilde{V},\ \Re(\alpha) > p - 1. \tag{5.7a.11}$$

The integral in (5.7a.11) is Weyl's fractional integral of the first kind of order $\alpha$ in the complex matrix-variate case, denoted by $\tilde{W}^{-\alpha}\_{1,\tilde{U}\_1} f$. Observe that we are considering only Hermitian positive definite matrices. Thus, there is a lower bound, the integral being over $O < \tilde{V} < \tilde{U}\_1$. Hence, (5.7a.11) can also represent a Riemann-Liouville fractional integral of the first kind of order $\alpha$ in the complex matrix-variate case with a null matrix as its lower bound. For fractional integrals involving several matrices and fractional differential operators for functions of matrix argument, refer to Mathai (2014a, 2015); for pathway extensions, see Mathai and Haubold (2008, 2011).

#### **Exercises 5.7**

All the matrices appearing herein are $p \times p$ real positive definite when real, and Hermitian positive definite when in the complex domain. The M-transform of a real-valued scalar function $f(X)$ of the $p \times p$ real matrix $X$, with M-transform parameter $\rho$, is defined as

$$M\_f(\rho) = \int\_{X > O} |X|^{\rho - \frac{p+1}{2}} f(X) \, \mathrm{d}X,\ \Re(\rho) > \frac{p-1}{2},$$

whenever the integral is convergent. In the real case, the M-convolution of a product $U\_2 = X\_2^{\frac{1}{2}} X\_1 X\_2^{\frac{1}{2}}$, with corresponding functions $f\_1(X\_1)$ and $f\_2(X\_2)$, is

$$\mathcal{g}\_2(U\_2) = \int\_V |V|^{-\frac{p+1}{2}} f\_1(V^{-\frac{1}{2}}U\_2V^{-\frac{1}{2}}) f\_2(V) \,\mathrm{d}V$$

whenever the integral is convergent. The M-convolution of a ratio in the real case is $g\_1(U\_1)$. The M-convolutions of a product and a ratio in the complex case are $\tilde{g}\_2(\tilde{U}\_2)$ and $\tilde{g}\_1(\tilde{U}\_1)$, respectively, as defined earlier in this section. If $\alpha$ is the order of a fractional integral operator operating on $f$, denoted by $A^{-\alpha} f$, then the semigroup property is $A^{-\alpha} A^{-\beta} f = A^{-(\alpha+\beta)} f = A^{-\beta} A^{-\alpha} f$.
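In the scalar case $p = 1$, the semigroup property can be illustrated numerically for, say, the Riemann-Liouville operator with lower bound zero. The sketch below (illustrative parameter values; the inner integral $A^{-\beta}1 = u^{\beta}/\Gamma(\beta+1)$ is used in its known closed form) checks that $A^{-\alpha}A^{-\beta}f$ and $A^{-(\alpha+\beta)}f$ agree for $f \equiv 1$:

```python
import math

def rl(f, u, alpha, n=20000):
    """Riemann-Liouville fractional integral of order alpha at u (p = 1),
    computed by the midpoint rule."""
    h = u / n
    return sum((u - (i + 0.5) * h) ** (alpha - 1) * f((i + 0.5) * h)
               for i in range(n)) * h / math.gamma(alpha)

alpha, beta, u = 1.2, 1.6, 1.5           # illustrative values only

# Inner step in closed form: A^{-beta} applied to f = 1 gives v**beta / Gamma(beta + 1).
inner = lambda v: v ** beta / math.gamma(beta + 1)

lhs = rl(inner, u, alpha)                 # A^{-alpha} A^{-beta} f
rhs = rl(lambda v: 1.0, u, alpha + beta)  # A^{-(alpha+beta)} f
exact = u ** (alpha + beta) / math.gamma(alpha + beta + 1)
print(lhs, rhs, exact)
```

All three values coincide up to the quadrature error, as the semigroup property requires.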

**5.7.1.** Show that the M-transform of the M-convolution of a product is the product of the M-transforms of the individual functions $f\_1$ and $f\_2$, both in the real and complex cases.

**5.7.2.** What are the M-transforms of the M-convolution of a ratio in the real and complex cases? Establish your assertions.

**5.7.3.** Show that the semigroup property holds for Weyl's fractional integral of the (1): first kind, (2): second kind, in the real matrix-variate case.

**5.7.4.** Do (1) and (2) of Exercise 5.7.3 hold in the complex matrix-variate case? Prove your assertion.

**5.7.5.** Evaluate the M-transforms of the Erdélyi-Kober fractional integral of order $\alpha$ of (1): the first kind, (2): the second kind, and state the conditions for their existence.

**5.7.6.** Repeat Exercise 5.7.5 for (1) Weyl's fractional integral of order *α*, (2) the Riemann-Liouville fractional integral of order *α*.

**5.7.7.** Evaluate the Weyl fractional integral of order $\alpha$ of (a): the first kind, (b): the second kind, in the real matrix-variate case, if possible, when the arbitrary function is (1): $\mathrm{e}^{-\mathrm{tr}(X)}$, (2): $\mathrm{e}^{\mathrm{tr}(X)}$, and write down the conditions wherever the integral can be evaluated.

**5.7.8.** Repeat Exercise 5.7.7 for the complex matrix-variate case.

**5.7.9.** Evaluate the Erdélyi-Kober fractional integral of order $\alpha$ and parameter $\gamma$ of the (a): first kind, (b): second kind, in the real matrix-variate case, if the arbitrary function is (1): $|X|^{\delta}$, (2): $|X|^{-\delta}$, wherever possible, and write down the necessary conditions.

**5.7.10.** Repeat Exercise 5.7.9 for the complex case. In the complex case, $|X|$, the determinant of $X$, is to be replaced by $|\det(\tilde{X})|$, the absolute value of the determinant of $\tilde{X}$.

#### **5.8. Densities Involving Several Matrix-variate Random Variables, Real Case**

We will start with real scalar variables. The most popular multivariate distribution, apart from the normal distribution, is the Dirichlet distribution, which is a generalization of the type-1 and type-2 beta distributions.

#### **5.8.1. The type-1 Dirichlet density, real scalar case**

Let $x\_1, \ldots, x\_k$ be real scalar random variables having a joint density of the form

$$f\_1(x\_1, \ldots, x\_k) = c\_k\, x\_1^{\alpha\_1 - 1} \cdots x\_k^{\alpha\_k - 1} (1 - x\_1 - \cdots - x\_k)^{\alpha\_{k+1} - 1} \tag{5.8.1}$$

for $\omega = \{(x\_1, \ldots, x\_k) \mid 0 \le x\_j \le 1,\ j = 1, \ldots, k,\ 0 \le x\_1 + \cdots + x\_k \le 1\}$, $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, and $f\_1 = 0$ elsewhere. This is the type-1 Dirichlet density, where $c\_k$ is the normalizing constant. Note that $\omega$ describes a simplex, and hence the support of $f\_1$ is the simplex $\omega$. The normalizing constant can be evaluated in various ways. One method relies on direct integration of the variables, one at a time. For example, integration over $x\_1$ involves the two factors $x\_1^{\alpha\_1-1}$ and $(1 - x\_1 - \cdots - x\_k)^{\alpha\_{k+1}-1}$. Let $I\_1$ be the integral over $x\_1$ and observe that $x\_1$ varies from $0$ to $1 - x\_2 - \cdots - x\_k$. Then

$$I\_1 = \int\_{x\_1=0}^{1-x\_2-\cdots-x\_k} x\_1^{\alpha\_1-1} (1 - x\_1 - \cdots - x\_k)^{\alpha\_{k+1}-1}\, \mathrm{d}x\_1.$$

But

$$(1 - x\_1 - \cdots - x\_k)^{\alpha\_{k+1}-1} = (1 - x\_2 - \cdots - x\_k)^{\alpha\_{k+1}-1} \left[1 - \frac{x\_1}{1 - x\_2 - \cdots - x\_k}\right]^{\alpha\_{k+1}-1}.$$

Make the substitution $y = \frac{x\_1}{1 - x\_2 - \cdots - x\_k} \Rightarrow \mathrm{d}x\_1 = (1 - x\_2 - \cdots - x\_k)\,\mathrm{d}y$, which enables one to integrate out $y$ by making use of a real scalar type-1 beta integral, giving

$$\int\_0^1 y^{\alpha\_1 - 1} (1 - y)^{\alpha\_{k+1} - 1}\, \mathrm{d}y = \frac{\Gamma(\alpha\_1) \Gamma(\alpha\_{k+1})}{\Gamma(\alpha\_1 + \alpha\_{k+1})}$$

for $\Re(\alpha\_1) > 0,\ \Re(\alpha\_{k+1}) > 0$. Now, proceed similarly by integrating out $x\_2$ from $x\_2^{\alpha\_2-1}(1 - x\_2 - \cdots - x\_k)^{\alpha\_1 + \alpha\_{k+1} - 1}$, and continue in this manner until $x\_k$ is reached. Finally, after canceling out all common gamma factors, one obtains $\Gamma(\alpha\_1) \cdots \Gamma(\alpha\_{k+1})/\Gamma(\alpha\_1 + \cdots + \alpha\_{k+1})$ for $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$. Thus, the normalizing constant is given by

$$c\_k = \frac{\Gamma(\alpha\_1 + \dots + \alpha\_{k+1})}{\Gamma(\alpha\_1) \dots \Gamma(\alpha\_{k+1})}, \ \Re(\alpha\_j) > 0, \ j = 1, \dots, k+1. \tag{5.8.2}$$

Another method for evaluating the normalizing constant $c\_k$ consists of making the following transformation:

$$\begin{aligned} x\_1 &= y\_1 \\ x\_2 &= y\_2(1 - y\_1) \\ x\_j &= y\_j(1 - y\_1)(1 - y\_2) \cdots (1 - y\_{j-1}),\ j = 2, \ldots, k. \end{aligned} \tag{5.8.3}$$

It is then easily seen that

$$\mathrm{d}x\_1 \wedge \cdots \wedge \mathrm{d}x\_k = (1 - y\_1)^{k-1} (1 - y\_2)^{k-2} \cdots (1 - y\_{k-1})\, \mathrm{d}y\_1 \wedge \cdots \wedge \mathrm{d}y\_k. \tag{5.8.4}$$

Under this transformation, one has

$$\begin{aligned} 1 - x\_1 &= 1 - y\_1 \\ 1 - x\_1 - x\_2 &= (1 - y\_1)(1 - y\_2) \\ 1 - x\_1 - \dots - x\_k &= (1 - y\_1)(1 - y\_2) \dotsm (1 - y\_k) .\end{aligned}$$

Then, we have

$$\begin{split} & x\_{1}^{\alpha\_{1}-1} \cdots x\_{k}^{\alpha\_{k}-1} (1 - x\_{1} - \cdots - x\_{k})^{\alpha\_{k+1}-1}\, \mathrm{d}x\_{1} \wedge \ldots \wedge \mathrm{d}x\_{k} \\ & \qquad = y\_{1}^{\alpha\_{1}-1} \cdots y\_{k}^{\alpha\_{k}-1} (1 - y\_{1})^{\alpha\_{2} + \cdots + \alpha\_{k+1} - 1} \\ & \qquad \quad \times (1 - y\_{2})^{\alpha\_{3} + \cdots + \alpha\_{k+1} - 1} \cdots (1 - y\_{k})^{\alpha\_{k+1}-1}\, \mathrm{d}y\_{1} \wedge \ldots \wedge \mathrm{d}y\_{k}. \end{split}$$

Now, all the $y\_j$'s are separated and each one can be integrated out by making use of a type-1 beta integral. For example, the integrals over $y\_1, y\_2, \ldots, y\_k$ give the following:

$$\begin{split} \int\_{0}^{1} y\_{1}^{\alpha\_{1}-1} (1 - y\_{1})^{\alpha\_{2} + \cdots + \alpha\_{k+1} - 1}\, \mathrm{d}y\_{1} &= \frac{\Gamma(\alpha\_{1}) \Gamma(\alpha\_{2} + \cdots + \alpha\_{k+1})}{\Gamma(\alpha\_{1} + \cdots + \alpha\_{k+1})} \\ \int\_{0}^{1} y\_{2}^{\alpha\_{2}-1} (1 - y\_{2})^{\alpha\_{3} + \cdots + \alpha\_{k+1} - 1}\, \mathrm{d}y\_{2} &= \frac{\Gamma(\alpha\_{2}) \Gamma(\alpha\_{3} + \cdots + \alpha\_{k+1})}{\Gamma(\alpha\_{2} + \cdots + \alpha\_{k+1})} \\ &\vdots \\ \int\_{0}^{1} y\_{k}^{\alpha\_{k}-1} (1 - y\_{k})^{\alpha\_{k+1}-1}\, \mathrm{d}y\_{k} &= \frac{\Gamma(\alpha\_{k}) \Gamma(\alpha\_{k+1})}{\Gamma(\alpha\_{k} + \alpha\_{k+1})} \end{split}$$

for $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$. Taking the product of these integrals produces $c\_k^{-1}$.
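The normalizing constant (5.8.2) can be confirmed numerically for small $k$. The following sketch (Python, $k = 2$, arbitrary parameter values) integrates the type-1 Dirichlet density over the simplex by a midpoint rule and checks that the total mass is one:

```python
import math

def dirichlet1_const(alphas):
    """Normalizing constant (5.8.2) of the type-1 Dirichlet density."""
    return math.gamma(sum(alphas)) / math.prod(math.gamma(a) for a in alphas)

a1, a2, a3 = 2.0, 3.0, 1.5     # illustrative parameters (alpha_1, alpha_2; alpha_3)
n = 800
h = 1.0 / n
total = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 + x2 < 1.0:      # the simplex omega
            total += x1**(a1-1) * x2**(a2-1) * (1 - x1 - x2)**(a3-1) * h * h

c2 = dirichlet1_const([a1, a2, a3])
print(total * c2)   # close to 1
```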

#### **5.8.2. The type-2 Dirichlet density, real scalar case**

Let $x\_1 > 0, \ldots, x\_k > 0$ be real scalar random variables having the joint density

$$f\_2(x\_1, \ldots, x\_k) = c\_k\, x\_1^{\alpha\_1 - 1} \cdots x\_k^{\alpha\_k - 1} (1 + x\_1 + \cdots + x\_k)^{-(\alpha\_1 + \cdots + \alpha\_{k+1})} \tag{5.8.5}$$

for $x\_j > 0,\ j = 1, \ldots, k$, $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, and $f\_2 = 0$ elsewhere, where $c\_k$ is the normalizing constant. This density is known as a type-2 Dirichlet density. It can be shown that the normalizing constant $c\_k$ is the same as the one obtained in (5.8.2) for the type-1 Dirichlet distribution. This can be established by integrating out the variables one at a time, starting with $x\_k$ or $x\_1$. The constant can also be evaluated with the help of the following transformation:

$$\begin{aligned} x\_1 &= y\_1 \\ x\_2 &= y\_2(1 + y\_1) \\ x\_j &= y\_j(1 + y\_1)(1 + y\_2) \cdots (1 + y\_{j-1}),\ j = 2, \ldots, k, \end{aligned} \tag{5.8.6}$$

whose Jacobian is given by

$$\mathrm{d}x\_1 \wedge \cdots \wedge \mathrm{d}x\_k = (1 + y\_1)^{k-1} (1 + y\_2)^{k-2} \cdots (1 + y\_{k-1})\, \mathrm{d}y\_1 \wedge \cdots \wedge \mathrm{d}y\_k. \tag{5.8.7}$$
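The claim that the type-2 Dirichlet density carries the same normalizing constant as (5.8.2) can likewise be checked numerically for $k = 2$. The sketch below (arbitrary parameter values) maps each variable from $(0, \infty)$ to $(0, 1)$ via $x = t/(1-t)$ before applying a midpoint rule:

```python
import math

a1, a2, a3 = 2.0, 3.0, 1.5     # illustrative parameters
s = a1 + a2 + a3
n = 600
h = 1.0 / n
total = 0.0
for i in range(n):
    t1 = (i + 0.5) * h
    x1 = t1 / (1 - t1)          # map (0,1) -> (0, oo)
    w1 = 1.0 / (1 - t1) ** 2    # Jacobian of the map
    for j in range(n):
        t2 = (j + 0.5) * h
        x2 = t2 / (1 - t2)
        w2 = 1.0 / (1 - t2) ** 2
        total += x1**(a1-1) * x2**(a2-1) * (1 + x1 + x2)**(-s) * w1 * w2 * h * h

c2 = math.gamma(s) / (math.gamma(a1) * math.gamma(a2) * math.gamma(a3))
print(total * c2)   # close to 1
```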

#### **5.8.3. Some properties of Dirichlet densities in the real scalar case**

Let us determine the $h$-th moment of $(1 - x\_1 - \cdots - x\_k)$ in a type-1 Dirichlet density:

$$E[1 - x\_1 - \cdots - x\_k]^h = \int\_{\omega} (1 - x\_1 - \cdots - x\_k)^h f\_1(x\_1, \ldots, x\_k)\, \mathrm{d}x\_1 \wedge \cdots \wedge \mathrm{d}x\_k.$$

In comparison with the total integral, the only change is that the parameter $\alpha\_{k+1}$ is replaced by $\alpha\_{k+1} + h$; thus the result is available from the normalizing constant. That is,

$$E[1 - \mathbf{x}\_1 - \dots - \mathbf{x}\_k]^h = \frac{\Gamma(\alpha\_{k+1} + h)}{\Gamma(\alpha\_{k+1})} \frac{\Gamma(\alpha\_1 + \dots + \alpha\_{k+1})}{\Gamma(\alpha\_1 + \dots + \alpha\_{k+1} + h)}.\tag{5.8.8}$$

The additional condition needed is $\Re(\alpha\_{k+1} + h) > 0$. Considering the structure of the moment in (5.8.8), $u = 1 - x\_1 - \cdots - x\_k$ is manifestly a real scalar type-1 beta variable with the parameters $(\alpha\_{k+1},\ \alpha\_1 + \cdots + \alpha\_k)$. This is stated in the following result:

**Theorem 5.8.1.** *Let $x\_1, \ldots, x\_k$ have a real scalar type-1 Dirichlet density with the parameters $(\alpha\_1, \ldots, \alpha\_k; \alpha\_{k+1})$. Then, $u = 1 - x\_1 - \cdots - x\_k$ has a real scalar type-1 beta distribution with the parameters $(\alpha\_{k+1},\ \alpha\_1 + \cdots + \alpha\_k)$, and $1 - u = x\_1 + \cdots + x\_k$ has a real scalar type-1 beta distribution with the parameters $(\alpha\_1 + \cdots + \alpha\_k,\ \alpha\_{k+1})$.*
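Theorem 5.8.1 and the moment formula (5.8.8) can be illustrated by simulation, using the standard construction of a Dirichlet vector from independent gamma variables (the parameter values, moment order and sample size below are arbitrary):

```python
import math, random

random.seed(7)
alphas = [1.2, 2.0, 0.8, 1.5]   # (alpha_1, alpha_2, alpha_3; alpha_4), k = 3
k = len(alphas) - 1
N, h = 200000, 2.0              # sample size and moment order (illustrative)
moment = 0.0
for _ in range(N):
    # Standard construction: normalized independent gammas are Dirichlet distributed.
    g = [random.gammavariate(a, 1.0) for a in alphas]
    tot = sum(g)
    u = 1 - sum(gi / tot for gi in g[:k])   # u = 1 - x_1 - ... - x_k
    moment += u ** h
moment /= N

s = sum(alphas)
# Formula (5.8.8) for E[u^h]
exact = (math.gamma(alphas[k] + h) / math.gamma(alphas[k])) * \
        (math.gamma(s) / math.gamma(s + h))
print(moment, exact)
```

The Monte Carlo estimate agrees with (5.8.8) to within sampling error.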

Some parallel results can also be obtained for type-2 Dirichlet variables. Consider a real scalar type-2 Dirichlet density with the parameters $(\alpha\_1, \ldots, \alpha\_k; \alpha\_{k+1})$ and let $v = (1 + x\_1 + \cdots + x\_k)^{-1}$. When taking the $h$-th moment of $v$, that is, $E[v^h]$, we see that the only change is that $\alpha\_{k+1}$ becomes $\alpha\_{k+1} + h$. Accordingly, $v$ has a real scalar type-1 beta distribution with the parameters $(\alpha\_{k+1},\ \alpha\_1 + \cdots + \alpha\_k)$. Thus, $1 - v = \frac{x\_1 + \cdots + x\_k}{1 + x\_1 + \cdots + x\_k}$ is a type-1 beta random variable with the parameters interchanged. Hence the following result:

**Theorem 5.8.2.** *Let $x\_1, \ldots, x\_k$ have a real scalar type-2 Dirichlet density with the parameters $(\alpha\_1, \ldots, \alpha\_k; \alpha\_{k+1})$. Then, $v = (1 + x\_1 + \cdots + x\_k)^{-1}$ has a real scalar type-1 beta distribution with the parameters $(\alpha\_{k+1},\ \alpha\_1 + \cdots + \alpha\_k)$, and $1 - v = \frac{x\_1 + \cdots + x\_k}{1 + x\_1 + \cdots + x\_k}$ has a real scalar type-1 beta distribution with the parameters $(\alpha\_1 + \cdots + \alpha\_k,\ \alpha\_{k+1})$.*

Observe that the joint product moments $E[x\_1^{h\_1} \cdots x\_k^{h\_k}]$ can be determined in the cases of both the real scalar type-1 and type-2 Dirichlet densities. This can be achieved by considering the corresponding normalizing constants. Since an arbitrary product moment uniquely determines the corresponding distribution, one can show that all subsets of variables from the set $\{x\_1, \ldots, x\_k\}$ are again real scalar type-1 Dirichlet and real scalar type-2 Dirichlet distributed, respectively; to identify the marginal joint density of a subset under consideration, it suffices to set the complementary set of $h\_j$'s equal to zero. Type-1 and type-2 Dirichlet densities enjoy many properties, some of which are mentioned in the exercises. As well, several types of generalizations of the type-1 and type-2 Dirichlet models exist. The first author and his coworkers have developed several such models, one of which was introduced in connection with certain reliability problems.

#### **5.8.4. Some generalizations of the Dirichlet models**

Let the real scalar variables $x\_1, \ldots, x\_k$ have a joint density of the following type, which is a generalization of the type-1 Dirichlet density:

$$\begin{split} g\_1(x\_1, \ldots, x\_k) &= b\_k\, x\_1^{\alpha\_1 - 1} (1 - x\_1)^{\beta\_1} x\_2^{\alpha\_2 - 1} (1 - x\_1 - x\_2)^{\beta\_2} \cdots \\ &\quad \times x\_k^{\alpha\_k - 1} (1 - x\_1 - \cdots - x\_k)^{\beta\_k + \alpha\_{k+1} - 1} \end{split} \tag{5.8.9}$$

for $(x\_1, \ldots, x\_k) \in \omega$, $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, as well as other necessary conditions to be stated later, and $g\_1 = 0$ elsewhere, where $b\_k$ denotes the normalizing constant. This normalizing constant can be evaluated by integrating out the variables one at a time or by making the transformation (5.8.3) and taking into account its associated Jacobian as specified in (5.8.4). Under the transformation (5.8.3), $y\_1, \ldots, y\_k$ will be independently distributed, with $y\_j$ having a type-1 beta density with the parameters $(\alpha\_j, \gamma\_j)$, $\gamma\_j = \alpha\_{j+1} + \cdots + \alpha\_{k+1} + \beta\_j + \cdots + \beta\_k$, $j = 1, \ldots, k$, which yields the normalizing constant

$$b\_k = \prod\_{j=1}^k \frac{\Gamma(\alpha\_j + \gamma\_j)}{\Gamma(\alpha\_j)\Gamma(\gamma\_j)} \tag{5.8.10}$$

for $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, $\Re(\gamma\_j) > 0,\ j = 1, \ldots, k$, where

$$\gamma\_j = \alpha\_{j+1} + \dots + \alpha\_{k+1} + \beta\_j + \dots + \beta\_k, \ j = 1, \dots, k. \tag{5.8.11}$$

Arbitrary moments $E[x\_1^{h\_1} \cdots x\_k^{h\_k}]$ are available from the normalizing constant $b\_k$ by replacing $\alpha\_j$ by $\alpha\_j + h\_j$ for $j = 1, \ldots, k$ and then taking the ratio of the two constants. It can be observed from this arbitrary moment that all subsets of the type $(x\_1, \ldots, x\_j)$ have a density of the type specified in (5.8.9). For other types of subsets, one has first to rearrange the variables and the corresponding parameters by bringing them to the first $j$ positions and then utilize the previous result on subsets.
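The constant (5.8.10), with the $\gamma_j$'s of (5.8.11), can be verified numerically for $k = 2$ (a sketch with arbitrary parameter values), by integrating (5.8.9) over the simplex:

```python
import math

# Illustrative parameters of the generalized type-1 Dirichlet model (5.8.9), k = 2
a = [2.0, 1.5, 2.0]        # alpha_1, alpha_2, alpha_3
b = [1.0, 0.5]             # beta_1, beta_2

# gamma_j = alpha_{j+1} + ... + alpha_{k+1} + beta_j + ... + beta_k   (5.8.11)
g = [a[1] + a[2] + b[0] + b[1], a[2] + b[1]]
bk = math.prod(math.gamma(a[j] + g[j]) / (math.gamma(a[j]) * math.gamma(g[j]))
               for j in range(2))

n = 800
h = 1.0 / n
total = 0.0
for i in range(n):
    x1 = (i + 0.5) * h
    for j in range(n):
        x2 = (j + 0.5) * h
        if x1 + x2 < 1.0:
            total += (x1**(a[0]-1) * (1-x1)**b[0] * x2**(a[1]-1)
                      * (1-x1-x2)**(b[1]+a[2]-1)) * h * h

print(total * bk)   # close to 1
```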

The following model corresponding to (5.8.9) for the type-2 Dirichlet model was introduced by the first author:

$$\begin{split} g\_2(x\_1, \ldots, x\_k) &= a\_k\, x\_1^{\alpha\_1 - 1} (1 + x\_1)^{-\beta\_1} x\_2^{\alpha\_2 - 1} (1 + x\_1 + x\_2)^{-\beta\_2} \cdots \\ &\quad \times x\_k^{\alpha\_k - 1} (1 + x\_1 + \cdots + x\_k)^{-(\alpha\_1 + \cdots + \alpha\_{k+1}) - \beta\_k} \end{split} \tag{5.8.12}$$

for $x\_j > 0,\ j = 1, \ldots, k$, $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, as well as other necessary conditions to be stated later, and $g\_2 = 0$ elsewhere. In order to evaluate the normalizing constant $a\_k$, one can use the transformation (5.8.6) and its associated Jacobian given in (5.8.7). Then, $y\_1, \ldots, y\_k$ become independently distributed real scalar type-2 beta variables with the parameters $(\alpha\_j, \delta\_j)$, where

$$\delta\_j = \alpha\_1 + \cdots + \alpha\_{j-1} + \alpha\_{k+1} + \beta\_j + \cdots + \beta\_k \tag{5.8.13}$$

for $\Re(\alpha\_j) > 0,\ j = 1, \ldots, k+1$, $\Re(\delta\_j) > 0,\ j = 1, \ldots, k$. Other generalizations are available in the literature.
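The constant resulting from (5.8.13) can be checked numerically for $k = 2$ as well (a sketch with arbitrary parameter values), integrating (5.8.12) over $(0, \infty)^2$ after the substitution $x = t/(1-t)$:

```python
import math

# Illustrative parameters of the generalized type-2 Dirichlet model (5.8.12), k = 2
a = [2.0, 1.5, 2.0]        # alpha_1, alpha_2, alpha_3
b = [1.0, 0.5]             # beta_1, beta_2
s = sum(a)

# delta_j = alpha_1 + ... + alpha_{j-1} + alpha_{k+1} + beta_j + ... + beta_k  (5.8.13)
d = [a[2] + b[0] + b[1], a[0] + a[2] + b[1]]
ak = math.prod(math.gamma(a[j] + d[j]) / (math.gamma(a[j]) * math.gamma(d[j]))
               for j in range(2))

n = 600
h = 1.0 / n
total = 0.0
for i in range(n):
    t1 = (i + 0.5) * h
    x1 = t1 / (1 - t1)          # map (0,1) -> (0, oo)
    w1 = 1.0 / (1 - t1) ** 2
    for j in range(n):
        t2 = (j + 0.5) * h
        x2 = t2 / (1 - t2)
        w2 = 1.0 / (1 - t2) ** 2
        total += (x1**(a[0]-1) * (1+x1)**(-b[0]) * x2**(a[1]-1)
                  * (1+x1+x2)**(-s-b[1])) * w1 * w2 * h * h

print(total * ak)   # close to 1
```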

#### **5.8.5. A pseudo Dirichlet model**

In the type-1 Dirichlet model, the support is the previously described simplex *ω*. We will now consider a model, which was recently introduced by the first author, wherein the variables can vary freely in a hypercube. Let us begin with the case *k* = 2. Consider the model

$$\log\_{12}(\mathbf{x}\_1, \mathbf{x}\_2) = c\_{12} \mathbf{x}\_2^{a\_1} (1 - \mathbf{x}\_1)^{a\_1 - 1} (1 - \mathbf{x}\_2)^{a\_2 - 1} (1 - \mathbf{x}\_1 \mathbf{x}\_2)^{-(a\_1 + a\_2 - 1)}, \ 0 \le \mathbf{x}\_j \le 1,\tag{5.8.14}$$

for $\Re(\alpha_j)>0$, $j=1,2$, and $g_{12}=0$ elsewhere. In this case, the variables are free to vary within the unit square. Let us evaluate the normalizing constant $c_{12}$. For this purpose, let us expand the last factor by making use of the binomial expansion since $0<x_1x_2<1$. Then,

$$(1-x_1x_2)^{-(\alpha_1+\alpha_2-1)} = \sum_{k=0}^{\infty}\frac{(\alpha_1+\alpha_2-1)_k}{k!}\,x_1^k x_2^k \tag{i}$$

where, for example, the Pochhammer symbol $(a)_k$ stands for

$$(a)_k = a(a+1)\cdots(a+k-1),\ a\neq 0,\ (a)_0=1.$$
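The expansion *(i)* is easy to verify numerically with `scipy.special.poch`, which implements the Pochhammer symbol; the parameter and evaluation-point values below are arbitrary choices for the illustration:

```python
import math
from scipy.special import poch   # Pochhammer symbol (a)_k

# arbitrary test values (assumptions for the check, not from the text)
a1, a2 = 1.5, 2.0
c = a1 + a2 - 1.0                # the exponent in (i)
x1, x2 = 0.3, 0.4                # any point with 0 < x1*x2 < 1

# partial sum of the series (i) versus the closed form
partial_sum = sum(poch(c, k) / math.factorial(k) * (x1 * x2)**k
                  for k in range(60))
closed_form = (1.0 - x1 * x2)**(-c)
```

Since $0<x_1x_2<1$, the series converges rapidly and the two values agree to machine precision.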

The integral over $x_1$ gives

$$\int_0^1 x_1^k(1-x_1)^{\alpha_1-1}\,\mathrm{d}x_1 = \frac{\Gamma(k+1)\Gamma(\alpha_1)}{\Gamma(\alpha_1+k+1)},\ \Re(\alpha_1)>0,\tag{ii}$$

and the integral over *x*<sup>2</sup> yields

$$\int_0^1 x_2^{\alpha_1+k}(1-x_2)^{\alpha_2-1}\,\mathrm{d}x_2 = \frac{\Gamma(\alpha_1+k+1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+k+1)}.\tag{iii}$$

Taking the product of the right-hand side expressions in *(ii)* and *(iii)* and observing that $\Gamma(\alpha_1+\alpha_2+k+1)=\Gamma(\alpha_1+\alpha_2+1)(\alpha_1+\alpha_2+1)_k$ and $\Gamma(k+1)=(1)_k$, we obtain the following total integral:

$$\begin{split} \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\sum_{k=0}^{\infty}\frac{(1)_k(\alpha_1+\alpha_2-1)_k}{k!\,(\alpha_1+\alpha_2+1)_k} &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\;{}_2F_1(1,\alpha_1+\alpha_2-1;\alpha_1+\alpha_2+1;1) \\ &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\,\frac{\Gamma(\alpha_1+\alpha_2+1)\Gamma(1)}{\Gamma(\alpha_1+\alpha_2)\Gamma(2)} \\ &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)},\ \Re(\alpha_1)>0,\ \Re(\alpha_2)>0,\end{split} \tag{5.8.15}$$

where the ${}_2F_1$ hypergeometric function with argument 1 is evaluated with the following identity:

$$\,\_2F\_1(a,b;c;1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)}\tag{5.8.16}$$

whenever the gamma functions are defined. Observe that (5.8.15) is a surprising result as it is the total integral coming from a real type-1 beta density with the parameters $(\alpha_1,\alpha_2)$. Now, consider the general model

$$\begin{split} g_{1k}(x_1,\dots,x_k) &= c_{1k}(1-x_1)^{\alpha_1-1}\cdots(1-x_k)^{\alpha_k-1}\,x_2^{\alpha_1}\cdots \\ &\quad \times x_k^{\alpha_1+\cdots+\alpha_{k-1}}(1-x_1\cdots x_k)^{-(\alpha_1+\cdots+\alpha_k-1)},\ 0\le x_j\le 1,\ j=1,\dots,k. \end{split} \tag{5.8.17}$$

Proceeding exactly as in the case of *k* = 2, one obtains the total integral as

$$c_{1k}^{-1} = \frac{\Gamma(\alpha_1)\cdots\Gamma(\alpha_k)}{\Gamma(\alpha_1+\cdots+\alpha_k)},\ \Re(\alpha_j)>0,\ j=1,\dots,k. \tag{5.8.18}$$

This is the total integral coming from a $(k-1)$-variate real type-1 Dirichlet model. Some properties of this distribution are pointed out in some of the assigned problems.
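The $k=2$ case (5.8.15) lends itself to a quick numerical illustration (ours, not from the text): integrating the kernel of (5.8.14) over the unit square should give $\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1+\alpha_2)$, and the same value is obtained through the Gauss formula (5.8.16) via `scipy.special.hyp2f1`. Parameter values are arbitrary:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import hyp2f1
from math import gamma

a1, a2 = 1.7, 2.4                # arbitrary alpha_1, alpha_2 > 0 (assumed values)

def g12_kernel(x2, x1):
    # kernel of (5.8.14) without the normalizing constant c_12
    return (x2**a1 * (1 - x1)**(a1 - 1) * (1 - x2)**(a2 - 1)
            * (1 - x1 * x2)**(-(a1 + a2 - 1)))

total, _ = dblquad(g12_kernel, 0, 1, lambda x: 0, lambda x: 1)

# closed form (5.8.15) and the same value via the Gauss formula (5.8.16)
target = gamma(a1) * gamma(a2) / gamma(a1 + a2)
via_2f1 = (gamma(a1) * gamma(a2) / gamma(a1 + a2 + 1)
           * hyp2f1(1.0, a1 + a2 - 1.0, a1 + a2 + 1.0, 1.0))
```

All three quantities, `total`, `target` and `via_2f1`, should agree.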

#### **5.8.6. The type-1 Dirichlet model in the real matrix-variate case**

Direct generalizations of the real scalar variable Dirichlet models to real as well as complex matrix-variate cases are possible. The type-1 model will be considered first. Let the $p\times p$ real positive definite matrices $X_1,\dots,X_k$ be such that $X_j>O$ and $I-X_j>O$, that is, $X_j$ as well as $I-X_j$ are positive definite, for $j=1,\dots,k$, and, in addition, $I-X_1-\cdots-X_k>O$. Let $\Omega=\{(X_1,\dots,X_k)\,|\,O<X_j<I,\ j=1,\dots,k,\ I-X_1-\cdots-X_k>O\}$. Consider the model

$$\begin{split} G_1(X_1,\dots,X_k) &= C_k|X_1|^{\alpha_1-\frac{p+1}{2}}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}} \\ &\quad \times |I-X_1-\cdots-X_k|^{\alpha_{k+1}-\frac{p+1}{2}},\ (X_1,\dots,X_k)\in\Omega, \end{split} \tag{5.8.19}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,\dots,k+1$, and $G_1=0$ elsewhere. The normalizing constant $C_k$ can be determined by using real matrix-variate type-1 beta integrals to integrate out the matrices one at a time. We can also evaluate the total integral by means of the following transformation:

$$\begin{aligned} X\_1 &= Y\_1\\ X\_2 &= (I - Y\_1)^{\frac{1}{2}} Y\_2 (I - Y\_1)^{\frac{1}{2}}\\ X\_j &= (I - Y\_1)^{\frac{1}{2}} \cdots (I - Y\_{j-1})^{\frac{1}{2}} Y\_j (I - Y\_{j-1})^{\frac{1}{2}} \cdots (I - Y\_1)^{\frac{1}{2}}, \; j = 2, \ldots, k. \end{aligned} \tag{5.8.20}$$

The associated Jacobian can then be determined by making use of results on matrix transformations that are provided in Sect. 1.6. Then,

$$\mathrm{d}X\_{1}\wedge\ldots\wedge\mathrm{d}X\_{k} = |I - Y\_{1}|^{(k-1)(\frac{p+1}{2})}\cdots|I - Y\_{k-1}|^{\frac{p+1}{2}}\mathrm{d}Y\_{1}\wedge\ldots\wedge\mathrm{d}Y\_{k}.\tag{5.8.21}$$

It can be seen that the $Y_j$'s are independently distributed as real matrix-variate type-1 beta random variables, and the product of the integrals gives the following final result:

$$C\_k = \frac{\Gamma\_p(\alpha\_1 + \dots + \alpha\_{k+1})}{\Gamma\_p(\alpha\_1) \dots \Gamma\_p(\alpha\_{k+1})}, \ \Re(\alpha\_j) > \frac{p-1}{2}, \ j = 1, \dots, k+1. \tag{5.8.22}$$

By integrating out the variables one at a time, we can show that the marginal densities of all subsets of $\{X_1,\dots,X_k\}$ also belong to the same real matrix-variate type-1 Dirichlet family, and that single matrices are real matrix-variate type-1 beta distributed. By taking the product moment of the determinants, $E[|X_1|^{h_1}\cdots|X_k|^{h_k}]$, one can anticipate the results; however, arbitrary moments of determinants need not uniquely determine the densities of the corresponding matrices. In the real scalar case, one can uniquely identify the density from arbitrary moments, very often under very mild conditions. The result that $I-X_1-\cdots-X_k$ has a real matrix-variate type-1 beta distribution can be anticipated by taking arbitrary moments of the determinant, that is, $E[|I-X_1-\cdots-X_k|^{h}]$; however, evaluating the $h$-th moment of a determinant and then identifying it as the $h$-th moment of the determinant from a real matrix-variate type-1 beta density is not a valid argument in this case. If one makes a transformation of the type $Y_1=X_1,\dots,Y_{k-1}=X_{k-1},\ Y_k=I-X_1-\cdots-X_k$, it is seen that $X_k=I-Y_1-\cdots-Y_k$ and that the Jacobian in absolute value is 1. Hence, we end up with a real matrix-variate type-1 Dirichlet density of the same format but whose parameters $\alpha_k$ and $\alpha_{k+1}$ are interchanged. Then, integrating out $Y_1,\dots,Y_{k-1}$, we obtain a real matrix-variate type-1 beta density with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$. Hence the result. Since $Y_k$ has a real matrix-variate type-1 beta distribution, $I-Y_k=X_1+\cdots+X_k$ is also a type-1 beta random variable with the parameters interchanged.
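The determinant-moment remark above can be illustrated by simulation. The sketch below (an illustration with assumed values, not part of the text) uses the standard construction of a real matrix-variate type-1 beta matrix from two independent Wisharts, $X=(A+B)^{-1/2}A(A+B)^{-1/2}$ with $A\sim W_p(n_1,I)$, $B\sim W_p(n_2,I)$, so that $|X|=|A|/|A+B|$, and compares the Monte Carlo mean of $|X|^h$ with the $\Gamma_p$ ratio $E|X|^h=\frac{\Gamma_p(\alpha+h)\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\alpha+\beta+h)}$, $\alpha=n_1/2$, $\beta=n_2/2$:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

rng = np.random.default_rng(0)
p, h = 2, 1.0
n1, n2 = 8, 10                       # Wishart degrees of freedom (assumed values)
alpha, beta = n1 / 2.0, n2 / 2.0     # type-1 beta parameters

# E|X|^h from the Gamma_p ratio, computed on the log scale
log_m = (multigammaln(alpha + h, p) + multigammaln(alpha + beta, p)
         - multigammaln(alpha, p) - multigammaln(alpha + beta + h, p))
exact = np.exp(log_m)

# simulate; det X = det A / det(A + B) for the construction above
A = wishart.rvs(df=n1, scale=np.eye(p), size=20000, random_state=rng)
B = wishart.rvs(df=n2, scale=np.eye(p), size=20000, random_state=rng)
mc = (np.linalg.det(A) / np.linalg.det(A + B)).mean()
```

With 20,000 replicates the Monte Carlo mean `mc` matches `exact` to about two decimal places.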

The first author and his coworkers have proposed various types of generalizations of the matrix-variate type-1 and type-2 Dirichlet models in the real and complex cases. One of those extensions, which is defined in the real domain, is the following:

$$\begin{split} G_2(X_1,\dots,X_k) &= C_{1k}|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}} \\ &\quad \times |I-X_1-X_2|^{\beta_2}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}} \\ &\quad \times |I-X_1-\cdots-X_k|^{\alpha_{k+1}+\beta_k-\frac{p+1}{2}}, \end{split} \tag{5.8.23}$$

for $(X_1,\dots,X_k)\in\Omega$, $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,\dots,k+1$, and $G_2=0$ elsewhere. The normalizing constant $C_{1k}$ can be evaluated by integrating out the variables one at a time or by using the transformation (5.8.20). Under this transformation, the real matrices $Y_j$'s are independently distributed as real matrix-variate type-1 beta variables with the parameters $(\alpha_j,\gamma_j)$, $\gamma_j=\alpha_{j+1}+\cdots+\alpha_{k+1}+\beta_j+\cdots+\beta_k$. The conditions will then be $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,\dots,k+1$, and $\Re(\gamma_j)>\frac{p-1}{2}$, $j=1,\dots,k$. Hence

$$C_{1k} = \prod_{j=1}^{k}\frac{\Gamma_p(\alpha_j+\gamma_j)}{\Gamma_p(\alpha_j)\Gamma_p(\gamma_j)}. \tag{5.8.24}$$

#### **5.8.7. The type-2 Dirichlet model in the real matrix-variate case**

The type-2 Dirichlet density in the real matrix-variate case is the following:

$$\begin{split} G_3(X_1,\dots,X_k) &= C_k|X_1|^{\alpha_1-\frac{p+1}{2}}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}} \\ &\quad \times |I+X_1+\cdots+X_k|^{-(\alpha_1+\cdots+\alpha_{k+1})},\ X_j>O,\ j=1,\dots,k, \end{split} \tag{5.8.25}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,\dots,k+1$, and $G_3=0$ elsewhere, the normalizing constant $C_k$ being the same as that appearing in the type-1 Dirichlet case. This can be verified either by integrating out the matrices one at a time from (5.8.25) or by making the following transformation:

$$\begin{aligned} X\_1 &= Y\_1 \\ X\_2 &= (I + Y\_1)^{\frac{1}{2}} Y\_2 (I + Y\_1)^{\frac{1}{2}} \\ X\_j &= (I + Y\_1)^{\frac{1}{2}} \cdots (I + Y\_{j-1})^{\frac{1}{2}} Y\_j (I + Y\_{j-1})^{\frac{1}{2}} \cdots (I + Y\_1)^{\frac{1}{2}}, \; j = 2, \ldots, k. \end{aligned} \tag{5.8.26}$$

Under this transformation, the Jacobian is as follows:

$$\mathrm{d}X\_1 \wedge \ldots \wedge \mathrm{d}X\_k = |I + Y\_1|^{(k-1)(\frac{p+1}{2})} \cdots |I + Y\_{k-1}|^{\frac{p+1}{2}} \mathrm{d}Y\_1 \wedge \ldots \wedge \mathrm{d}Y\_k. \tag{5.8.27}$$

Thus, the $Y_j$'s are independently distributed real matrix-variate type-2 beta variables and the product of the integrals produces $[C_k]^{-1}$. By integrating out the matrices one at a time, we can see that all subsets of matrices belonging to $\{X_1,\dots,X_k\}$ will have densities of the type specified in (5.8.25). Several properties can also be established for the model (5.8.25); some of them are included in the exercises.
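For $p=1$ and $k=2$, the claim that $C_k$ coincides with the type-1 Dirichlet constant can be checked numerically (an illustration with arbitrary parameter values, not from the text): the total integral of the scalar type-2 Dirichlet kernel should equal $\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)/\Gamma(\alpha_1+\alpha_2+\alpha_3)$:

```python
import numpy as np
from scipy.integrate import dblquad
from math import gamma

# arbitrary parameters (assumed values for the check)
a1, a2, a3 = 1.2, 2.3, 3.1       # alpha_1, alpha_2, alpha_3

def g3_kernel(x2, x1):
    # the p = 1, k = 2 case of (5.8.25), without C_k
    return x1**(a1 - 1) * x2**(a2 - 1) * (1 + x1 + x2)**(-(a1 + a2 + a3))

total, _ = dblquad(g3_kernel, 0, np.inf, lambda x: 0, lambda x: np.inf)
inv_Ck = gamma(a1) * gamma(a2) * gamma(a3) / gamma(a1 + a2 + a3)
```

The quadrature result `total` should match `inv_Ck`, the reciprocal of the (shared) type-1/type-2 normalizing constant.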

**Example 5.8.1.** Evaluate the normalizing constant $c$ explicitly if the function $f(X_1,X_2)$ is a statistical density, where the $p\times p$ real matrices are such that $X_j>O$, $I-X_j>O$, $j=1,2$, and $I-X_1-X_2>O$, and

$$f(X_1,X_2) = c\,|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}}|I-X_1-X_2|^{\beta_2-\frac{p+1}{2}}.$$

**Solution 5.8.1.** Note that

$$|I-X_1-X_2|^{\beta_2-\frac{p+1}{2}} = |I-X_1|^{\beta_2-\frac{p+1}{2}}\,|I-(I-X_1)^{-\frac{1}{2}}X_2(I-X_1)^{-\frac{1}{2}}|^{\beta_2-\frac{p+1}{2}}.$$

Now, letting $Y=(I-X_1)^{-\frac{1}{2}}X_2(I-X_1)^{-\frac{1}{2}}\ \Rightarrow\ \mathrm{d}Y=|I-X_1|^{-\frac{p+1}{2}}\mathrm{d}X_2$, the integral over $X_2$ gives the following:

$$|I-X_1|^{\alpha_2+\beta_2-\frac{p+1}{2}}\int_{O<Y<I}|Y|^{\alpha_2-\frac{p+1}{2}}|I-Y|^{\beta_2-\frac{p+1}{2}}\,\mathrm{d}Y = |I-X_1|^{\alpha_2+\beta_2-\frac{p+1}{2}}\,\frac{\Gamma_p(\alpha_2)\Gamma_p(\beta_2)}{\Gamma_p(\alpha_2+\beta_2)}$$

for $\Re(\alpha_2)>\frac{p-1}{2}$, $\Re(\beta_2)>\frac{p-1}{2}$. Then, the integral over $X_1$ can be evaluated as follows:

$$\int_{O<X_1<I}|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1+\alpha_2+\beta_2-\frac{p+1}{2}}\,\mathrm{d}X_1 = \frac{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2+\beta_1+\beta_2)}{\Gamma_p(\alpha_1+\alpha_2+\beta_1+\beta_2)}$$

for $\Re(\alpha_1)>\frac{p-1}{2}$, $\Re(\beta_1+\beta_2+\alpha_2)>\frac{p-1}{2}$. Collecting the results from the integrals over $X_2$ and $X_1$ and using the fact that the total integral is 1, we have

$$c = \frac{\Gamma\_p(\alpha\_2 + \beta\_2)\Gamma\_p(\alpha\_1 + \alpha\_2 + \beta\_1 + \beta\_2)}{\Gamma\_p(\alpha\_2)\Gamma\_p(\beta\_2)\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2 + \beta\_1 + \beta\_2)}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,2$, $\Re(\beta_2)>\frac{p-1}{2}$, and $\Re(\beta_1+\beta_2+\alpha_2)>\frac{p-1}{2}$.
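For $p=1$ (where $\Gamma_p$ reduces to the ordinary gamma function), the constant $c$ of Example 5.8.1 can be verified by direct quadrature over the simplex. The parameter values below are arbitrary choices for the illustration:

```python
import numpy as np
from scipy.integrate import dblquad
from math import gamma

# arbitrary parameters (assumed values for the check)
a1, a2, b1, b2 = 1.4, 2.2, 0.9, 1.6   # alpha_1, alpha_2, beta_1, beta_2

def f_kernel(x2, x1):
    # the p = 1 case of the density in Example 5.8.1, without c
    return x1**(a1 - 1) * (1 - x1)**b1 * x2**(a2 - 1) * (1 - x1 - x2)**(b2 - 1)

# integrate over the simplex x1 > 0, x2 > 0, x1 + x2 < 1
total, _ = dblquad(f_kernel, 0, 1, lambda x1: 0, lambda x1: 1 - x1)
c = (gamma(a2 + b2) * gamma(a1 + a2 + b1 + b2)
     / (gamma(a2) * gamma(b2) * gamma(a1) * gamma(a2 + b1 + b2)))
```

The product `total * c` should be close to 1, confirming the normalization.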

The first author and his coworkers have established several generalizations of the type-2 Dirichlet model in (5.8.25). One such model is the following:

$$\begin{split} G_4(X_1,\dots,X_k) &= C_{2k}|X_1|^{\alpha_1-\frac{p+1}{2}}|I+X_1|^{-\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}} \\ &\quad \times |I+X_1+X_2|^{-\beta_2}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}} \\ &\quad \times |I+X_1+\cdots+X_k|^{-(\alpha_1+\cdots+\alpha_{k+1}+\beta_k)} \end{split} \tag{5.8.28}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,\dots,k+1$, $X_j>O$, $j=1,\dots,k$, as well as other necessary conditions to be stated later, and $G_4=0$ elsewhere. The normalizing constant $C_{2k}$ can be evaluated by integrating out the matrices one at a time or by making the transformation (5.8.26). Under this transformation, the $Y_j$'s are independently distributed real matrix-variate type-2 beta variables with the parameters $(\alpha_j,\delta_j)$, where

$$
\delta\_j = \alpha\_1 + \dots + \alpha\_{j-1} + \beta\_j + \dots + \beta\_k. \tag{5.8.29}
$$

The normalizing constant is then

$$C_{2k} = \prod_{j=1}^{k}\frac{\Gamma_p(\alpha_j+\delta_j)}{\Gamma_p(\alpha_j)\Gamma_p(\delta_j)} \tag{5.8.30}$$

where $\delta_j$ is as given in (5.8.29). The marginal densities of the subsets, if taken in the order $X_1$, $\{X_1,X_2\}$, and so on, will belong to the same family of densities as that specified by (5.8.28). Several properties of the model (5.8.28) are available in the literature.

#### **5.8.8. A pseudo Dirichlet model**

We will now discuss the generalization of the model introduced in Sect. 5.8.5. Consider the density

$$\begin{split} G_{1k}(X_1,\dots,X_k) &= C_{1k}|I-X_1|^{\alpha_1-\frac{p+1}{2}}\cdots|I-X_k|^{\alpha_k-\frac{p+1}{2}} \\ &\quad \times |X_2|^{\alpha_1}|X_3|^{\alpha_1+\alpha_2}\cdots|X_k|^{\alpha_1+\cdots+\alpha_{k-1}} \\ &\quad \times |I-X_k^{\frac{1}{2}}\cdots X_2^{\frac{1}{2}}X_1X_2^{\frac{1}{2}}\cdots X_k^{\frac{1}{2}}|^{-(\alpha_1+\cdots+\alpha_k-\frac{p+1}{2})}. \end{split} \tag{5.8.31}$$

Then, by following steps parallel to those used in the real scalar variable case, one can show that the normalizing constant is given by

$$C\_{1k} = \frac{\Gamma\_p(\alpha\_1 + \dots + \alpha\_k)}{\Gamma\_p(\alpha\_1) \dots \Gamma\_p(\alpha\_k)} \frac{\Gamma\_p(p+1)}{[\Gamma\_p(\frac{p+1}{2})]^2}. \tag{5.8.32}$$

The binomial expansion of the determinant in the last factor of (5.8.31) is somewhat complicated as it involves zonal polynomials; this expansion is given in Mathai (1997). Compared to the real scalar case, the only change is the appearance of the constant $\Gamma_p(p+1)/[\Gamma_p(\frac{p+1}{2})]^2$, which is 1 when $p=1$. Apart from this constant, the rest is the normalizing constant of a real matrix-variate type-1 Dirichlet model in $k-1$ variables instead of $k$ variables.
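The extra constant is convenient to evaluate on the log scale with `scipy.special.multigammaln`; the short illustrative helper below (names are ours) also confirms that it reduces to 1 in the scalar case:

```python
import numpy as np
from scipy.special import multigammaln   # log of the real multivariate gamma

def log_extra_factor(p):
    # log of Gamma_p(p+1) / [Gamma_p((p+1)/2)]^2, the constant in (5.8.32)
    return multigammaln(p + 1, p) - 2.0 * multigammaln((p + 1) / 2, p)

factor_p1 = np.exp(log_extra_factor(1))   # reduces to 1 when p = 1
factor_p2 = np.exp(log_extra_factor(2))   # a nontrivial constant for p = 2
```

For $p=1$, both multivariate gammas reduce to ordinary gammas of 2 and 1, so the factor is exactly 1.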

#### **5.8a. Dirichlet Models in the Complex Domain**

All the matrices appearing in the remainder of this chapter are $p\times p$ Hermitian positive definite, that is, $\tilde X_j=\tilde X_j^{*}$, where an asterisk indicates the conjugate transpose. Complex matrix-variate random variables will be denoted with a tilde. For a complex matrix $\tilde X$, the determinant will be denoted by $\det(\tilde X)$ and the absolute value of the determinant by $|\det(\tilde X)|$. For example, if $\det(\tilde X)=a+ib$, $a$ and $b$ being real and $i=\sqrt{-1}$, the absolute value is $|\det(\tilde X)|=+(a^2+b^2)^{\frac{1}{2}}$. The type-1 Dirichlet model in the complex domain, denoted by $\tilde G_1$, is the following:

$$\begin{split} \tilde G_1(\tilde X_1,\dots,\tilde X_k) &= \tilde C_k|\det(\tilde X_1)|^{\alpha_1-p}\cdots|\det(\tilde X_k)|^{\alpha_k-p} \\ &\quad \times |\det(I-\tilde X_1-\cdots-\tilde X_k)|^{\alpha_{k+1}-p} \end{split} \tag{5.8a.1}$$

for $(\tilde X_1,\dots,\tilde X_k)\in\tilde\Omega$, $\tilde\Omega=\{(\tilde X_1,\dots,\tilde X_k)\,|\,O<\tilde X_j<I,\ j=1,\dots,k,\ O<\tilde X_1+\cdots+\tilde X_k<I\}$, $\Re(\alpha_j)>p-1$, $j=1,\dots,k+1$, and $\tilde G_1=0$ elsewhere. The normalizing constant $\tilde C_k$ can be evaluated by integrating out the matrices one at a time with the help of complex matrix-variate type-1 beta integrals. One can also employ a transformation of the type given in (5.8.20), where the real matrices are replaced by matrices in the complex domain and Hermitian positive definite square roots are used. The Jacobian is then as follows:

$$\mathrm{d}\tilde X_1\wedge\ldots\wedge\mathrm{d}\tilde X_k = |\det(I-\tilde Y_1)|^{(k-1)p}\cdots|\det(I-\tilde Y_{k-1})|^{p}\,\mathrm{d}\tilde Y_1\wedge\ldots\wedge\mathrm{d}\tilde Y_k. \tag{5.8a.2}$$

The $\tilde Y_j$'s are then independently distributed as complex matrix-variate type-1 beta variables. On taking the product of the total integrals, one can verify that

$$\tilde C_k = \frac{\tilde\Gamma_p(\alpha_1+\cdots+\alpha_{k+1})}{\tilde\Gamma_p(\alpha_1)\cdots\tilde\Gamma_p(\alpha_{k+1})} \tag{5.8a.3}$$

where, for example, $\tilde\Gamma_p(\alpha)$ is the complex matrix-variate gamma function given by

$$\tilde{\Gamma}\_p(\alpha) = \pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) \cdots \Gamma(\alpha - p + 1), \; \Re(\alpha) > p - 1. \tag{5.8a.4}$$
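The product formula (5.8a.4) is straightforward to compute on the log scale; the small helper below is an illustration (names are ours), and for $p=1$ it reduces to the ordinary log-gamma function:

```python
from math import lgamma, log, pi

def log_cgamma_p(p, alpha):
    # log Gamma~_p(alpha) directly from (5.8a.4); requires Re(alpha) > p - 1
    return (p * (p - 1) / 2.0) * log(pi) + sum(lgamma(alpha - j) for j in range(p))

val_p1 = log_cgamma_p(1, 2.5)     # reduces to log Gamma(2.5) when p = 1
val_p3 = log_cgamma_p(3, 4.0)     # log[pi^3 * Gamma(4)Gamma(3)Gamma(2)]
```

Working with logarithms avoids overflow when the gamma factors are large, which is the usual situation in the normalizing constants above.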

The first author and his coworkers have also discussed various types of generalizations of Dirichlet models in the complex domain.

#### **5.8a.1. A type-2 Dirichlet model in the complex domain**

One can have a model parallel to the type-2 Dirichlet model in the real matrix-variate case. Consider the model

$$\begin{split} \tilde G_2 &= \tilde C_k|\det(\tilde X_1)|^{\alpha_1-p}\cdots|\det(\tilde X_k)|^{\alpha_k-p} \\ &\quad \times |\det(I+\tilde X_1+\cdots+\tilde X_k)|^{-(\alpha_1+\cdots+\alpha_{k+1})} \end{split} \tag{5.8a.5}$$

for $\tilde X_j>O$, $j=1,\dots,k$, $\Re(\alpha_j)>p-1$, $j=1,\dots,k+1$, and $\tilde G_2=0$ elsewhere. By integrating out the matrices one at a time with the help of complex matrix-variate type-2 beta integrals, or by using a transformation parallel to that provided in (5.8.26) and then integrating out the independently distributed complex type-2 beta variables $\tilde Y_j$'s, we can show that the normalizing constant $\tilde C_k$ is the same as that obtained in the complex type-1 Dirichlet case. The first author and his coworkers have given various types of generalizations of the complex type-2 Dirichlet density as well.

#### **Exercises 5.8**

**5.8.1.** By integrating out variables one at a time derive the normalizing constant in the real scalar type-2 Dirichlet case.

**5.8.2.** By using the transformation (5.8.3), derive the normalizing constant in the real scalar type-1 Dirichlet case.

**5.8.3.** By using the transformation in (5.8.6), derive the normalizing constant in the real scalar type-2 Dirichlet case.

**5.8.4.** Derive the normalizing constants for the extended Dirichlet models in (5.8.9) and (5.8.12).

**5.8.5.** Evaluate $E[x_1^{h_1}\cdots x_k^{h_k}]$ for the model specified in (5.8.12) and state the conditions for its existence.

**5.8.6.** Derive the normalizing constant given in (5.8.18).

**5.8.7.** With respect to the pseudo Dirichlet model in (5.8.17), show that the product $u=x_1\cdots x_k$ is uniformly distributed.

**5.8.8.** Derive the marginal distribution of (1): $x_1$; (2): $(x_1,x_2)$; (3): $(x_1,\dots,x_r)$, $r<k$, and the conditional distribution of $(x_1,\dots,x_r)$ given $(x_{r+1},\dots,x_k)$ in the pseudo Dirichlet model in (5.8.17).

**5.8.9.** Derive the normalizing constant in (5.8.22) by completing the steps leading to (5.8.22) and then by integrating out the matrices one by one.

**5.8.10.** From the outline given after equation (5.8.22), derive the density of $I-X_1-\cdots-X_k$ and therefrom the density of $X_1+\cdots+X_k$ when $(X_1,\dots,X_k)$ has a type-1 Dirichlet distribution.

**5.8.11.** Complete the derivation of $C_{1k}$ in (5.8.24) and verify it by integrating out matrices one at a time from the density given in (5.8.23).

**5.8.12.** Show that $U=(I+X_1+\cdots+X_k)^{-1}$ in the type-2 Dirichlet model in (5.8.25) is real matrix-variate type-1 beta distributed. As well, specify its parameters.

**5.8.13.** Evaluate the normalizing constant $C_k$ in (5.8.25) by using the transformation provided in (5.8.26) as well as by integrating out matrices one at a time.

**5.8.14.** Derive the $\delta_j$ in (5.8.29) and thus the normalizing constant $C_{2k}$ in (5.8.28).

**5.8.15.** For the following model in the complex domain, evaluate *C*:

$$\begin{split} f(\tilde X_1,\dots,\tilde X_k) &= C|\det(\tilde X_1)|^{\alpha_1-p}|\det(I-\tilde X_1)|^{\beta_1}|\det(\tilde X_2)|^{\alpha_2-p}|\det(I-\tilde X_1-\tilde X_2)|^{\beta_2}\cdots \\ &\quad \times |\det(I-\tilde X_1-\cdots-\tilde X_k)|^{\alpha_{k+1}-p+\beta_k}. \end{split}$$

**5.8.16.** Evaluate the normalizing constant in the pseudo Dirichlet model in (5.8.31).

**5.8.17.** In the pseudo Dirichlet model specified in (5.8.31), show that $U=X_k^{\frac{1}{2}}\cdots X_2^{\frac{1}{2}}X_1X_2^{\frac{1}{2}}\cdots X_k^{\frac{1}{2}}$ is uniformly distributed.

**5.8.18.** Show that the normalizing constant in the complex type-2 Dirichlet model specified in (5.8a.5) is the same as the one in the type-1 Dirichlet case. Establish the result by integrating out the matrices one by one.

**5.8.19.** Show that the normalizing constant in the type-2 Dirichlet case in (5.8a.5) is the same as that in the type-1 case. Establish this by using a transformation parallel to (5.8.26) in the complex domain.

**5.8.20.** Construct a generalized model for the type-2 Dirichlet case for *k* = 3 parallel to the case in (5.8.28) in the complex domain.

#### **References**

Mathai, A.M. (1993): *A Handbook of Generalized Special Functions for Statistical and Physical Sciences*, Oxford University Press, Oxford.

Mathai, A.M. (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

Mathai, A.M. (1999): *Introduction to Geometrical Probabilities: Distributional Aspects and Applications*, Gordon and Breach, Amsterdam.

Mathai, A.M. (2005): A pathway to matrix-variate gamma and normal densities, *Linear Algebra and its Applications*, **396**, 317–328.

Mathai, A.M. (2009): Fractional integrals in the matrix-variate case and connection to statistical distributions, *Integral Transforms and Special Functions*, **20(12)**, 871–882.

Mathai, A.M. (2010): Some properties of Mittag-Leffler functions and matrix-variate analogues: A statistical perspective, *Fractional Calculus & Applied Analysis*, **13(2)**, 113–132.

Mathai, A.M. (2012): Generalized Krätzel integral and associated statistical densities, *International Journal of Mathematical Analysis*, **6(51)**, 2501–2510.

Mathai, A.M. (2013): Fractional integral operators in the complex matrix-variate case, *Linear Algebra and its Applications*, **439**, 2901–2913.

Mathai, A.M. (2014): Evaluation of matrix-variate gamma and beta integrals, *Applied Mathematics and Computation*, **247**, 312–318.

Mathai, A.M. (2014a): Fractional integral operators involving many matrix variables, *Linear Algebra and its Applications*, **446**, 196–215.

Mathai, A.M. (2014b): Explicit evaluations of gamma and beta integrals in the matrix-variate case, *Journal of the Indian Mathematical Society*, **81(3)**, 259–271.

Mathai, A.M. (2015): Fractional differential operators in the complex matrix-variate case, *Linear Algebra and its Applications*, **478**, 200–217.

Mathai, A.M. and Haubold, H.J. (1988): *Modern Problems in Nuclear and Neutrino Astrophysics*, Akademie-Verlag, Berlin.

Mathai, A.M. and Haubold, H.J. (2008): *Special Functions for Applied Scientists*, Springer, New York.

Mathai, A.M. and Haubold, H.J. (2011): A pathway from Bayesian statistical analysis to superstatistics, *Applied Mathematics and Computation*, **218**, 799–804.

Mathai, A.M. and Haubold, H.J. (2011a): Matrix-variate statistical distributions and fractional calculus, *Fractional Calculus & Applied Analysis*, **24(1)**, 138–155.

Mathai, A.M. and Haubold, H.J. (2017): *Introduction to Fractional Calculus*, Nova Science Publishers, New York.

Mathai, A.M. and Haubold, H.J. (2017a): *Fractional and Multivariable Calculus: Model Building and Optimization*, Springer, New York.

Mathai, A.M. and Haubold, H.J. (2017b): *Probability and Statistics*, De Gruyter, Germany.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Chapter 6 Hypothesis Testing and Null Distributions**

#### **6.1. Introduction**

It is assumed that the readers are familiar with the concept of testing statistical hypotheses on the parameters of a real scalar normal density or of independent real scalar normal densities. Those who are not, or who require a refresher, may consult the textbook Mathai and Haubold (2017), *Probability and Statistics* (De Gruyter, Germany, available as a free download). Initially, we will only employ the likelihood ratio criterion for testing hypotheses on the parameters of one or more real multivariate Gaussian (or normal) distributions. All of our tests will be based on a simple random sample of size $n$ from a $p$-variate nonsingular Gaussian distribution; that is, the $p\times 1$ vectors $X_1,\dots,X_n$ constituting the sample are iid (independently and identically distributed) as $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\dots,n$, when a single real Gaussian population is involved. The corresponding test criterion for the complex Gaussian case will also be mentioned in each section.

In this chapter, we will utilize the following notations. Lower-case letters such as $x,\ y$ will be used to denote real scalar mathematical or random variables, no distinction being made between mathematical and random variables. Capital letters such as $X,\ Y$ will denote real vector/matrix-variate variables, whether mathematical or random. A tilde placed on a letter, as for instance $\tilde x,\ \tilde y,\ \tilde X$ and $\tilde Y$, will indicate that the variables are in the complex domain. No tilde will be used for constant matrices unless the point is to be stressed that the matrix concerned is in the complex domain. The other notations will be identical to those utilized in the previous chapters.

First, we consider certain problems related to testing hypotheses on the parameters of a $p$-variate real Gaussian population. Only the likelihood ratio criterion, also referred to as the $\lambda$-criterion, will be utilized. Let $L$ denote the joint density of the sample values in a simple random sample of size $n$, namely $X_1,\dots,X_n$, which are iid $N_p(\mu,\Sigma)$, $\Sigma>O$. Then, as was previously established,

$$L = \prod\_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2}(X\_j - \mu)' \Sigma^{-1} (X\_j - \mu)}}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1} S) - \frac{n}{2} (\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu)}}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}},\tag{6.1.1}$$

where $S=\sum_{j=1}^{n}(X_j-\bar X)(X_j-\bar X)'$ is the sample sum of products matrix and $\bar X=\frac{1}{n}(X_1+\cdots+X_n)$ is the sample average, $n$ being the sample size. As well, we have already determined that the maximum likelihood estimators (MLE's) of $\mu$ and $\Sigma$ are $\hat\mu=\bar X$ and $\hat\Sigma=\frac{1}{n}S$, the sample covariance matrix. Consider the parameter space

$$\Omega = \{ (\mu, \,\Sigma) \vert \,\Sigma > O, \,\,\mu' = (\mu\_1, \dots, \mu\_p), \,\, -\infty < \mu\_j < \infty, \,\, j = 1, \dots, p \}.$$

The maximum value of $L$ within $\Omega$ is obtained by substituting the MLE's of the parameters into $L$; since $(\bar X-\hat\mu)=(\bar X-\bar X)=O$ and $\mathrm{tr}(\hat\Sigma^{-1}S)=\mathrm{tr}(nI_p)=np$,

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |\frac{1}{n} \mathbf{S}|^{\frac{n}{2}}} = \frac{\mathbf{e}^{-\frac{np}{2}} n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |\mathbf{S}|^{\frac{n}{2}}} \,. \tag{6.1.2}$$
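As a numerical sanity check (an illustration on simulated data, not part of the text), substituting the MLEs into (6.1.1) does reproduce the closed form (6.1.2); both sides are compared on the log scale below:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))            # any simulated sample will do
Xbar = X.mean(axis=0)
S = (X - Xbar).T @ (X - Xbar)              # sample sum of products matrix

# direct evaluation of log L at mu-hat = Xbar, Sigma-hat = S/n, using (6.1.1);
# the mean term vanishes at the MLE, leaving only the trace term
Sigma_hat = S / n
logL_direct = (-0.5 * np.trace(np.linalg.solve(Sigma_hat, S))
               - (n * p / 2) * np.log(2 * np.pi)
               - (n / 2) * np.log(np.linalg.det(Sigma_hat)))

# the closed form (6.1.2), on the log scale
logL_closed = (-(n * p / 2) + (n * p / 2) * np.log(n)
               - (n * p / 2) * np.log(2 * np.pi)
               - (n / 2) * np.log(np.linalg.det(S)))
```

The two log-likelihood values agree to numerical precision, since $\mathrm{tr}(\hat\Sigma^{-1}S)=np$ and $|\hat\Sigma|=|S|/n^p$.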

Under any given hypothesis on $\mu$ or $\Sigma$, the parameter space is reduced to a subspace $\omega$ of $\Omega$, that is, $\omega\subset\Omega$. For example, if $H_o:\mu=\mu_o$ where $\mu_o$ is a given vector, then the parameter space under this null hypothesis reduces to $\omega=\{(\mu,\Sigma)\,|\,\mu=\mu_o,\ \Sigma>O\}\subset\Omega$, "null hypothesis" being a technical term used to refer to the hypothesis being tested. The alternative hypothesis against which the null hypothesis is tested is usually denoted by $H_1$. If $\mu=\mu_o$ specifies $H_o$, then a natural alternative is $H_1:\mu\neq\mu_o$. One of two things can happen when considering the maximum of the likelihood function under $H_o$: the overall maximum may occur in $\omega$, or it may be attained outside of $\omega$ but inside $\Omega$. If the null hypothesis $H_o$ is actually true, then the maxima in $\omega$ and in $\Omega$ will agree. If there are several local maxima, then the overall maximum or supremum is taken. The $\lambda$-criterion is defined as follows:

$$\lambda = \frac{\sup_{\omega}L}{\sup_{\Omega}L},\quad 0<\lambda\le 1. \tag{6.1.3}$$

If the null hypothesis is true, then $\lambda$ will be close to 1. Accordingly, an observed value of $\lambda$ that is close to 0 in a testing situation indicates that the null hypothesis $H_o$ is incorrect and should then be rejected. Hence, the test criterion under the likelihood ratio test is to "reject $H_o$ for $0<\lambda\le\lambda_o$", that is, for small values of $\lambda$, where $\lambda_o$ is chosen so that, under $H_o$, the coverage probability over this interval equals the significance level $\alpha$, the probability of rejecting $H_o$ when $H_o$ is true; that is, $Pr\{0<\lambda\le\lambda_o\,|\,H_o\}=\alpha$ for a pre-assigned $\alpha$, which is also known as the size of the critical region or the size of the type-1 error. However, rejecting $H_o$ when it is not actually true, that is, when the alternative $H_1$ is true, is a correct decision

whose probability is known as the power of the test and written as 1 − *β* where *β* is the probability of committing a type-2 error or the error of not rejecting *Ho* when *Ho* is not true. Thus we have

$$Pr\{0<\lambda\le\lambda_o\,|\,H_o\}=\alpha \ \text{ and } \ Pr\{0<\lambda\le\lambda_o\,|\,H_1\}=1-\beta. \tag{6.1.4}$$

When we preassign *α* = 0*.*05, we are allowing a tolerance of 5% for the probability of committing the error of rejecting *Ho* when it is actually true and we say that we have a test at the 5% significance level. Usually, we set *α* as 0*.*05 or 0*.*01. Alternatively, we can allow *α* to vary and calculate what is known as the *p*-value when carrying out a test. Such is the principle underlying the likelihood ratio test, the resulting test criterion being referred to as the *λ*-criterion.
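For the simplest scalar case, $N(\mu,1)$ with $H_o:\mu=0$, the $\lambda$-criterion reduces to rejecting when $-2\ln\lambda=n\bar x^2$ exceeds a chi-square cut-off (this reduction is our assumption here, derived in the sections that follow for the multivariate case). A minimal simulation sketch verifying that the test then has the pre-assigned size $\alpha=0.05$:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, alpha = 25, 0.05                      # sample size and significance level
crit = chi2.ppf(1 - alpha, df=1)         # -2 ln(lambda_o); lambda_o = exp(-crit/2)

reps = 20000
xbar = rng.standard_normal((reps, n)).mean(axis=1)   # data generated under H_o
size_est = (n * xbar**2 > crit).mean()   # observed proportion of rejections
```

Under $H_o$, $\sqrt{n}\,\bar x\sim N(0,1)$ exactly, so $n\bar x^2\sim\chi_1^2$ and the estimated size is close to 0.05.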

In the complex case, a tilde will be placed above *λ* and *L*, (6.1.3) and (6.1.4) remaining essentially the same:

$$
\tilde{\lambda} = \frac{\sup_{\omega} \tilde{L}}{\sup_{\Omega} \tilde{L}}, \ 0 < |\tilde{\lambda}| \le 1, \tag{6.1a.1}
$$

and

$$\Pr\{0 < |\tilde{\lambda}| \le \lambda_o \mid H_o\} = \alpha, \ \Pr\{0 < |\tilde{\lambda}| \le \lambda_o \mid H_1\} = 1 - \beta \tag{6.1a.2}$$

where *α* is the size or significance level of the test and 1 − *β*, the power of the test.

# **6.2. Testing** *Ho* : *μ* = *μo* **(Given) When** *Σ* **is Known, the Real** *Np(μ, Σ)* **Case**

When *Σ* is known, the only parameter to estimate is *μ*, its MLE being *X*¯ . Hence, the maximum in the entire parameter space Ω is the following:

$$\sup\_{\Omega} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}\mathcal{S})}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}.\tag{6.2.1}$$

In this case, *μ* is also specified under the null hypothesis *Ho*, so that there is no parameter to estimate. Accordingly,

$$\begin{split} \sup_{\omega} L &= \frac{\mathbf{e}^{-\frac{1}{2}\sum_{j=1}^{n} (X_j - \mu_o)' \Sigma^{-1} (X_j - \mu_o)}}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}} \\ &= \frac{\mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1} S) - \frac{n}{2} (\bar{X} - \mu_o)' \Sigma^{-1} (\bar{X} - \mu_o)}}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}}. \end{split} \tag{6.2.2}$$

Thus,

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \mathbf{e}^{-\frac{n}{2}(\bar{X} - \mu_o)' \Sigma^{-1}(\bar{X} - \mu_o)}, \tag{6.2.3}$$

and small values of *λ* correspond to large values of $\frac{n}{2}(\bar{X} - \mu_o)' \Sigma^{-1}(\bar{X} - \mu_o)$. When $X_j \sim N_p(\mu, \Sigma)$, *Σ > O*, it has already been established that $\bar{X} \sim N_p(\mu, \frac{1}{n}\Sigma)$. As well, $n(\bar{X} - \mu_o)' \Sigma^{-1}(\bar{X} - \mu_o)$ is the exponent in a *p*-variate real normal density under *Ho*, which has already been shown to have a real chisquare distribution with *p* degrees of freedom, that is,

$$n(\bar{X} - \mu_o)' \Sigma^{-1} (\bar{X} - \mu_o) \sim \chi_p^2.$$

Hence, the test criterion is

$$\text{Reject } H_o \text{ if } n(\bar{X} - \mu_o)' \Sigma^{-1} (\bar{X} - \mu_o) \ge \chi^2_{p,\,\alpha}, \text{ with } Pr\{\chi^2_p \ge \chi^2_{p,\,\alpha}\} = \alpha. \quad (6.2.4)$$

Under the alternative hypothesis, the distribution of the test statistic is a non-central chisquare with *p* degrees of freedom and non-centrality parameter $\lambda = \frac{n}{2}(\mu - \mu_o)' \Sigma^{-1}(\mu - \mu_o)$.
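As a sanity check on this non-null distribution, one may simulate the statistic. The sketch below (our construction, not from the text) takes $\Sigma = I_3$, which entails no loss of generality since the distribution of the statistic depends only on *p* and the non-centrality, and compares the Monte Carlo mean with the theoretical mean $p + 2\lambda$ under the text's parametrization of *λ*:

```python
import math, random

random.seed(7)

# Under the alternative, n (Xbar - mu_o)' Sigma^{-1} (Xbar - mu_o) is a
# non-central chisquare with p degrees of freedom; with the text's
# lambda = (n/2)(mu - mu_o)' Sigma^{-1} (mu - mu_o), its mean is p + 2*lambda.
p, n = 3, 5
d = [0.4, -0.2, 0.7]                       # mu - mu_o (illustrative values)
lam = (n / 2.0) * sum(t * t for t in d)    # non-centrality as defined in the text

reps = 20000
total = 0.0
for _ in range(reps):
    # Xbar ~ N_p(mu, I/n), so simulate Xbar - mu_o directly
    xbar_minus_muo = [di + random.gauss(0.0, 1.0) / math.sqrt(n) for di in d]
    total += n * sum(t * t for t in xbar_minus_muo)

print(round(total / reps, 2))   # close to the theoretical mean p + 2*lambda
print(p + 2 * lam)              # 6.45
```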

**Example 6.2.1.** Suppose that we have a sample of size 5 from a population that has a trivariate normal distribution and let the significance level *α* be 0.05. Let *μo*, the hypothesized mean value vector specified by the null hypothesis, the known covariance matrix *Σ*, and the five observation vectors *X*1*,...,X*5 be the following:

$$\begin{aligned} &\mu_o = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \ \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}, \\ &X_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \ X_2 = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}, \ X_3 = \begin{bmatrix} 0 \\ -1 \\ -2 \end{bmatrix}, \ X_4 = \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}, \ X_5 = \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix}, \end{aligned}$$

the inverse of *Σ* having been evaluated via elementary transformations. The sample average, $\frac{1}{5}(X_1 + \cdots + X_5)$, denoted by $\bar{X}$, is

$$\bar{X} = \frac{1}{5} \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \\ -2 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix} \right\} = \frac{1}{5} \begin{bmatrix} 9 \\ 4 \\ 3 \end{bmatrix},$$

and

$$
\bar{X} - \mu_o = \frac{1}{5} \begin{bmatrix} 9 \\ 4 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix}.
$$

For testing *Ho*, the following test statistic has to be evaluated:

$$n(\bar{X} - \mu_o)' \Sigma^{-1} (\bar{X} - \mu_o) = \frac{5}{5^2} \begin{bmatrix} 4 & 4 & 8 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 & 0\\ 0 & 2 & -1\\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4\\ 4\\ 8 \end{bmatrix} = \frac{40}{5} = 8.$$

As per our criterion, *Ho* should be rejected if $8 \ge \chi^2_{p,\,\alpha}$. Since $\chi^2_{p,\,\alpha} = \chi^2_{3,\,0.05} = 7.81$, this critical value being available from a chisquare table, *Ho* : *μ* = *μo* should be rejected at the specified significance level. Moreover, in this case, the *p*-value is $Pr\{\chi^2_3 \ge 8\} \approx 0.046$, which can be evaluated by interpolation from the percentiles provided in a chisquare table or by making use of statistical packages such as R.
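The arithmetic of Example 6.2.1 can be reproduced in a few lines; the sketch below (standard-library Python, helper names ours) recomputes the test statistic from the raw observations:

```python
# Sketch of Example 6.2.1 (helper names ours).
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def quad_form(v, A):          # v' A v
    return sum(x * y for x, y in zip(v, matvec(A, v)))

n = 5
X = [[1, 0, 1], [2, -1, 4], [0, -1, -2], [2, 4, 1], [4, 2, -1]]
mu_o = [1, 0, -1]
Sigma_inv = [[0.5, 0, 0], [0, 2, -1], [0, -1, 1]]

xbar = [sum(col) / n for col in zip(*X)]          # [1.8, 0.8, 0.6]
diff = [xb - m for xb, m in zip(xbar, mu_o)]      # Xbar - mu_o
u = n * quad_form(diff, Sigma_inv)                # test statistic

print(round(u, 6))            # 8.0 >= 7.81: reject Ho at the 5% level
```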

#### **6.2.1. Paired variables and linear functions**

Let *Y*1*,...,Yk* be *p* × 1 vectors having their own *p*-variate distributions which are not known. However, suppose that a certain linear function $X = a_1Y_1 + \cdots + a_kY_k$ is known to have a *p*-variate real Gaussian distribution with mean value vector *E*[*X*] = *μ* and covariance matrix Cov*(X)* = *Σ*, *Σ > O*, that is, $X = a_1Y_1 + \cdots + a_kY_k \sim N_p(\mu, \Sigma)$, *Σ > O*, where $a_1,\ldots,a_k$ are fixed known scalar constants. An example of this type is $X = Y_1 - Y_2$ where *Y*1 consists of measurements on *p* attributes before subjecting those attributes to a certain process, such as administering a drug to a patient, and *Y*2 consists of the measurements on the same attributes after the process is completed. We would like to examine the difference $Y_1 - Y_2$ to study the effect of the process on these characteristics. If it is reasonable to assume that this difference $X = Y_1 - Y_2$ is $N_p(\mu, \Sigma)$, *Σ > O*, then we could test hypotheses on *E*[*X*] = *μ*. When *Σ* is known, the general problem reduces to that discussed in Sect. 6.2. Assuming that we have iid observations on $Y_1,\ldots,Y_k$, we would evaluate the corresponding values of *X*, which produces iid observations on *X*, that is, a simple random sample of size *n* from $X = a_1Y_1 + \cdots + a_kY_k$. Thus, when *Σ* is known, letting $u = n(\bar{X} - \mu_o)'\Sigma^{-1}(\bar{X} - \mu_o) \sim \chi^2_p$ where $\bar{X}$ denotes the sample average, the test would be carried out as follows at significance level *α*:

$$\text{Reject } H_o: \mu = \mu_o \text{ (specified) when } u \ge \chi^2_{p,\,\alpha}, \text{ with } Pr\{\chi^2_p \ge \chi^2_{p,\,\alpha}\} = \alpha, \tag{6.2.5}$$

the non-null distribution of the test statistic *u* being a non-central chisquare.

**Example 6.2.2.** Three variables, $x_1$ = systolic pressure, $x_2$ = diastolic pressure and $x_3$ = weight, are monitored after administering a drug for the reduction of all these *p* = 3 variables. Suppose that *n* = 5 randomly selected individuals are given the medication for one week. The following five pairs of observations on each of the three variables were obtained before and after the administration of the medication:

$$
\begin{bmatrix} 150, 140 \\ 90, 90 \\ 70, 68 \end{bmatrix}, \begin{bmatrix} 180, 150 \\ 95, 90 \\ 75, 70 \end{bmatrix}, \begin{bmatrix} 160, 160 \\ 85, 80 \\ 70, 65 \end{bmatrix}, \begin{bmatrix} 140, 138 \\ 85, 90 \\ 70, 71 \end{bmatrix}, \begin{bmatrix} 130, 128 \\ 85, 85 \\ 75, 74 \end{bmatrix}.
$$

Let *X* denote the difference, that is, *X* is equal to the reading before the medication was administered minus the reading after the medication could take effect. The observation vectors on *X* are then

$$X\_1 = \begin{bmatrix} 150 - 140 \\ 90 - 90 \\ 70 - 68 \end{bmatrix} = \begin{bmatrix} 10 \\ 0 \\ 2 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 30 \\ 5 \\ 5 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 0 \\ 5 \\ 5 \end{bmatrix}, \ X\_4 = \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}.$$

In this case, *X*1*,...,X*5 are observations on iid variables. We are going to assume that these iid variables are coming from a population whose distribution is $N_3(\mu, \Sigma)$, *Σ > O*, where *Σ* is known. Let the sample average $\bar{X} = \frac{1}{5}(X_1 + \cdots + X_5)$, the hypothesized mean value vector specified by the null hypothesis *Ho* : *μ* = *μo*, and the known covariance matrix *Σ* be as follows:

$$
\bar{X} = \frac{1}{5} \begin{bmatrix} 44 \\ 5 \\ 12 \end{bmatrix}, \; \mu\_o = \begin{bmatrix} 8 \\ 0 \\ 2 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.
$$

Let us evaluate $\bar{X} - \mu_o$ and $n(\bar{X} - \mu_o)'\Sigma^{-1}(\bar{X} - \mu_o)$, which are needed for testing the hypothesis *Ho* : *μ* = *μo*:

$$\begin{aligned} \bar{X} - \mu_o &= \frac{1}{5} \begin{bmatrix} 44 \\ 5 \\ 12 \end{bmatrix} - \begin{bmatrix} 8 \\ 0 \\ 2 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 4 \\ 5 \\ 2 \end{bmatrix}, \\ n(\bar{X} - \mu_o)' \Sigma^{-1} (\bar{X} - \mu_o) &= \frac{5}{5^2} \begin{bmatrix} 4 & 5 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 2 \end{bmatrix} = 8.4. \end{aligned}$$

Let us test *Ho* at the significance level *α* = 0.05. The critical value, which can readily be found in a chisquare table, is $\chi^2_{p,\,\alpha} = \chi^2_{3,\,0.05} = 7.81$. As per our criterion, we reject *Ho* if $8.4 \ge \chi^2_{p,\,\alpha}$; since 8.4 > 7.81, we reject *Ho*. The *p*-value in this case is $Pr\{\chi^2_p \ge 8.4\} = Pr\{\chi^2_3 \ge 8.4\} \approx 0.04$.
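A sketch of these computations (helper arrangement ours), starting from the raw before/after readings rather than the precomputed differences:

```python
# Sketch of Example 6.2.2: form the before-minus-after differences, then
# the chisquare statistic with the known Sigma (helper arrangement ours).
before = [[150, 90, 70], [180, 95, 75], [160, 85, 70], [140, 85, 70], [130, 85, 75]]
after  = [[140, 90, 68], [150, 90, 70], [160, 80, 65], [138, 90, 71], [128, 85, 74]]
X = [[b - a for b, a in zip(bi, ai)] for bi, ai in zip(before, after)]

n = len(X)                                   # 5
mu_o = [8, 0, 2]
Sigma_inv = [[0.5, 0, 0], [0, 2, -1], [0, -1, 1]]

xbar = [sum(col) / n for col in zip(*X)]     # [8.8, 1.0, 2.4]
diff = [xb - m for xb, m in zip(xbar, mu_o)]
Sd = [sum(a * x for a, x in zip(row, diff)) for row in Sigma_inv]
u = n * sum(d * s for d, s in zip(diff, Sd))

print(round(u, 6))                           # 8.4 > 7.81: reject Ho
```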

#### **6.2.2. Independent Gaussian populations**

Let $Y_j \sim N_p(\mu_{(j)}, \Sigma_j)$, $\Sigma_j > O$, $j = 1,\ldots,k$, and let these *k* populations be independently distributed. Assume that a simple random sample of size $n_j$ from $Y_j$ is available for $j = 1,\ldots,k$; then these samples can be represented by the *p*-vectors $Y_{jq}$, $q = 1,\ldots,n_j$, which are iid, for $j = 1,\ldots,k$. Consider a given linear function $X = a_1Y_1 + \cdots + a_kY_k$ where *X* is *p* × 1 and the $Y_j$'s are taken in a given order. Let $U = a_1\bar{Y}_1 + \cdots + a_k\bar{Y}_k$ where $\bar{Y}_j = \frac{1}{n_j}\sum_{q=1}^{n_j} Y_{jq}$ for $j = 1,\ldots,k$. Then $E[U] = a_1\mu_{(1)} + \cdots + a_k\mu_{(k)} = \mu$ (say), where $a_1,\ldots,a_k$ are given real scalar constants. The covariance matrix of *U* is $\text{Cov}(U) = \frac{a_1^2}{n_1}\Sigma_1 + \cdots + \frac{a_k^2}{n_k}\Sigma_k = \frac{1}{n}\Sigma$ (say), where *n* is a symbol. Consider the problem of testing hypotheses on *μ* when *Σ* is known or when $a_j, \Sigma_j$, $j = 1,\ldots,k$, are known. Let *Ho* : *μ* = *μo* (specified), in the sense that $\mu_{(j)}$ is a known vector for $j = 1,\ldots,k$, when *Σ* is known. Then, under *Ho*, all the parameters are known and the standardized *U* is observable, the test statistic being

$$\sum_{j=1}^{k} a_j n_j (\bar{Y}_j - \mu_{(j)})' \Sigma_j^{-1} (\bar{Y}_j - \mu_{(j)}) \sim \sum_{j=1}^{k} a_j \chi_p^{2(j)} \tag{6.2.6}$$

where $\chi_p^{2(j)}$, $j = 1,\ldots,k$, denote independent chisquare random variables, each having *p* degrees of freedom. However, since this is a linear function of independent chisquare variables, even the null distribution is complicated. Thus, only the case of two independent populations will be examined.

Consider the problem of testing the hypothesis $\mu_{(1)} - \mu_{(2)} = \delta$ (a given vector) when there are two independent normal populations sharing a common covariance matrix *Σ* (known). Then *U* is $U = \bar{Y}_1 - \bar{Y}_2$ with $E[U] = \mu_{(1)} - \mu_{(2)} = \delta$ (given) under *Ho* and $\text{Cov}(U) = \big(\frac{1}{n_1} + \frac{1}{n_2}\big)\Sigma = \frac{n_1 + n_2}{n_1 n_2}\Sigma$, the test statistic, denoted by *v*, being

$$v = \frac{n\_1 n\_2}{n\_1 + n\_2} (U - \delta)' \Sigma^{-1} (U - \delta) = \frac{n\_1 n\_2}{n\_1 + n\_2} (\bar{Y}\_1 - \bar{Y}\_2 - \delta)' \Sigma^{-1} (\bar{Y}\_1 - \bar{Y}\_2 - \delta) \sim \chi^2\_p. \tag{6.2.7}$$

The resulting test criterion is

Reject *Ho* if the observed value of $v \ge \chi^2_{p,\,\alpha}$, with $Pr\{\chi^2_p \ge \chi^2_{p,\,\alpha}\} = \alpha$. (6.2.8)

**Example 6.2.3.** Let *Y*<sup>1</sup> ∼ *N*3*(μ(*1*), Σ)* and *Y*<sup>2</sup> ∼ *N*3*(μ(*2*), Σ)* represent independently distributed normal populations having a known common covariance matrix *Σ*. The null hypothesis is *Ho* : *μ(*1*)* − *μ(*2*)* = *δ* where *δ* is specified. Denote the observation vectors on *Y*<sup>1</sup> and *Y*<sup>2</sup> by *Y*1*<sup>j</sup> , j* = 1*,...,n*<sup>1</sup> and *Y*2*<sup>j</sup> , j* = 1*,...,n*2*,* respectively, and let the sample sizes be *n*<sup>1</sup> = 4 and *n*<sup>2</sup> = 5. Let those observation vectors be

$$\begin{aligned} Y_{11} &= \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix}, \ Y_{12} = \begin{bmatrix} 5 \\ 5 \\ 3 \end{bmatrix}, \ Y_{13} = \begin{bmatrix} 7 \\ 8 \\ 7 \end{bmatrix}, \ Y_{14} = \begin{bmatrix} 8 \\ 10 \\ 12 \end{bmatrix} \text{ and} \\ Y_{21} &= \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \ Y_{22} = \begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix}, \ Y_{23} = \begin{bmatrix} 7 \\ 10 \\ 8 \end{bmatrix}, \ Y_{24} = \begin{bmatrix} 6 \\ 5 \\ 6 \end{bmatrix}, \ Y_{25} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \end{aligned}$$

and the common covariance matrix *Σ* be

$$
\Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.
$$

Let the hypothesized vector under *Ho* : $\mu_{(1)} - \mu_{(2)} = \delta$ be $\delta' = (1, 1, 2)$. In order to test this null hypothesis, the following quantities must be evaluated:

$$\begin{aligned} \bar{Y}\_1 &= \frac{1}{n\_1}(Y\_{11} + \dots + Y\_{1n\_1}) = \frac{1}{4}(Y\_{11} + Y\_{12} + Y\_{13} + Y\_{14}),\\ \bar{Y}\_2 &= \frac{1}{n\_2}(Y\_{21} + \dots + Y\_{2n\_2}) = \frac{1}{5}(Y\_{21} + \dots + Y\_{25}),\\ U &= \bar{Y}\_1 - \bar{Y}\_2, \ v = \frac{n\_1 n\_2}{n\_1 + n\_2}(U - \delta)' \Sigma^{-1} (U - \delta). \end{aligned}$$

They are

$$\begin{aligned} \bar{Y}_1 &= \frac{1}{4} \left\{ \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix} + \begin{bmatrix} 5 \\ 5 \\ 3 \end{bmatrix} + \begin{bmatrix} 7 \\ 8 \\ 7 \end{bmatrix} + \begin{bmatrix} 8 \\ 10 \\ 12 \end{bmatrix} \right\} = \frac{1}{4} \begin{bmatrix} 22 \\ 24 \\ 27 \end{bmatrix}, \\ \bar{Y}_2 &= \frac{1}{5} \left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix} + \begin{bmatrix} 7 \\ 10 \\ 8 \end{bmatrix} + \begin{bmatrix} 6 \\ 5 \\ 6 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \right\} = \frac{1}{5} \begin{bmatrix} 20 \\ 20 \\ 21 \end{bmatrix}, \\ U &= \bar{Y}_1 - \bar{Y}_2 = \frac{1}{4} \begin{bmatrix} 22 \\ 24 \\ 27 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 20 \\ 20 \\ 21 \end{bmatrix} = \begin{bmatrix} 1.50 \\ 2.00 \\ 2.55 \end{bmatrix}. \end{aligned}$$

Then,

$$\begin{aligned} U - \delta &= \begin{bmatrix} 1.50 \\ 2.00 \\ 2.55 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.50 \\ 1.00 \\ 0.55 \end{bmatrix}. \\ v &= \frac{n\_1 n\_2}{n\_1 + n\_2} (U - \delta)' \Sigma^{-1} (U - \delta) \\ &= \frac{(4)(5)}{9} \begin{bmatrix} 0.50 & 1.00 & 0.55 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0.50 \\ 1.00 \\ 0.55 \end{bmatrix} \\ &= 1.3275 \times \frac{20}{9} = 2.95. \end{aligned}$$

Let us test *Ho* at the significance level *α* = 0.05. The critical value, which is available from a chisquare table, is $\chi^2_{p,\,\alpha} = \chi^2_{3,\,0.05} = 7.81$. As per our criterion, we reject *Ho* if $2.95 \ge \chi^2_{p,\,\alpha}$; however, since 2.95 < 7.81, we cannot reject *Ho*. The *p*-value in this case is $Pr\{\chi^2_p \ge 2.95\} = Pr\{\chi^2_3 \ge 2.95\} \approx 0.40$, which can be determined by interpolation.
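The two-sample statistic *v* of Example 6.2.3 can be recomputed as follows (a sketch, helper arrangement ours):

```python
# Sketch of the two-sample statistic v in Example 6.2.3 (arrangement ours).
Y1 = [[2, 1, 5], [5, 5, 3], [7, 8, 7], [8, 10, 12]]
Y2 = [[2, 1, 3], [4, 3, 2], [7, 10, 8], [6, 5, 6], [1, 1, 2]]
delta = [1, 1, 2]
Sigma_inv = [[0.5, 0, 0], [0, 2, -1], [0, -1, 1]]

n1, n2 = len(Y1), len(Y2)
ybar1 = [sum(c) / n1 for c in zip(*Y1)]      # [5.5, 6.0, 6.75]
ybar2 = [sum(c) / n2 for c in zip(*Y2)]      # [4.0, 4.0, 4.2]
w = [a - b - d for a, b, d in zip(ybar1, ybar2, delta)]   # U - delta
Sw = [sum(a * x for a, x in zip(row, w)) for row in Sigma_inv]
v = (n1 * n2) / (n1 + n2) * sum(x * y for x, y in zip(w, Sw))

print(round(v, 2))                           # 2.95 < 7.81: cannot reject Ho
```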

# **6.2a. Testing** *Ho* : *μ* = *μo* **(given) When** *Σ* **is Known, Complex Gaussian Case**

The derivation of the *λ*-criterion in the complex domain is parallel to that provided for the real case. In the parameter space,

$$\sup\_{\Omega} \tilde{L} = \frac{\mathbf{e}^{-\text{tr}(\Sigma^{-1}\tilde{\mathcal{S}})}}{\pi^{np} |\det(\Sigma)|^n} \tag{6.2a.1}$$

and under *Ho* : *μ* = *μo*, a given vector,

$$\sup_{\omega} \tilde{L} = \frac{\mathbf{e}^{-\text{tr}(\Sigma^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \mu_o)^* \Sigma^{-1} (\bar{\tilde{X}} - \mu_o)}}{\pi^{np} |\text{det}(\Sigma)|^n}. \tag{6.2a.2}$$

Accordingly,

$$\tilde{\lambda} = \frac{\sup\_{\omega} \tilde{L}}{\sup\_{\Omega} \tilde{L}} = \mathbf{e}^{-n(\bar{\tilde{X}} - \mu\_o)^\* \Sigma^{-1}(\bar{\tilde{X}} - \mu\_o)}. \tag{6.2a.3}$$

Here as well, small values of $\tilde{\lambda}$ correspond to large values of $\tilde{y} \equiv n(\bar{\tilde{X}} - \mu_o)^*\Sigma^{-1}(\bar{\tilde{X}} - \mu_o)$, which has a real gamma distribution with the parameters $(\alpha = p,\ \beta = 1)$ or a chisquare distribution with *p* degrees of freedom in the complex domain as described earlier, so that $2\tilde{y}$ has a real chisquare distribution having 2*p* degrees of freedom. Thus, a real chisquare table can be utilized for testing the null hypothesis *Ho*, the criterion being

$$\text{Reject } H_o \text{ if } 2n(\bar{\tilde{X}} - \mu_o)^* \Sigma^{-1} (\bar{\tilde{X}} - \mu_o) \ge \chi^2_{2p,\,\alpha}, \text{ with } Pr\{\chi^2_{2p} \ge \chi^2_{2p,\,\alpha}\} = \alpha. \tag{6.2a.4}$$

The test criteria as well as the decisions are parallel to those obtained for the real case in the situations of paired values and in the case of independent populations. Accordingly, such test criteria and associated decisions will not be further discussed.

**Example 6.2a.1.** Let *p* = 2 and the 2 × 1 complex vector $\tilde{X} \sim \tilde{N}_2(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$, with $\tilde{\Sigma}$ assumed to be known. Consider the null hypothesis *Ho* : $\tilde{\mu} = \tilde{\mu}_o$ where $\tilde{\mu}_o$ is specified. Let the known $\tilde{\Sigma}$ and the specified $\tilde{\mu}_o$ be the following, where $i = \sqrt{-1}$:

$$
\tilde{\mu}\_o = \begin{bmatrix} 1+i \\ 1-2i \end{bmatrix}, \tilde{\Sigma} = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \Rightarrow \tilde{\Sigma}^{-1} = \frac{1}{4} \begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 2 \end{bmatrix}, \tilde{\Sigma} = \tilde{\Sigma}^\* > O.
$$

Let the general *μ*˜ and general *X*˜ be represented as follows for *p* = 2:

$$
\tilde{\mu} = \begin{bmatrix} \mu_1 + i\nu_1 \\ \mu_2 + i\nu_2 \end{bmatrix}, \ \tilde{X} = \begin{bmatrix} x_1 + iy_1 \\ x_2 + iy_2 \end{bmatrix},
$$

so that, for the given *Σ*˜ ,

$$\det(\tilde{\Sigma}) = (2)(3) - (1+i)(1-i) = 6 - (1^2 + 1^2) = 4 = \det(\tilde{\Sigma}^*) = |\det(\tilde{\Sigma})|.$$

The exponent of the general density for *p* = 2, excluding the factor −1, is of the form $(\tilde{X} - \tilde{\mu})^* \tilde{\Sigma}^{-1} (\tilde{X} - \tilde{\mu})$. Further,

$$[(\tilde{X} - \tilde{\mu})^* \tilde{\Sigma}^{-1} (\tilde{X} - \tilde{\mu})]^* = (\tilde{X} - \tilde{\mu})^* \tilde{\Sigma}^{-1} (\tilde{X} - \tilde{\mu})$$

since both $\tilde{\Sigma}$ and $\tilde{\Sigma}^{-1}$ are Hermitian. Thus, the exponent, which is 1 × 1, is real and negative definite. The explicit form, excluding the factor −1, for *p* = 2 and the given covariance matrix $\tilde{\Sigma}$, is the following:

$$\begin{split} Q &= \tfrac{1}{4}\{3[(x_1 - \mu_1)^2 + (y_1 - \nu_1)^2] + 2[(x_2 - \mu_2)^2 + (y_2 - \nu_2)^2] \\ &\quad - 2[(x_1 - \mu_1)(x_2 - \mu_2) + (y_1 - \nu_1)(y_2 - \nu_2)] \\ &\quad + 2[(x_1 - \mu_1)(y_2 - \nu_2) - (y_1 - \nu_1)(x_2 - \mu_2)]\}, \end{split}$$

and the general density for *p* = 2 and this *Σ*˜ is of the following form:

$$f(\tilde{X}) = \frac{1}{4\pi^2} \mathbf{e}^{-\mathcal{Q}}$$

where the *Q* is as previously given. Let the following be an observed sample of size *n* = 4 from a *N*˜2*(μ*˜ *o, Σ)*˜ population whose associated covariance matrix *Σ*˜ is as previously specified:

$$
\tilde{X}\_1 = \begin{bmatrix} 1 \\ 1+i \end{bmatrix}, \ \tilde{X}\_2 = \begin{bmatrix} 2-3i \\ i \end{bmatrix}, \ \tilde{X}\_3 = \begin{bmatrix} 2+i \\ 3 \end{bmatrix}, \ \tilde{X}\_4 = \begin{bmatrix} i \\ 2-i \end{bmatrix}.
$$

Then,

$$\begin{aligned} \bar{\tilde{X}} &= \frac{1}{4} \left\{ \begin{bmatrix} 1 \\ 1+i \end{bmatrix} + \begin{bmatrix} 2-3i \\ i \end{bmatrix} + \begin{bmatrix} 2+i \\ 3 \end{bmatrix} + \begin{bmatrix} i \\ 2-i \end{bmatrix} \right\} = \frac{1}{4} \begin{bmatrix} 5-i \\ 6+i \end{bmatrix}, \\\\ \bar{\tilde{X}} - \tilde{\mu}\_o &= \frac{1}{4} \begin{bmatrix} 5-i \\ 6+i \end{bmatrix} - \begin{bmatrix} 1+i \\ 1-2i \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 1-5i \\ 2+9i \end{bmatrix}, \end{aligned}$$


$$\begin{aligned} 2n(\bar{\tilde{X}} - \tilde{\mu}_o)^* \tilde{\Sigma}^{-1} (\bar{\tilde{X}} - \tilde{\mu}_o) &= (2)(4) \frac{1}{4^2} \begin{bmatrix} 1+5i & 2-9i \end{bmatrix} \frac{1}{4} \begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 2 \end{bmatrix} \begin{bmatrix} 1-5i \\ 2+9i \end{bmatrix} \\ &= \frac{1}{8} \{3(1+5i)(1-5i) + 2(2-9i)(2+9i) - (1+i)(1+5i)(2+9i) \\ &\quad - (1-i)(2-9i)(1-5i) \} \\ &= \frac{1}{8} \{3 \times 26 + 2 \times 85 + 2 \times 62 \} = 46.5. \end{aligned}$$

Let us test the stated null hypothesis at the significance level *α* = 0.05. Since $\chi^2_{2p,\,\alpha} = \chi^2_{4,\,0.05} = 9.49$ and 46.5 > 9.49, we reject *Ho*. In this case, the *p*-value is $Pr\{\chi^2_{2p} \ge 46.5\} = Pr\{\chi^2_4 \ge 46.5\} \approx 0$.
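Python's built-in complex arithmetic makes the computation of Example 6.2a.1 direct; in the sketch below (ours), conjugate transposition is effected with `.conjugate()`:

```python
# Sketch of Example 6.2a.1 using built-in complex numbers (arrangement ours).
X = [[1, 1 + 1j], [2 - 3j, 1j], [2 + 1j, 3], [1j, 2 - 1j]]
mu_o = [1 + 1j, 1 - 2j]
# (1/4)[[3, -(1+i)], [-(1-i), 2]]
Sigma_inv = [[0.75, -0.25 * (1 + 1j)], [-0.25 * (1 - 1j), 0.5]]

n = len(X)                                   # 4
xbar = [sum(col) / n for col in zip(*X)]     # [(5 - i)/4, (6 + i)/4]
d = [xb - m for xb, m in zip(xbar, mu_o)]    # Xbar - mu_o
Sd = [sum(a * x for a, x in zip(row, d)) for row in Sigma_inv]
stat = 2 * n * sum(dc.conjugate() * s for dc, s in zip(d, Sd))

print(round(stat.real, 2))                   # 46.5 > 9.49: reject Ho
```

The imaginary part of `stat` vanishes, as it must for a Hermitian form.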

#### **6.2.3. Test involving a subvector of a mean value vector when** *Σ* **is known**

Let the *p* × 1 vector *Xj* ∼ *Np(μ, Σ), Σ > O, j* = 1*, . . . , n,* and the *Xj* 's be independently distributed. Let the joint density of *Xj , j* = 1*, . . . , n,* be denoted by *L*. Then, as was previously established,

$$L = \prod_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2}(X_j - \mu)'\Sigma^{-1}(X_j - \mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{i}$$

where $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ and, letting $\mathbf{X} = (X_1,\ldots,X_n)$ of dimension *p* × *n* and $\bar{\mathbf{X}} = (\bar{X},\ldots,\bar{X})$, $S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'$. Let $\bar{X}$, $\Sigma^{-1}$ and *μ* be partitioned as follows:

$$\bar{X} = \begin{bmatrix} \bar{X}^{(1)} \\ \bar{X}^{(2)} \end{bmatrix}, \quad \Sigma^{-1} = \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu^{(1)} \\ \mu^{(2)} \end{bmatrix}$$

where $\bar{X}^{(1)}$ and $\mu^{(1)}$ are *r* × 1, *r < p*, and $\Sigma^{11}$ is *r* × *r*. Consider the hypothesis $\mu^{(1)} = \mu_o^{(1)}$ (specified) with *Σ* known. Thus, this hypothesis concerns only a subvector of the mean value vector, the population covariance matrix being assumed known. In the entire parameter space Ω, *μ* is estimated by $\bar{X}$, the maximum likelihood estimator (MLE) of *μ*. The maximum of the likelihood function in the entire parameter space is then

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}.\tag{ii}$$

Let us now determine the MLE of *μ(*2*)* , which is the only unknown quantity under the null hypothesis. To this end, we consider the following expansion:

$$\begin{split} (\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu) &= [(\bar{X}^{(1)} - \mu_o^{(1)})', (\bar{X}^{(2)} - \mu^{(2)})'] \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix} \begin{bmatrix} \bar{X}^{(1)} - \mu_o^{(1)} \\ \bar{X}^{(2)} - \mu^{(2)} \end{bmatrix} \\ &= (\bar{X}^{(1)} - \mu_o^{(1)})' \Sigma^{11} (\bar{X}^{(1)} - \mu_o^{(1)}) + (\bar{X}^{(2)} - \mu^{(2)})' \Sigma^{22} (\bar{X}^{(2)} - \mu^{(2)}) \\ &\quad + 2(\bar{X}^{(2)} - \mu^{(2)})' \Sigma^{21} (\bar{X}^{(1)} - \mu_o^{(1)}). \end{split} \tag{iii}$$

Noting that there are only two terms involving *μ(*2*)* in *(iii)*, we have

$$\frac{\partial}{\partial \mu^{(2)}} \ln L = O \Rightarrow -2\Sigma^{22} (\bar{X}^{(2)} - \mu^{(2)}) - 2\Sigma^{21} (\bar{X}^{(1)} - \mu_o^{(1)}) = O,$$

$$\Rightarrow \hat{\mu}^{(2)} = \bar{X}^{(2)} + (\Sigma^{22})^{-1}\Sigma^{21} (\bar{X}^{(1)} - \mu\_o^{(1)}).$$

Then, substituting this MLE $\hat{\mu}^{(2)}$ in the various terms of *(iii)*, we have the following:

$$\begin{split} (\bar{X}^{(2)} - \hat{\mu}^{(2)})' \Sigma^{22} (\bar{X}^{(2)} - \hat{\mu}^{(2)}) &= (\bar{X}^{(1)} - \mu\_o^{(1)})' \Sigma^{12} (\Sigma^{22})^{-1} \Sigma^{21} (\bar{X}^{(1)} - \mu\_o^{(1)}) \\ 2(\bar{X}^{(2)} - \hat{\mu}^{(2)})' \Sigma^{21} (\bar{X}^{(1)} - \mu\_o^{(1)}) &= -2(\bar{X}^{(1)} - \mu\_o^{(1)})' \Sigma^{12} (\Sigma^{22})^{-1} \Sigma^{21} (\bar{X}^{(1)} - \mu\_o^{(1)}) \Rightarrow \\ (\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu) &= (\bar{X}^{(1)} - \mu\_o^{(1)})' [\Sigma^{11} - \Sigma^{12} (\Sigma^{22})^{-1} \Sigma^{21}] (\bar{X}^{(1)} - \mu\_o^{(1)}) \\ &= (\bar{X}^{(1)} - \mu\_o^{(1)})' \Sigma^{-1}\_{11} (\bar{X}^{(1)} - \mu\_o^{(1)}), \end{split}$$

since, as established in Sect. 1.3, $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$. Thus, the maximum of *L* under the null hypothesis is given by

$$\max\_{H\_o} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{S}) - \frac{\boldsymbol{n}}{2}(\bar{\boldsymbol{X}}^{(1)} - \boldsymbol{\mu}\_o^{(1)})'\boldsymbol{\Sigma}\_{11}^{-1}(\bar{\boldsymbol{X}}^{(1)} - \boldsymbol{\mu}\_o^{(1)})}{(2\pi)^{\frac{np}{2}}|\boldsymbol{\Sigma}|^{\frac{n}{2}}},$$

and the *λ*-criterion is then

$$\lambda = \frac{\max_{H_o} L}{\max_{\Omega} L} = \mathbf{e}^{-\frac{n}{2}(\bar{X}^{(1)} - \mu_o^{(1)})' \Sigma_{11}^{-1} (\bar{X}^{(1)} - \mu_o^{(1)})}. \tag{6.2.9}$$

Hence, we reject *Ho* for small values of *λ* or for large values of $n(\bar{X}^{(1)} - \mu_o^{(1)})' \Sigma_{11}^{-1} (\bar{X}^{(1)} - \mu_o^{(1)}) \sim \chi^2_r$ since, under *Ho*, the expected value and covariance matrix of $\bar{X}^{(1)}$ are respectively $\mu_o^{(1)}$ and $\Sigma_{11}/n$. Accordingly, the criterion can be enunciated as follows:

$$\text{Reject } H_o: \mu^{(1)} = \mu_o^{(1)} \text{ (given) if } u \equiv n(\bar{X}^{(1)} - \mu_o^{(1)})' \Sigma_{11}^{-1} (\bar{X}^{(1)} - \mu_o^{(1)}) \ge \chi^2_{r,\,\alpha} \tag{6.2.10}$$

with $Pr\{\chi^2_r \ge \chi^2_{r,\,\alpha}\} = \alpha$. In the complex Gaussian case, the corresponding $2\tilde{u}$ will be distributed as a real chisquare random variable having 2*r* degrees of freedom; thus, the criterion will consist of rejecting the corresponding null hypothesis whenever the observed value of $2\tilde{u} \ge \chi^2_{2r,\,\alpha}$.
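The Schur-complement identity $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$ invoked in the derivation above can be verified numerically. The sketch below (ours, not from the text) does so with the 4 × 4 covariance matrix of Example 6.2.4, using a small Gauss-Jordan inverse:

```python
# Numeric check (our code) of Sigma_11^{-1} = Sigma^{11}
# - Sigma^{12} (Sigma^{22})^{-1} Sigma^{21}, with the 4x4 Sigma of
# Example 6.2.4 partitioned into 2x2 blocks.
def inv(M):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    k = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(M)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        d = A[i][i]
        A[i] = [x / d for x in A[i]]
        for r in range(k):
            if r != i:
                f = A[r][i]
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
    return [row[k:] for row in A]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Sigma = [[2, 1, 0, 0], [1, 2, 0, 1], [0, 0, 3, 1], [0, 1, 1, 2]]
Si = inv(Sigma)
S12i = [r[2:] for r in Si[:2]]               # Sigma^{12}
S21i = [r[:2] for r in Si[2:]]               # Sigma^{21}
S22i = [r[2:] for r in Si[2:]]               # Sigma^{22}

lhs = inv([r[:2] for r in Sigma[:2]])        # Sigma_11^{-1}
tmp = matmul(S12i, matmul(inv(S22i), S21i))
rhs = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(Si[:2], tmp)]
rhs = [r[:2] for r in rhs]

print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-9 for i in range(2) for j in range(2)))
```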

**Example 6.2.4.** Let the 4 × 1 vector *X* have a real normal distribution *N*4*(μ, Σ), Σ > O*. Consider the hypothesis that part of *μ* is specified. For example, let the hypothesis *Ho* and *Σ* be the following:

$$H\_o: \mu = \mu\_o = \begin{bmatrix} 1 \\ -1 \\ \mu\_3 \\ \mu\_4 \end{bmatrix}, \ \Sigma = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 0 & 0 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix} = \Sigma' > O, \ X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \\ x\_4 \end{bmatrix} \equiv \begin{bmatrix} X^{(1)} \\ X^{(2)} \end{bmatrix}.$$

Since we are specifying the first two components of *μ*, the hypothesis can be tested by making use of the distribution of $X^{(1)} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. Observe that $X^{(1)} \sim N_2(\mu^{(1)}, \Sigma_{11})$, $\Sigma_{11} > O$, where

$$
\mu^{(1)} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \ \mu_o^{(1)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \ H_o: \mu^{(1)} = \mu_o^{(1)}, \ \Sigma_{11} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \Rightarrow \Sigma_{11}^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.
$$

Let the observed vectors from the original *N*4*(μ, Σ)* population be

$$X\_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 4 \end{bmatrix}, \ X\_2 = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 0 \\ 2 \\ 3 \\ 4 \end{bmatrix}, \ X\_4 = \begin{bmatrix} 2 \\ 1 \\ -1 \\ 3 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 4 \end{bmatrix}.$$

Then the observations corresponding to the subvector $X^{(1)}$, denoted by $X_j^{(1)}$, are the following:

$$X\_1^{(1)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ X\_2^{(1)} = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\ X\_3^{(1)} = \begin{bmatrix} 0 \\ 2 \end{bmatrix},\ X\_4^{(1)} = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\ X\_5^{(1)} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$

In this case, the sample size *n* = 5 and the sample mean, denoted by $\bar{X}^{(1)}$, is

$$\begin{aligned} \bar{X}^{(1)} &= \frac{1}{5} \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right\} = \frac{1}{5} \begin{bmatrix} 4 \\ 3 \end{bmatrix} \Rightarrow \\ \bar{X}^{(1)} - \mu_o^{(1)} &= \frac{1}{5} \begin{bmatrix} 4 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} -1 \\ 8 \end{bmatrix}. \end{aligned}$$

Therefore

$$\begin{split} n(\bar{X}^{(1)} - \mu\_o^{(1)})' \Sigma\_{11}^{-1} (\bar{X}^{(1)} - \mu\_o^{(1)}) &= \frac{5}{5^2} \begin{bmatrix} -1 & 8 \end{bmatrix} \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 8 \end{bmatrix} \\ &= \frac{1}{15} (146) = 9.73. \end{split}$$

If $9.73 > \chi^2_{2,\,\alpha}$, then we would reject *Ho* : $\mu^{(1)} = \mu_o^{(1)}$. Let us test this hypothesis at the significance level *α* = 0.01. Since $\chi^2_{2,\,0.01} = 9.21$ and 9.73 > 9.21, we reject the null hypothesis. In this instance, the *p*-value, which can be determined from a chisquare table, is $Pr\{\chi^2_2 \ge 9.73\} \approx 0.0077$.
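A sketch of the computations in Example 6.2.4 (arrangement ours): only the first two coordinates of each observation enter the statistic.

```python
# Sketch of the subvector test in Example 6.2.4 (arrangement ours).
X = [[1, 0, 2, 4], [-1, 1, 1, 2], [0, 2, 3, 4], [2, 1, -1, 3], [2, -1, 0, 4]]
X1 = [row[:2] for row in X]                  # observations on X^{(1)}
mu_o1 = [1, -1]
Sigma11_inv = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]

n = len(X1)                                  # 5
xbar = [sum(c) / n for c in zip(*X1)]        # [0.8, 0.6]
d = [xb - m for xb, m in zip(xbar, mu_o1)]   # [-0.2, 1.6]
Sd = [sum(a * x for a, x in zip(row, d)) for row in Sigma11_inv]
u = n * sum(a * b for a, b in zip(d, Sd))

print(round(u, 2))                           # 9.73 > 9.21: reject at the 1% level
```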

# **6.2.4. Testing** *μ*<sup>1</sup> =···= *μp***, with** *Σ* **known, real Gaussian case**

Let X_j ∼ N_p(μ, Σ), Σ > O, j = 1,…,n, and let the X_j's be independently distributed. Letting μ′ = (μ₁,…,μ_p), consider the hypothesis

$$H\_0: \mu\_1 = \mu\_2 = \dots = \mu\_p = \nu,$$

where ν, the common value of the μ_j's, is unknown. This implies that μ_i − μ_j = 0 for all i and j. Consider the p × 1 vector J of unities, J′ = (1,…,1), and then take any non-null vector that is orthogonal to J. Let A be such a vector, so that A′J = 0. Actually, p − 1 linearly independent such vectors are available. For example, if p is even, then take 1, −1,…, 1, −1 as the elements of A and, when p is odd, one can start with 1, −1,…, 1, −1 and take the last three elements as 1, −2, 1, or the last element as 0, that is,

$$J = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{bmatrix}, \ A = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ -1 \\ 1 \\ -1 \end{bmatrix} \text{ for } p \text{ even and } A = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ 1 \\ -2 \\ 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 \\ -1 \\ \vdots \\ -1 \\ 0 \end{bmatrix} \text{ for } p \text{ odd.}$$

When the last element of the vector A is zero, we are simply ignoring the last component of X_j. Let the p × 1 vector X_j ∼ N_p(μ, Σ), Σ > O, j = 1,…,n, and the X_j's be independently distributed. Let the scalar y_j = A′X_j and the 1 × n vector Y = (y₁,…,y_n) = (A′X₁,…,A′X_n) = A′(X₁,…,X_n) = A′**X**, where the p × n matrix **X** = (X₁,…,X_n). Let ȳ = (1/n)(y₁ + ··· + y_n) = A′(1/n)(X₁ + ··· + X_n) = A′X̄. Then $\sum_{j=1}^{n}(y_j-\bar{y})^2 = A'\big[\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})'\big]A$, where $\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})' = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})' = S$ =


the sample sum of products matrix in the X_j's, where **X̄** = (X̄,…,X̄) is the p × n matrix whose columns are all equal to X̄. Thus, one has $\sum_{j=1}^{n}(y_j-\bar{y})^2 = A'SA$. Consider the hypothesis μ₁ = ··· = μ_p = ν. Then, A′μ = νA′J = ν·0 = 0 under H_o. Since X_j ∼ N_p(μ, Σ), Σ > O, we have y_j ∼ N₁(A′μ, A′ΣA), A′ΣA > 0. Under H_o, y_j ∼ N₁(0, A′ΣA), j = 1,…,n, the y_j's being independently distributed. Consider the joint density of y₁,…,y_n, denoted by L:

$$L = \prod_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A}(y_j - A'\mu)^2}}{(2\pi)^{\frac{1}{2}}[A'\Sigma A]^{\frac{1}{2}}}.\tag{i}$$

Since *Σ* is known, the only unknown quantity in *L* is *μ*. Differentiating ln*L* with respect to *μ* and equating the result to a null vector, we have

$$\sum\_{j=1}^{n} (\mathbf{y}\_j - A'\hat{\boldsymbol{\mu}}) = 0 \Rightarrow \sum\_{j=1}^{n} \mathbf{y}\_j - nA'\hat{\boldsymbol{\mu}} = 0 \Rightarrow \bar{\mathbf{y}} - A'\hat{\boldsymbol{\mu}} = 0 \Rightarrow A'(\bar{X} - \hat{\boldsymbol{\mu}}) = 0.$$

However, since A is a fixed known vector and the equation must hold for arbitrary X̄, we take μ̂ = X̄. Hence the maximum of L in the entire parameter space Ω is the following:

$$\max_{\Omega} L = \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A}\sum_{j=1}^{n}\left[A'(X_j - \bar{X})\right]^2}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A}A'SA}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}.\tag{ii}$$

Now, noting that under H_o, A′μ = 0, we have

$$\max_{H_o} L = \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A}\sum_{j=1}^{n}A'X_jX_j'A}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}.\tag{iii}$$

From (ii) and (iii), the λ-criterion is as follows, observing that $A'\big(\sum_{j=1}^{n}X_jX_j'\big)A = \sum_{j=1}^{n}A'(X_j-\bar{X})(X_j-\bar{X})'A + nA'\bar{X}\bar{X}'A = A'SA + nA'\bar{X}\bar{X}'A$:

$$
\lambda = \mathbf{e}^{-\frac{n}{2A'\Sigma A}A'\bar{X}\bar{X}'A}.\tag{6.2.3}
$$

But since $\sqrt{\frac{n}{A'\Sigma A}}\,A'\bar{X} \sim N_1(0, 1)$ under H_o, we may test this null hypothesis either by using the standard normal variable or a chisquare variable, since $\frac{n}{A'\Sigma A}A'\bar{X}\bar{X}'A \sim \chi^2_1$ under H_o. Accordingly, the criterion consists of rejecting H_o

$$\text{when } \left| \sqrt{\frac{n}{A'\Sigma A}} A'\bar{X} \right| \ge z\_{\frac{\alpha}{2}}, \text{ with } Pr\{z \ge z\_{\beta}\} = \beta, \ z \sim N\_1(0, 1)$$

or

$$\text{when } u \equiv \frac{n}{A'\Sigma A}(A'\bar{X}\bar{X}'A) \ge \chi^2_{1,\alpha}, \text{ with } Pr\{\chi^2_1 \ge \chi^2_{1,\alpha}\} = \alpha, \ u \sim \chi^2_1.\tag{6.2.4}$$

**Example 6.2.5.** Consider a 4-variate real Gaussian vector *X* ∼ *N*4*(μ, Σ), Σ > O* with *Σ* as specified in Example 6.2.4 and the null hypothesis that the individual components of the mean value vector *μ* are all equal, that is,

$$\Sigma = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 0 & 0 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix}, \ H\_o: \mu\_1 = \mu\_2 = \mu\_3 = \mu\_4 \equiv \nu \text{ (say), with } \mu = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \mu\_3 \\ \mu\_4 \end{bmatrix}.$$

Let L be a 4 × 1 constant vector such that L′ = (1, −1, 1, −1). Then, under H_o, L′μ = 0 and u = L′X is univariate normal; more specifically, u ∼ N₁(0, L′ΣL) where

$$L'\Sigma L = \begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 0 & 0 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} = 7 \Rightarrow u \sim N\_1(0, 7).$$

Let the observation vectors be the same as those used in Example 6.2.4 and let *uj* = *L*- *Xj , j* = 1*,...,* 5. Then, the five independent observations from *u* ∼ *N*1*(*0*,* 7*)* are the following:

$$u_1 = L'X_1 = \begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 2 \\ 4 \end{bmatrix} = -1, \ u_2 = L'\begin{bmatrix} -1 \\ 1 \\ 1 \\ 2 \end{bmatrix} = -3, \ u_3 = L'\begin{bmatrix} 0 \\ 2 \\ 3 \\ 4 \end{bmatrix} = -3, \ u_4 = L'\begin{bmatrix} 2 \\ 1 \\ -1 \\ 3 \end{bmatrix} = -3, \ u_5 = L'\begin{bmatrix} 2 \\ -1 \\ 0 \\ 4 \end{bmatrix} = -1,$$

the average being ū = (1/5)(u₁ + ··· + u₅) = (1/5)(−1 − 3 − 3 − 3 − 1) = −11/5. Then, the standardized sample mean z = (√n/σ_u)(ū − 0) ∼ N₁(0, 1), where σ_u² = L′ΣL = 7. Let us test the null hypothesis at the significance level α = 0.05. Referring to an N₁(0, 1) table, the required critical value, denoted by z_{α/2} = z_{0.025}, is 1.96. Therefore, we reject H_o in favor of the alternative hypothesis that at least two components of μ are unequal at significance level α if the observed value of

$$|z| = \left| \frac{\sqrt{n}}{\sigma\_u} (\bar{u} - 0) \right| \ge 1.96.$$


Since the observed value of |z|, namely $\big|\sqrt{\tfrac{5}{7}}\,(-\tfrac{11}{5})\big| = \tfrac{11}{\sqrt{35}} \approx 1.86$, is less than 1.96, we do not reject H_o at the 5% significance level. Letting z ∼ N₁(0, 1), the p-value in this case is Pr{|z| ≥ 1.86} ≈ 0.063, this quantity being available from a standard normal table.
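The arithmetic of this example can be recomputed directly from the observations of Example 6.2.4. The following sketch (assuming numpy and the Python standard library) evaluates the u_j's, their average ū, the standardized statistic, and a two-sided normal p-value:

```python
import math
import numpy as np

# Observation vectors of Example 6.2.4 (as columns) and contrast vector L
X = np.array([[1, -1, 0,  2,  2],
              [0,  1, 2,  1, -1],
              [2,  1, 3, -1,  0],
              [4,  2, 4,  3,  4]], dtype=float)
L = np.array([1.0, -1.0, 1.0, -1.0])
sigma_u2 = 7.0                          # L' Sigma L, computed above
n = X.shape[1]

u = L @ X                               # u_j = L'X_j : (-1, -3, -3, -3, -1)
ubar = u.mean()                         # -11/5
z = math.sqrt(n / sigma_u2) * ubar      # standardized sample mean under H_o
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
print(round(abs(z), 2), round(p_value, 3))
```

The computed |z| = 11/√35 ≈ 1.86 stays below the critical value 1.96, so the non-rejection of H_o is confirmed numerically.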

In the complex case, proceeding in a parallel manner to the real case, the lambda criterion will be the following:

$$
\tilde{\lambda} = \mathbf{e}^{-\frac{n}{A^{*}\tilde{\Sigma}A}A^{*}\bar{\tilde{X}}\bar{\tilde{X}}^{*}A} \tag{6.2a.5}
$$

where an asterisk indicates the conjugate transpose. Letting $\tilde{u} = \frac{2n}{A^{*}\tilde{\Sigma}A}(A^{*}\bar{\tilde{X}}\bar{\tilde{X}}^{*}A)$, it can be shown that under H_o, ũ is distributed as a real chisquare random variable having 2 degrees of freedom. Accordingly, the criterion will be as follows:

$$\text{Reject } H_o \text{ if the observed } \tilde{u} \ge \chi^2_{2,\alpha}, \text{ with } Pr\{\chi^2_2 \ge \chi^2_{2,\alpha}\} = \alpha. \tag{6.2a.6}$$

**Example 6.2a.2.** When p > 2, the computations become quite involved in the complex case. Thus, we will let p = 2 and consider the bivariate complex Ñ₂(μ̃, Σ̃) distribution that was specified in Example 6.2a.1, assuming that Σ̃ is as given therein, the same set of observations being utilized as well. In this case, the null hypothesis is H_o : μ̃₁ = μ̃₂, the parameters and sample average being

$$
\tilde{\mu} = \begin{bmatrix} \tilde{\mu}\_1 \\ \tilde{\mu}\_2 \end{bmatrix}, \ \tilde{\Sigma} = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}, \ \bar{\tilde{X}} = \frac{1}{4} \begin{bmatrix} 5-i \\ 6+i \end{bmatrix}.
$$

Letting L′ = (1, −1), we have L′μ̃ = 0 under H_o, and

$$\begin{split} \tilde{u} = L'\bar{\tilde{X}} &= \frac{1}{4}\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 5-i \\ 6+i \end{bmatrix} = -\frac{1}{4}(1+2i); \ (L'\bar{\tilde{X}})^{*}(L'\bar{\tilde{X}}) = \frac{1}{16}(1-2i)(1+2i) = \frac{5}{16}; \\ L'\tilde{\Sigma}L &= \begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 3; \ v = \frac{2n}{L'\tilde{\Sigma}L}(L'\bar{\tilde{X}})^{*}(L'\bar{\tilde{X}}) = \frac{8}{3}\times\frac{5}{16} = \frac{5}{6}. \end{split}$$

The criterion consists of rejecting H_o if the observed value of v ≥ χ²_{2, α}. Letting the significance level of the test be α = 0.05, the critical value is χ²_{2, 0.05} = 5.99, which is readily available from a chisquare table. The observed value of v being 5/6 < 5.99, we do not reject H_o. In this case, the p-value is Pr{χ²₂ ≥ 5/6} ≈ 0.66.
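The complex arithmetic can be verified directly; the following sketch (assuming numpy; Python's built-in complex type handles the conjugations) recomputes v and the corresponding p-value, again using Pr{χ²₂ ≥ x} = e^{−x/2}:

```python
import math
import numpy as np

# Data of Example 6.2a.2: sample size n = 4, contrast vector L' = (1, -1)
n = 4
xbar = np.array([5 - 1j, 6 + 1j]) / 4           # sample mean vector
Sigma = np.array([[2, 1 + 1j], [1 - 1j, 3]])    # Hermitian covariance matrix
L = np.array([1.0, -1.0])

u_tilde = L @ xbar                              # -(1 + 2i)/4
quad = (np.conj(u_tilde) * u_tilde).real        # 5/16
LSL = (np.conj(L) @ Sigma @ L).real             # L* Sigma L = 3
v = 2 * n * quad / LSL                          # (8/3)(5/16) = 5/6
p_value = math.exp(-v / 2)                      # Pr{chi-square_2 >= v}
print(round(v, 4), round(p_value, 2))
```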

# **6.2.5. Likelihood ratio criterion for testing** *Ho* : *μ*<sup>1</sup> =···= *μp* **,** *Σ* **known**

Consider again X_j ∼ N_p(μ, Σ), Σ > O, j = 1,…,n, with the X_j's being independently distributed and Σ assumed known. Letting the joint density of X₁,…,X_n be denoted by L, then, as determined earlier,

$$L = \frac{\mathbf{e}^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{i}$$

where *n* is the sample size and *S* is the sample sum of products matrix. In the entire parameter space

$$\Omega = \{ (\mu, \Sigma) \mid \Sigma > O \text{ known}, \ \mu' = (\mu\_1, \dots, \mu\_p) \},$$

the MLE of *μ* is *X*¯ = the sample average. Then

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \,. \tag{ii}$$

Consider the following hypothesis on μ′ = (μ₁,…,μ_p):

$$H\_o: \mu\_1 = \dots = \mu\_p = \nu, \text{ } \nu \text{ is unknown.}$$

Then, the MLE of μ under H_o is $\hat{\mu} = J\hat{\nu} = J\,\frac{1}{p}J'\bar{X}$, J′ = (1,…,1). This ν̂ is in fact the sum of all observations on all components of X_j, j = 1,…,n, divided by np, which is identical to the sum of all the coordinates of X̄ divided by p, so that $\hat{\mu} = \frac{1}{p}JJ'\bar{X}$. In order to evaluate the maximum of L under H_o, it suffices to substitute μ̂ for μ in (i). Accordingly, the λ-criterion is

$$\lambda = \frac{\max\_{H\_o} L}{\max\_{\Omega} L} = \mathbf{e}^{-\frac{n}{2}(\bar{X} - \hat{\mu})'\Sigma^{-1}(\bar{X} - \hat{\mu})}.\tag{6.2.5}$$

Thus, we reject H_o for small values of λ or, equivalently, for large values of w ≡ n(X̄ − μ̂)′Σ⁻¹(X̄ − μ̂). Let us determine the distribution of w. First, note that

$$
\bar{X} - \hat{\mu} = \bar{X} - \frac{1}{p} J J' \bar{X} = \left( I\_p - \frac{1}{p} J J' \right) \bar{X},
$$

and let

$$\begin{split} w &= n(\bar{X} - \hat{\mu})'\Sigma^{-1}(\bar{X} - \hat{\mu}) = n\bar{X}'(I - \tfrac{1}{p}JJ')\Sigma^{-1}(I - \tfrac{1}{p}JJ')\bar{X} \\ &= n(\bar{X} - \mu)'(I - \tfrac{1}{p}JJ')\Sigma^{-1}(I - \tfrac{1}{p}JJ')(\bar{X} - \mu) \end{split} \tag{iii}$$

since J′(I − (1/p)JJ′) = O′, μ = νJ being the true mean value of the N_p(μ, Σ) distribution. Observe that √n(X̄ − μ) ∼ N_p(O, Σ), Σ > O, and that (1/p)JJ′ is idempotent. Since I − (1/p)JJ′ is also idempotent and its rank is p − 1, there exists an orthonormal matrix P, PP′ = I, P′P = I, such that

$$I - \frac{1}{p}JJ' = P'\begin{bmatrix} I_{p-1} & O \\ O' & 0 \end{bmatrix}P.$$

Letting $U = \sqrt{n}\,P(\bar{X} - \mu)$, with U′ = (u₁,…,u_{p−1}, u_p), we have U ∼ N_p(O, PΣP′). Now, on noting that

$$U'\begin{bmatrix} I\_{p-1} & O\\ O' & 0 \end{bmatrix} = (u\_1, \dots, u\_{p-1}, 0),$$

we have

$$n(\bar{X} - \hat{\mu})'\Sigma^{-1}(\bar{X} - \hat{\mu}) = [U_1', 0]\,P\Sigma^{-1}P'\begin{bmatrix} U_1 \\ 0 \end{bmatrix} = U_1'B^{-1}U_1, \ U_1' = (u_1, \dots, u_{p-1}),$$

B being the covariance matrix associated with U₁, so that U₁ ∼ N_{p−1}(O, B), B > O. Thus, U₁′B⁻¹U₁ ∼ χ²_{p−1}, a real scalar chisquare random variable having p − 1 degrees of freedom. Hence, upon evaluating

$$w = n\bar{X}'(I - \frac{1}{p}JJ')\Sigma^{-1}(I - \frac{1}{p}JJ')\bar{X},$$

one would reject *Ho* : *μ*<sup>1</sup> =···= *μp* = *ν, ν* unknown, whenever the observed value of

$$w \ge \chi^2_{p-1,\,\alpha}, \text{ with } Pr\{\chi^2_{p-1} \ge \chi^2_{p-1,\,\alpha}\} = \alpha. \tag{6.2.6}$$

Observe that the degrees of freedom of this chisquare variable, that is, *p* − 1, coincides with the number of parameters being restricted by *Ho*.

**Example 6.2.6.** Consider the trivariate real Gaussian population X ∼ N₃(μ, Σ), Σ > O, as already specified in Example 6.2.1, with the same Σ and the same observed sample vectors, for testing H_o : μ′ = (ν, ν, ν), namely,

$$\mu = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \mu\_3 \end{bmatrix}, \bar{X} = \frac{1}{5} \begin{bmatrix} 9 \\ 4 \\ 3 \end{bmatrix}, \ \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

The following test statistic has to be evaluated for *p* = 3:

$$w = n\bar{X}'(I - \frac{1}{p}JJ')\Sigma^{-1}(I - \frac{1}{p}JJ')\bar{X}, \ J' = (1, 1, 1).$$

We have to evaluate the following quantities in order to determine the value of *w*:

$$\begin{aligned} \frac{1}{3}JJ'\Sigma^{-1} &= \frac{1}{3}\begin{bmatrix} \frac{1}{2} & 1 & 0 \\ \frac{1}{2} & 1 & 0 \\ \frac{1}{2} & 1 & 0 \end{bmatrix}, \quad \Sigma^{-1}\frac{1}{3}JJ' = \frac{1}{3}\begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \frac{1}{3}JJ'\Sigma^{-1}\frac{1}{3}JJ' = \frac{1}{9}\begin{bmatrix} \frac{3}{2} & \frac{3}{2} & \frac{3}{2} \\ \frac{3}{2} & \frac{3}{2} & \frac{3}{2} \\ \frac{3}{2} & \frac{3}{2} & \frac{3}{2} \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \\ \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} \frac{1}{2} & 1 & 0 \\ \frac{1}{2} & 1 & 0 \\ \frac{1}{2} & 1 & 0 \end{bmatrix} &- \frac{1}{3}\begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \frac{1}{6}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} & -\frac{1}{3} & 0 \\ -\frac{1}{3} & \frac{3}{2} & -\frac{7}{6} \\ 0 & -\frac{7}{6} & \frac{7}{6} \end{bmatrix} = (I - \tfrac{1}{3}JJ')\Sigma^{-1}(I - \tfrac{1}{3}JJ'). \end{aligned}$$

Thus,

$$\begin{split} w &= n\bar{X}'(I - \tfrac{1}{3}JJ')\Sigma^{-1}(I - \tfrac{1}{3}JJ')\bar{X}, \quad J' = (1, 1, 1) \\ &= \frac{5}{5^2}\begin{bmatrix} 9 & 4 & 3 \end{bmatrix}\begin{bmatrix} \frac{1}{3} & -\frac{1}{3} & 0 \\ -\frac{1}{3} & \frac{3}{2} & -\frac{7}{6} \\ 0 & -\frac{7}{6} & \frac{7}{6} \end{bmatrix}\begin{bmatrix} 9 \\ 4 \\ 3 \end{bmatrix} \\ &= \frac{1}{5}\Big[9^2\tfrac{1}{3} + 4^2\tfrac{3}{2} + 3^2\tfrac{7}{6} - 2\tfrac{1}{3}(9)(4) - 2\tfrac{7}{6}(4)(3)\Big] = \frac{9.5}{5} = 1.90. \end{split}$$

We reject H_o whenever w ≥ χ²_{p−1, α}. Letting the significance level be α = 0.05, the tabulated critical value is χ²_{p−1, α} = χ²_{2, 0.05} = 5.99, and since 1.90 < 5.99, we do not reject the null hypothesis. In this instance, the p-value is Pr{χ²₂ ≥ 1.90} ≈ 0.39.
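As a numerical check (assuming numpy), the following sketch evaluates the statistic w = n X̄′(I − (1/p)JJ′)Σ⁻¹(I − (1/p)JJ′)X̄ of (iii), which includes the factor n from the definition w = n(X̄ − μ̂)′Σ⁻¹(X̄ − μ̂), together with its chi-square p-value for p − 1 = 2 degrees of freedom:

```python
import math
import numpy as np

p, n = 3, 5
xbar = np.array([9.0, 4.0, 3.0]) / n
Sigma_inv = np.array([[0.5,  0.0,  0.0],
                      [0.0,  2.0, -1.0],
                      [0.0, -1.0,  1.0]])
Q = np.eye(p) - np.ones((p, p)) / p     # I - (1/p)JJ', idempotent of rank p-1

M = Q @ Sigma_inv @ Q                   # middle matrix of the quadratic form
w = n * xbar @ M @ xbar                 # observed test statistic
p_value = math.exp(-w / 2)              # Pr{chi-square_2 >= w}
print(round(w, 2), round(p_value, 2))
```

The computed middle matrix agrees with the one obtained above, and the observed w stays below χ²_{2, 0.05} = 5.99.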

# **6.3. Testing** *Ho* : *μ* = *μo* **(given) When** *Σ* **is Unknown, Real Gaussian Case**

In this case, both μ and Σ are unknown in the entire parameter space Ω; however, μ = μ_o is known while Σ is still unknown in the subspace ω. The maximum of L under Ω is the same as that obtained in Sect. 6.1.1, that is,

$$\sup_{\Omega} L = \frac{\mathbf{e}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|^{\frac{n}{2}}}.\tag{6.3.1}$$

When μ = μ_o, Σ is estimated by $\hat{\Sigma} = \frac{1}{n}\sum_{j=1}^{n}(X_j - \mu_o)(X_j - \mu_o)'$. As shown in Sect. 3.5, Σ̂ can be re-expressed as follows:

$$\begin{split} \hat{\Sigma} &= \frac{1}{n}\sum_{j=1}^{n}(X_j - \mu_o)(X_j - \mu_o)' = \frac{1}{n}S + (\bar{X} - \mu_o)(\bar{X} - \mu_o)' \\ &= \frac{1}{n}[S + n(\bar{X} - \mu_o)(\bar{X} - \mu_o)']. \end{split}$$

Then, under the null hypothesis, we have

$$\sup_{\omega} L = \frac{\mathbf{e}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S + n(\bar{X} - \mu_o)(\bar{X} - \mu_o)'|^{\frac{n}{2}}}.\tag{6.3.2}$$

Thus,

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{|S|^{\frac{n}{2}}}{|S + n(\bar{X} - \mu_o)(\bar{X} - \mu_o)'|^{\frac{n}{2}}}.$$

On applying results on the determinants of partitioned matrices which were obtained in Sect. 1.3, we have the following equivalent representations of the denominator:

$$\begin{aligned} \left| \begin{bmatrix} S & n(\bar{X} - \mu\_o) \\ - (\bar{X} - \mu\_o)' & 1 \end{bmatrix} \right| &= [1] |S + n(\bar{X} - \mu\_o)(\bar{X} - \mu\_o)'| \\ &= |S| \left| 1 + n(\bar{X} - \mu\_o)' S^{-1} (\bar{X} - \mu\_o) \right|, \end{aligned}$$

that is,

$$|S + n(\bar{X} - \mu\_o)(\bar{X} - \mu\_o)'| = |S|[1 + n(\bar{X} - \mu\_o)'\mathcal{S}^{-1}(\bar{X} - \mu\_o)],$$

which yields the following simplified representation of the likelihood ratio statistic:

$$\lambda = \frac{1}{[1 + n(\bar{X} - \mu\_o)'\mathcal{S}^{-1}(\bar{X} - \mu\_o)]^{\frac{n}{2}}}.\tag{6.3.3}$$

Small values of λ correspond to large values of u ≡ n(X̄ − μ_o)′S⁻¹(X̄ − μ_o), which is connected to Hotelling's T² statistic. Hence the criterion is the following: "Reject H_o for large values of u". The distribution of u can be derived by making use of the independence of the sample mean and the sample sum of products matrix and of the densities of these quantities. An outline of the derivation is provided in the next subsection.

#### **6.3.1. The distribution of the test statistic**

Let us examine the distribution of u = n(X̄ − μ)′S⁻¹(X̄ − μ). We have already established in Theorem 3.5.3 that S and X̄ are independently distributed in the case of a real p-variate nonsingular Gaussian N_p(μ, Σ) population. It was also determined in Corollary 3.5.1 that the distribution of the sample average X̄ is p-variate real Gaussian with the parameters μ and (1/n)Σ, Σ > O, and, in the ensuing discussion, it is shown that the distribution of S is matrix-variate Wishart with m = n − 1 degrees of freedom, where n is the sample size, and parameter matrix Σ > O. Hence the joint density of S and X̄, denoted by f(S, X̄), is the product of the marginal densities. Letting Σ = I, this joint density is given by

$$f(S, \bar{X}) = \frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})}|S|^{\frac{m}{2}-\frac{p+1}{2}}e^{-\frac{1}{2}\text{tr}(S)-\frac{n}{2}\text{tr}((\bar{X}-\mu)(\bar{X}-\mu)')}, \; m=n-1. \qquad (i)$$

Note that it is sufficient to consider the case Σ = I. Indeed, due to the presence of S⁻¹ in u = n(X̄ − μ)′S⁻¹(X̄ − μ), the effect of any scaling matrix on X_j disappears: if X_j goes to A^{1/2}X_j for any constant positive definite matrix A, then S⁻¹ goes to A^{−1/2}S⁻¹A^{−1/2}, and thus u is free of A.

Letting Y = S^{−1/2}(X̄ − μ) for fixed S, Y ∼ N_p(O, S⁻¹/n), so that the conditional density of Y, given S, is

$$g(Y|S) = \frac{n^{\frac{p}{2}}|S|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\mathbf{e}^{-\frac{n}{2}\mathrm{tr}(SYY')}.$$

Thus, the joint density of *S* and *Y* , denoted by *f*1*(S, Y )*, is

$$f\_1(S,Y) = \frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})}|S|^{\frac{m+1}{2}-\frac{p+1}{2}}e^{-\frac{1}{2}\text{tr}(S[I+nYY'])}, \quad m=n-1. \tag{ii}$$

On integrating out *S* from *(ii)* by making use of a matrix-variate gamma integral, we obtain the following marginal density of *Y* , denoted by *f*2*(Y )*:

$$f\_2(Y) \mathrm{d}Y = \frac{n^{\frac{p}{2}}}{(\pi)^{\frac{p}{2}}} \frac{\Gamma\_p(\frac{m+1}{2})}{\Gamma\_p(\frac{m}{2})} |I + nYY'|^{-(\frac{m+1}{2})} \mathrm{d}Y, \quad m = n - 1. \tag{iii}$$

However, |I + nYY′| = 1 + nY′Y, which can be established by considering two representations of the determinant

$$
\begin{vmatrix} I & -\sqrt{n}Y \\ \sqrt{n}Y' & 1 \end{vmatrix},
$$

similarly to what was done in Sect. 6.3 to obtain the likelihood ratio statistic given in (6.3.3). As well, it can easily be shown that

$$\frac{\Gamma\_p(\frac{m+1}{2})}{\Gamma\_p(\frac{m}{2})} = \frac{\Gamma(\frac{m+1}{2})}{\Gamma(\frac{m+1}{2} - \frac{p}{2})}$$

by expanding the matrix-variate gamma functions. Now, letting s = Y′Y, it follows from Theorem 4.2.3 that $\mathrm{d}Y = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}s^{\frac{p}{2}-1}\mathrm{d}s$. Thus, the density of s, denoted by f₃(s), is

$$f_3(s)\mathrm{d}s = \frac{n^{\frac{p}{2}}\Gamma(\frac{m+1}{2})}{\Gamma(\frac{m+1}{2} - \frac{p}{2})\Gamma(\frac{p}{2})}s^{\frac{p}{2}-1}(1+ns)^{-(\frac{m+1}{2})}\mathrm{d}s \tag{6.3.4}$$

$$= \frac{n^{\frac{p}{2}}\Gamma(\frac{n}{2})}{\Gamma(\frac{n}{2} - \frac{p}{2})\Gamma(\frac{p}{2})}s^{\frac{p}{2}-1}(1+ns)^{-\frac{n}{2}}\mathrm{d}s, \ m = n-1,\tag{6.3.5}$$

for n = p + 1, p + 2, …, 0 ≤ s < ∞, and zero elsewhere. It can then readily be seen from (6.3.5) that ns = nY′Y = n(X̄ − μ)′S⁻¹(X̄ − μ) = u is distributed as a real scalar type-2 beta random variable whose parameters are (p/2, n/2 − p/2), n = p + 1, … . Thus, the following result:

**Theorem 6.3.1.** Consider a real p-variate normal population N_p(μ, Σ), Σ > O, and a simple random sample of size n from this normal population, X_j ∼ N_p(μ, Σ), j = 1,…,n, the X_j's being independently distributed. Let the p × n matrix **X** = (X₁,…,X_n) be the sample matrix and the p-vector X̄ = (1/n)(X₁ + ··· + X_n) denote the sample average. Let **X̄** = (X̄,…,X̄) be a p × n matrix whose columns are all equal to X̄, and S = (**X** − **X̄**)(**X** − **X̄**)′ be the sample sum of products matrix. Then, u = n(X̄ − μ)′S⁻¹(X̄ − μ) has a real scalar type-2 beta distribution with the parameters (p/2, n/2 − p/2), so that u ∼ (p/(n−p))F_{p, n−p}, where F_{p, n−p} denotes a real F random variable whose degrees of freedom are p and n − p.
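The theorem lends itself to a quick Monte Carlo check. The following sketch (assuming numpy) simulates u under μ = O and Σ = I, which is permissible since the distribution of u is free of μ and Σ, and compares the empirical mean of u with the type-2 beta mean p/(n − p − 2), valid for n − p > 2:

```python
import numpy as np

# Monte Carlo sketch of Theorem 6.3.1: u = n Xbar' S^{-1} Xbar should follow
# a type-2 beta(p/2, (n-p)/2), equivalently (p/(n-p)) F_{p, n-p}.
rng = np.random.default_rng(0)
p, n, reps = 3, 12, 20000

u_vals = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((p, n))            # n observations from N_p(O, I)
    xbar = X.mean(axis=1)
    D = X - xbar[:, None]
    S = D @ D.T                                # sample sum of products matrix
    u_vals[r] = n * xbar @ np.linalg.solve(S, xbar)

print(round(u_vals.mean(), 3), round(p / (n - p - 2), 3))
```

With p = 3 and n = 12, the theoretical mean is 3/7 ≈ 0.429, and the simulated average settles near that value.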

Hence, in order to test the hypothesis H_o : μ = μ_o, the likelihood ratio statistic gives the test criterion: Reject H_o for large values of u = n(X̄ − μ_o)′S⁻¹(X̄ − μ_o), which is equivalent to rejecting H_o for large values of an F random variable having p and n − p degrees of freedom, where $F_{p,\,n-p} = \frac{n-p}{p}u = \frac{n-p}{p}\,n(\bar{X} - \mu_o)'S^{-1}(\bar{X} - \mu_o)$, that is,

$$\begin{aligned} \text{reject } H_o \text{ if } \frac{n-p}{p}u &= F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha} \\ \text{with } \alpha &= Pr\{F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha}\} \end{aligned} \tag{6.3.6}$$

at a given significance level α, where $u = n(\bar{X} - \mu_o)'S^{-1}(\bar{X} - \mu_o) \sim \frac{p}{n-p}F_{p,\,n-p}$, n being the sample size.

**Example 6.3.1.** Consider a trivariate real Gaussian vector X ∼ N₃(μ, Σ), Σ > O, where Σ is unknown. We would like to test the following hypothesis on μ: H_o : μ = μ_o, with μ_o′ = (1, 1, 1). Consider the following simple random sample of size n = 5 from this N₃(μ, Σ) population:

$$X\_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \ X\_3 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \ X\_4 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix},$$

so that

$$\begin{aligned} \bar{X} &= \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \ X\_1 - \bar{X} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix}, \ X\_2 - \bar{X} = \frac{1}{5} \begin{bmatrix} 4 \\ -2 \\ -9 \end{bmatrix}, \\\ X\_3 - \bar{X} &= \frac{1}{5} \begin{bmatrix} -6 \\ 3 \\ 6 \end{bmatrix}, \ X\_4 - \bar{X} = \frac{1}{5} \begin{bmatrix} -11 \\ 3 \\ 6 \end{bmatrix}, \ X\_5 - \bar{X} = \frac{1}{5} \begin{bmatrix} 9 \\ -7 \\ -4 \end{bmatrix}. \end{aligned}$$

Let **X** = [X₁,…,X₅] be the 3 × 5 sample matrix and **X̄** = [X̄, X̄,…, X̄] be the 3 × 5 matrix of sample means. Then,

$$\begin{aligned} \mathbf{X} - \bar{\mathbf{X}} &= [X\_1 - \bar{X}, \dots, X\_5 - \bar{X}] = \frac{1}{5} \begin{bmatrix} 4 & 4 & -6 & -11 & 9 \\ 3 & -2 & 3 & 3 & -7 \\ 1 & -9 & 6 & 6 & -4 \end{bmatrix}, \\ \mathbf{S} &= (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \frac{1}{5^2} \begin{bmatrix} 270 & -110 & -170 \\ -110 & 80 & 85 \\ -170 & 85 & 170 \end{bmatrix}. \end{aligned}$$

Let S = (1/5²)A. In order to evaluate the test statistic, we need S⁻¹ = 25A⁻¹. To obtain the exact inverse without any approximation, we will use the transpose of the cofactor matrix divided by the determinant. The determinant of A, expanded in terms of the elements of the first row and the corresponding cofactors, is |A| = 531250. The matrix of cofactors, denoted by Cof(A), which is symmetric in this case, is the following:

$$\text{Cof}(A) = \begin{bmatrix} 6375 & 4250 & 4250 \\ 4250 & 17000 & -4250 \\ 4250 & -4250 & 9500 \end{bmatrix} \Rightarrow S^{-1} = \frac{25}{531250}\text{Cof}(A).$$


The null hypothesis is H_o : μ = μ_o = (1, 1, 1)′, so that

$$
\bar{X} - \mu\_o = \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = -\frac{1}{5} \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix},
$$

the observed value of the test statistic being

$$\begin{split} w &= \frac{n-p}{p}\,n(\bar{X} - \mu_o)'S^{-1}(\bar{X} - \mu_o) = \frac{(5-3)}{3}\,5\,\frac{25}{5^2}\begin{bmatrix} 4 & 3 & 1 \end{bmatrix}A^{-1}\begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix} \\ &= \frac{2}{3}\,5\,\frac{25}{5^2}\,\frac{1}{531250}\big[(4)^2(6375) + (3)^2(17000) + (1)^2(9500) \\ &\qquad\qquad + 2(4)(3)(4250) + 2(4)(1)(4250) - 2(3)(1)(4250)\big] \\ &= \frac{2}{3}\,5\,\frac{25}{5^2}\,\frac{375000}{531250} = 2.35. \end{split}$$

The test statistic w under the null hypothesis is F-distributed, that is, w ∼ F_{p, n−p}. Let us test H_o at the significance level α = 0.05. Since the critical value obtained from an F-table is F_{p, n−p, α} = F_{3, 2, 0.05} = 19.2 and 2.35 < 19.2, we do not reject H_o.
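The entire computation of this example can be reproduced numerically; the following sketch (assuming numpy) rebuilds S from the observations and evaluates the observed F-value:

```python
import numpy as np

# Example 6.3.1: the five observation vectors as columns of X
p, n = 3, 5
X = np.array([[ 1,  1, -1, -2,  2],
              [ 1,  0,  1,  1, -1],
              [ 1, -1,  2,  2,  0]], dtype=float)
mu_o = np.array([1.0, 1.0, 1.0])

xbar = X.mean(axis=1)                           # (1/5)(1, 2, 4)
D = X - xbar[:, None]
S = D @ D.T                                     # sample sum of products matrix
u = n * (xbar - mu_o) @ np.linalg.solve(S, xbar - mu_o)
w = (n - p) / p * u                             # observed F_{p, n-p} value
print(round(w, 2))
```

This avoids the hand computation of cofactors; `np.linalg.solve` applies S⁻¹ directly to the deviation vector.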

**Note 6.3.1.** If S is replaced by (1/(n−1))S, an unbiased estimator for Σ, then the test statistic can be written as $\frac{1}{n-1}\,n(\bar{X} - \mu_o)'\big[\frac{1}{n-1}S\big]^{-1}(\bar{X} - \mu_o) = \frac{T^2}{n-1}$, where T² denotes Hotelling's T² statistic, which for p = 1 corresponds to the square of a Student-t statistic having n − 1 degrees of freedom.

Since u as defined in Theorem 6.3.1 is distributed as a type-2 beta random variable with the parameters (p/2, (n−p)/2), we have the following results: 1/u is type-2 beta distributed with the parameters ((n−p)/2, p/2); u/(1+u) is type-1 beta distributed with the parameters (p/2, (n−p)/2); and 1/(1+u) is type-1 beta distributed with the parameters ((n−p)/2, p/2), n being the sample size.

#### **6.3.2. Paired values or linear functions when** *Σ* **is unknown**

Let Y₁,…,Y_k be p × 1 vectors having their own distributions, which are unknown. However, suppose that it is known that a certain linear function X = a₁Y₁ + ··· + a_kY_k has a p-variate real Gaussian N_p(μ, Σ) distribution with Σ > O. We would like to test hypotheses of the type $E[X] = a_1\mu^{(o)}_{(1)} + \cdots + a_k\mu^{(o)}_{(k)}$, where the $\mu^{(o)}_{(j)}$'s, j = 1,…,k, are specified. Since we do not know the distributions of Y₁,…,Y_k, let us convert the iid observations on the Y_j's, j = 1,…,k, into iid observations on X, say X₁,…,X_n, X_j ∼ N_p(μ, Σ), Σ > O, where Σ is unknown. First, the observations on Y₁,…,Y_k are transformed into observations on the X_j's. The problem then involves a single normal population whose covariance matrix is unknown. An example of this type is Y₁ representing a p × 1 vector before a certain process, such as administering a drug to a patient; in this instance, Y₁ could consist of measurements on p characteristics observed in a patient. Observations on Y₂ will then be the measurements on the same p characteristics after the process, such as after administering the drug to the patient. Then Y₂q − Y₁q = X_q will represent the variable corresponding to the difference in the measurements on the q-th characteristic. Let the hypothesis be H_o : μ = μ_o (given), Σ being unknown. Note that once the observations on the X_j's are taken, the individual μ_{(j)}'s are irrelevant, as they are no longer of any use. Once the X_j's are determined, one can compute the sum of products matrix S in the X_j's. In this case, the test statistic is u = n(X̄ − μ_o)′S⁻¹(X̄ − μ_o), which is distributed as a type-2 beta with parameters (p/2, (n−p)/2).
Then, *<sup>u</sup>* <sup>∼</sup> *<sup>p</sup> n*−*p F* where *F* is an *F* random variable having *p* and *n* − *p* degrees of freedom, that is, an *Fp, n*−*<sup>p</sup>* random variable, *n* being the sample size. Thus, the test criterion is applied as follows: Determine the observed value of *u* and the corresponding observed value of *Fp,n*−*<sup>p</sup>* that is, *<sup>n</sup>*−*<sup>p</sup> p u*, and then

$$\text{reject } H_o \text{ if } \frac{n-p}{p}\,u \ge F_{p,\,n-p,\,\alpha}, \text{ with } Pr\{F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha}\} = \alpha. \tag{6.3.7}$$

**Example 6.3.2.** Five motivated individuals were randomly selected and subjected to an exercise regimen for a month. The exercise program promoters claim that the subjects can expect a weight loss of 5 kg as well as a 2-inch reduction in lower stomach girth by the end of the month. Let $Y_1$ and $Y_2$ denote the two-component vectors representing weight and girth before starting the routine and at the end of the exercise program, respectively. The following are the observations on the five individuals:

$$(Y\_1, Y\_2) = \begin{bmatrix} (85, 85) \\ (40, 41) \end{bmatrix}, \quad \begin{bmatrix} (80, 70) \\ (40, 45) \end{bmatrix}, \quad \begin{bmatrix} (75, 73) \\ (36, 36) \end{bmatrix}, \quad \begin{bmatrix} (70, 71) \\ (38, 38) \end{bmatrix}, \quad \begin{bmatrix} (70, 68) \\ (35, 34) \end{bmatrix}.$$

Obviously, *Y*<sup>1</sup> and *Y*<sup>2</sup> are dependent variables having a joint distribution. We will assume that the difference *X* = *Y*<sup>1</sup> − *Y*<sup>2</sup> has a real Gaussian distribution, that is, *X* ∼ *N*2*(μ, Σ), Σ > O*. Under this assumption, the observations on *X* are

$$X\_1 = \begin{bmatrix} 85 - 85 \\ 40 - 41 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 10 \\ -5 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \ X\_4 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$

Let $\mathbf{X} = [X_1, X_2, \ldots, X_5]$ and $\mathbf{X} - \bar{\mathbf{X}} = [X_1 - \bar X, \ldots, X_5 - \bar X]$, both being $2\times5$ matrices. The observed sample average $\bar X$, the claim of the exercise routine promoters $\mu = \mu_o$, as well as other relevant quantities are as follows:

$$\begin{aligned}
\bar X &= \frac15(X_1+\cdots+X_5) = \frac15\begin{bmatrix}13\\-5\end{bmatrix}, \quad \mu_o = \begin{bmatrix}5\\2\end{bmatrix},\\
X_1-\bar X &= \begin{bmatrix}0\\-1\end{bmatrix} - \frac15\begin{bmatrix}13\\-5\end{bmatrix} = \frac15\begin{bmatrix}-13\\0\end{bmatrix},\quad
X_2-\bar X = \frac15\begin{bmatrix}37\\-20\end{bmatrix},\\
X_3-\bar X &= \frac15\begin{bmatrix}-3\\5\end{bmatrix},\quad
X_4-\bar X = \frac15\begin{bmatrix}-18\\5\end{bmatrix},\quad
X_5-\bar X = \frac15\begin{bmatrix}-3\\10\end{bmatrix};\\
\mathbf{X}-\bar{\mathbf{X}} &= [X_1-\bar X,\ldots,X_5-\bar X] = \frac15\begin{bmatrix}-13&37&-3&-18&-3\\ 0&-20&5&5&10\end{bmatrix};\\
S &= (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})' = \frac1{5^2}\begin{bmatrix}-13&37&-3&-18&-3\\ 0&-20&5&5&10\end{bmatrix}\begin{bmatrix}-13&0\\ 37&-20\\ -3&5\\ -18&5\\ -3&10\end{bmatrix}\\
&= \frac1{5^2}\begin{bmatrix}1880&-875\\ -875&550\end{bmatrix} = \frac1{5^2}A,\quad A = \begin{bmatrix}1880&-875\\ -875&550\end{bmatrix} \Rightarrow S^{-1} = 25A^{-1};\\
|A| &= 1880(550)-(875)^2 = 268375;\quad \text{Cof}(A) = \begin{bmatrix}550&875\\ 875&1880\end{bmatrix};\\
A^{-1} &= \frac{\text{Cof}(A)}{|A|} = \frac1{268375}\begin{bmatrix}550&875\\ 875&1880\end{bmatrix},\quad S^{-1} = 25A^{-1};\\
\bar X - \mu_o &= \frac15\begin{bmatrix}13\\-5\end{bmatrix} - \begin{bmatrix}5\\2\end{bmatrix} = -\frac15\begin{bmatrix}12\\15\end{bmatrix}.
\end{aligned}$$

The test statistic being $w = \frac{n-p}{p}\,n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, its observed value is

$$\begin{split} w &= \frac{(5-2)}{2}\, 5\, \frac{5^2}{5^2}\, \frac{1}{268375} \begin{bmatrix} 12 & 15 \end{bmatrix} \begin{bmatrix} 550 & 875 \\ 875 & 1880 \end{bmatrix} \begin{bmatrix} 12 \\ 15 \end{bmatrix} \\ &= \frac{3}{2}\, \frac{5}{268375} [(12)^2(550) + (15)^2(1880) + 2(12)(15)(875)] = 22.84\ldots \end{split}$$

Letting the significance level of the test be $\alpha = 0.05$, the critical value is $F_{p,\,n-p,\,\alpha} = F_{2,\,3,\,0.05} = 9.55$. Since $22.84 > 9.55$, $H_o$ is rejected.
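As a numerical check, the computation of Example 6.3.2 can be reproduced in a few lines of NumPy; this is a sketch using the example's data and claimed mean reduction $\mu_o$, not part of the text's derivation:

```python
import numpy as np

# Columns are the five subjects; rows are the differences X = Y1 - Y2
# in (weight, girth), as computed in Example 6.3.2.
X = np.array([[0, 10, 2, -1, 2],
              [-1, -5, 0, 0, 1]], dtype=float)
mu0 = np.array([5.0, 2.0])           # claimed mean reduction under H_o

n = X.shape[1]                        # sample size, n = 5
p = X.shape[0]                        # dimension, p = 2
xbar = X.mean(axis=1)
D = X - xbar[:, None]                 # deviation matrix X - Xbar
S = D @ D.T                           # sample sum of products matrix

d = xbar - mu0
u = n * d @ np.linalg.solve(S, d)     # u = n (Xbar - mu0)' S^{-1} (Xbar - mu0)
w = (n - p) / p * u                   # observed F_{p, n-p} value

print(round(w, 2))                    # ~22.84 > F_{2,3,0.05} = 9.55: reject H_o
```

The call `np.linalg.solve(S, d)` avoids forming $S^{-1}$ explicitly, which is the numerically preferable way of evaluating the quadratic form.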

#### **6.3.3. Independent Gaussian populations**

Consider $k$ independent $p$-variate real Gaussian populations whose individual distributions are $N_p(\mu_{(j)},\Sigma_j)$, $\Sigma_j>O$, $j=1,\ldots,k$. Given simple random samples of sizes $n_1,\ldots,n_k$ from these $k$ populations, we may wish to test a hypothesis on a given linear function of the mean values, that is, $H_o: a_1\mu_{(1)}+\cdots+a_k\mu_{(k)} = \mu_o$ where $a_1,\ldots,a_k$ are known constants and $\mu_o$ is a given quantity under the null hypothesis. We have already discussed this problem for the case of known covariance matrices. When the $\Sigma_j$'s are all unknown, or some of them are known while others are not, the MLEs of the unknown covariance matrices turn out to be the respective sample sums of products matrices divided by the corresponding sample sizes. This results in a linear function of independent Wishart matrices whose distribution proves challenging to determine, even in the null case.

#### **Special case of two independent Gaussian populations**

Consider the special case of two independent real Gaussian populations having identical covariance matrices; that is, let the populations be $Y_{1q}\sim N_p(\mu_{(1)},\Sigma)$, $\Sigma>O$, the $Y_{1q}$'s, $q=1,\ldots,n_1$, being iid, and $Y_{2q}\sim N_p(\mu_{(2)},\Sigma)$, $\Sigma>O$, the $Y_{2q}$'s, $q=1,\ldots,n_2$, being iid. Let the sample $p\times n_1$ and $p\times n_2$ matrices be denoted by $\mathbf{Y}_1 = (Y_{11},\ldots,Y_{1n_1})$ and $\mathbf{Y}_2 = (Y_{21},\ldots,Y_{2n_2})$, and let the sample averages be $\bar Y_j = \frac1{n_j}(Y_{j1}+\cdots+Y_{jn_j})$, $j=1,2$. Let $\bar{\mathbf{Y}}_j = (\bar Y_j,\ldots,\bar Y_j)$, a $p\times n_j$ matrix whose columns are all equal to $\bar Y_j$, $j=1,2$, and let

$$S_j = (\mathbf{Y}_j - \bar{\mathbf{Y}}_j)(\mathbf{Y}_j - \bar{\mathbf{Y}}_j)', \ j = 1, 2,$$

be the corresponding sample sum of products matrices. Then, $S_1$ and $S_2$ are independently distributed as Wishart matrices having $n_1-1$ and $n_2-1$ degrees of freedom, respectively. As the sum of two independent $p\times p$ real or complex matrices having matrix-variate gamma distributions with the same scale parameter matrix is again gamma distributed with the shape parameters summed up and the same scale parameter matrix, we observe that since the two populations are independently distributed, $S_1+S_2 \equiv S$ has a Wishart distribution with $n_1+n_2-2$ degrees of freedom. We now consider a hypothesis of the type $\mu_{(1)} = \mu_{(2)}$. In order to do away with the unknown common mean value, we may consider the real $p$-vector $U = \bar Y_1 - \bar Y_2$, so that $E(U) = O$ and $\text{Cov}(U) = \frac1{n_1}\Sigma + \frac1{n_2}\Sigma = \big(\frac1{n_1}+\frac1{n_2}\big)\Sigma = \frac{n_1+n_2}{n_1n_2}\Sigma$. The MLE of this pooled covariance matrix is $\frac1{n_1+n_2}S$ where $S$ is Wishart distributed with $n_1+n_2-2$ degrees of freedom. Then, following through the steps included in Sect. 6.3.1 with the parameter $m$ now being $n_1+n_2-2$, the power of $S$ becomes $\frac{n_1+n_2-2+1}{2} - \frac{p+1}{2}$ when integrating out $S$. Letting the null hypothesis be $H_o: E[Y_1]-E[Y_2] = \delta$ (specified), such as $\delta = 0$, the function resulting from integrating out $S$ is

$$c \left[ \frac{(n\_1 + n\_2)}{n\_1 n\_2} u \right]^{\frac{p}{2} - 1} [1 + \frac{(n\_1 + n\_2)}{n\_1 n\_2} u]^{-\frac{1}{2}(n\_1 + n\_2 - 1)}\tag{6.3.8}$$

where $c$ is the normalizing constant, so that $w = \frac{n_1+n_2}{n_1n_2}(\bar Y_1 - \bar Y_2 - \delta)'S^{-1}(\bar Y_1 - \bar Y_2 - \delta)$ is distributed as a type-2 beta with the parameters $(\frac{p}{2}, \frac{n_1+n_2-1-p}{2})$. Writing $w = \frac{p}{n_1+n_2-1-p}F_{p,\,n_1+n_2-1-p}$, this $F$ is seen to be an $F$ statistic having $p$ and $n_1+n_2-1-p$ degrees of freedom. We will state these results as theorems.

**Theorem 6.3.2.** *Let the p*×*p real positive definite matrices X*<sup>1</sup> *and X*<sup>2</sup> *be independently distributed as real matrix-variate gamma random variables with densities*

$$f_j(X_j) = \frac{|B|^{\alpha_j}}{\Gamma_p(\alpha_j)} |X_j|^{\alpha_j - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(BX_j)}, \ B > O, \ X_j > O, \ \Re(\alpha_j) > \frac{p-1}{2}, \tag{6.3.9}$$

*j = 1, 2, and zero elsewhere. Then, as can be seen from (5.2.6), the Laplace transform associated with $X_j$ or that of $f_j$, denoted by $L_{X_j}({}_{\ast}T)$, is*

$$L_{X_j}({}_{\ast}T) = |I + B^{-1}{}_{\ast}T|^{-\alpha_j}, \ I + B^{-1}{}_{\ast}T > O, \ j = 1, 2. \tag{i}$$

*Accordingly, U*<sup>1</sup> = *X*<sup>1</sup> + *X*<sup>2</sup> *has a real matrix-variate gamma density with the parameters (α*<sup>1</sup> + *α*2*, B) whose associated Laplace transform is*

$$L_{U_1}({}_{\ast}T) = |I + B^{-1}{}_{\ast}T|^{-(\alpha_1+\alpha_2)},\tag{ii}$$

*and U*<sup>2</sup> = *a*1*X*<sup>1</sup> + *a*2*X*<sup>2</sup> *has the Laplace transform*

$$L\_{U\_2}(\,\_\*T) = |I + a\_1 \boldsymbol{B}^{-1} \,\_\*T|^{-\alpha\_1} |I + a\_2 \boldsymbol{B}^{-1} \,\_\*T|^{-\alpha\_2},\tag{iii}$$

*whenever $I + a_jB^{-1}{}_{\ast}T > O$, $j = 1, 2$, where $a_1$ and $a_2$ are real scalar constants.*

It follows from *(ii)* that $X_1+X_2$ is also real matrix-variate gamma distributed. When $a_1 \ne a_2$, it is very difficult to invert *(iii)* in order to obtain the corresponding density. This can be achieved by expanding one of the determinants in *(iii)* in terms of zonal polynomials, say the second one, after having first taken $|I + a_1B^{-1}{}_{\ast}T|^{-(\alpha_1+\alpha_2)}$ out as a factor in this instance.

**Theorem 6.3.3.** *Let Yj* ∼ *Np(μ(j ), Σ), Σ > O, j* = 1*,* 2*, be independent p-variate real Gaussian distributions sharing the same covariance matrix. Given a simple random sample of size n*<sup>1</sup> *from Y*<sup>1</sup> *and a simple random sample of size n*<sup>2</sup> *from Y*2*, let the sample averages be denoted by Y*¯ <sup>1</sup> *and Y*¯ <sup>2</sup> *and the sample sums of products matrices, by S*<sup>1</sup> *and S*2*, respectively. Consider the hypothesis Ho* : *μ(*1*)* − *μ(*2*)* = *δ (given). Letting S* = *S*<sup>1</sup> + *S*<sup>2</sup> *and*

$$w = \frac{(n\_1 + n\_2)}{n\_1 n\_2} (\bar{Y}\_1 - \bar{Y}\_2 - \delta)' S^{-1} (\bar{Y}\_1 - \bar{Y}\_2 - \delta), \ w \sim \frac{p}{n\_1 + n\_2 - 1 - p} F\_{p, n\_1 + n\_2 - 1 - p} \text{ (iv)}$$

*where $F_{p,\,n_1+n_2-1-p}$ denotes an $F$ distribution with $p$ and $n_1+n_2-1-p$ degrees of freedom or, equivalently, $w$ is distributed as a type-2 beta variable with the parameters $(\frac{p}{2}, \frac{n_1+n_2-1-p}{2})$. We reject the null hypothesis $H_o$ if $\frac{n_1+n_2-1-p}{p}\,w \ge F_{p,\,n_1+n_2-1-p,\,\alpha}$ with*

$$Pr\{F_{p,\,n_1+n_2-1-p} \ge F_{p,\,n_1+n_2-1-p,\ \alpha}\} = \alpha \tag{vi}$$

*at a given significance level α.*

**Theorem 6.3.4.** *Let $w$ be as defined in Theorem 6.3.3. Then $w_1 = \frac1w$ is a real scalar type-2 beta variable with the parameters $(\frac{n_1+n_2-1-p}{2}, \frac{p}{2})$; $w_2 = \frac{w}{1+w}$ is a real scalar type-1 beta variable with the parameters $(\frac{p}{2}, \frac{n_1+n_2-1-p}{2})$; $w_3 = \frac{1}{1+w}$ is a real scalar type-1 beta variable with the parameters $(\frac{n_1+n_2-1-p}{2}, \frac{p}{2})$.*

These last results follow from the connections between real scalar type-1 and type-2 beta random variables. Results parallel to those appearing in *(i)* to *(vi)* and stated in Theorems 6.3.1–6.3.4 can similarly be obtained for the complex case.
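The distributional identities of Theorem 6.3.4 lend themselves to a quick Monte Carlo check. The following sketch (with arbitrarily chosen $n_1$, $n_2$, $p$, not tied to any example in the text) generates a type-2 beta variable as a ratio of independent gamma variables and compares the empirical means of the transformed variables with the exact type-1 beta means:

```python
import numpy as np

# If w is type-2 beta with parameters (a, b), a = p/2, b = (n1+n2-1-p)/2, then
#   w2 = w/(1+w) is type-1 beta(a, b) and w3 = 1/(1+w) is type-1 beta(b, a),
# as stated in Theorem 6.3.4.
rng = np.random.default_rng(0)
n1, n2, p = 6, 7, 3
a, b = p / 2, (n1 + n2 - 1 - p) / 2

# A type-2 beta(a, b) variable is a ratio of independent gamma variables.
w = rng.gamma(a, size=200_000) / rng.gamma(b, size=200_000)

w2 = w / (1 + w)                      # should be type-1 beta(a, b)
w3 = 1 / (1 + w)                      # should be type-1 beta(b, a)

print(round(w2.mean(), 3), round(a / (a + b), 3))  # empirical vs exact mean
print(round(w3.mean(), 3), round(b / (a + b), 3))
```

With 200,000 replicates the empirical means agree with the exact values $a/(a+b)$ and $b/(a+b)$ to about three decimal places.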

**Example 6.3.3.** Consider two independent populations whose respective distributions are $N_2(\mu_{(1)},\Sigma_1)$ and $N_2(\mu_{(2)},\Sigma_2)$, $\Sigma_j>O$, $j=1,2$, and samples of sizes $n_1 = 4$ and $n_2 = 5$ from these two populations, respectively. Let the population covariance matrices be identical, with $\Sigma_1 = \Sigma_2 = \Sigma$, the common covariance matrix being unknown, and let the observed sample vectors from the first population, $X_j \sim N_2(\mu_{(1)},\Sigma)$, be

$$X\_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ X\_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\ X\_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\ X\_4 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}.$$

Denoting the sample mean from the first population by *X*¯ and the sample sum of products matrix by *S*1, we have

$$
\bar{X} = \frac{1}{4} \begin{bmatrix} 0 \\ 2 \end{bmatrix} \text{ and } S_1 = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})', \ \mathbf{X} = [X_1, X_2, X_3, X_4], \ \bar{\mathbf{X}} = [\bar{X}, \ldots, \bar{X}],
$$

the observations on these quantities being the following:

$$\begin{aligned} X_1 - \bar{X} &= \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \frac{1}{4} \begin{bmatrix} 0 \\ 2 \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 4 \\ -2 \end{bmatrix}, \\ X_2 - \bar{X} &= \frac{1}{4} \begin{bmatrix} -4 \\ 2 \end{bmatrix}, \quad X_3 - \bar{X} = \frac{1}{4} \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad X_4 - \bar{X} = \frac{1}{4} \begin{bmatrix} -4 \\ -6 \end{bmatrix}; \end{aligned}$$

$$\mathbf{X} - \bar{\mathbf{X}} = \frac{1}{4} \begin{bmatrix} 4 & -4 & 4 & -4 \\ -2 & 2 & 6 & -6 \end{bmatrix}, \quad S_1 = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \frac{1}{4^2} \begin{bmatrix} 64 & 32 \\ 32 & 80 \end{bmatrix}.$$

Let the sample vectors from the second population, denoted $Y_1,\ldots,Y_5$, be

$$\begin{bmatrix} Y\_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \ Y\_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ Y\_3 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \ Y\_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \ Y\_5 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$

Then, the sample average and the deviation vectors are the following:

$$\begin{aligned} \bar{Y} &= \frac{1}{5}[Y_1 + \cdots + Y_5] = \frac{1}{5} \begin{bmatrix} 5 \\ 2 \end{bmatrix}, \\ Y_1 - \bar{Y} &= \begin{bmatrix} 0 \\ 1 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 5 \\ 2 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} -5 \\ 3 \end{bmatrix}, \ Y_2 - \bar{Y} = \frac{1}{5} \begin{bmatrix} 0 \\ -2 \end{bmatrix}, \\ Y_3 - \bar{Y} &= \frac{1}{5} \begin{bmatrix} 0 \\ -7 \end{bmatrix}, \ Y_4 - \bar{Y} = \frac{1}{5} \begin{bmatrix} 0 \\ 3 \end{bmatrix}, \ Y_5 - \bar{Y} = \frac{1}{5} \begin{bmatrix} 5 \\ 3 \end{bmatrix}, \\ \mathbf{Y} - \bar{\mathbf{Y}} &= \frac{1}{5} \begin{bmatrix} -5 & 0 & 0 & 0 & 5 \\ 3 & -2 & -7 & 3 & 3 \end{bmatrix}, \ S_2 = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})'; \end{aligned}$$

$$\begin{split} S_2 &= \frac{1}{5^2} \begin{bmatrix} 50 & 0 \\ 0 & 80 \end{bmatrix}; \ S = S_1 + S_2 = \frac{1}{16} \begin{bmatrix} 64 & 32 \\ 32 & 80 \end{bmatrix} + \frac{1}{25} \begin{bmatrix} 50 & 0 \\ 0 & 80 \end{bmatrix} \\ &= \begin{bmatrix} 6.00 & 2.00 \\ 2.00 & 8.20 \end{bmatrix} \Rightarrow S^{-1} = \frac{\text{Cof}(S)}{|S|}; \\ |S| &= 45.20; \ S^{-1} = \frac{1}{45.20} \begin{bmatrix} 8.20 & -2.00 \\ -2.00 & 6.00 \end{bmatrix}. \end{split}$$

Letting the null hypothesis be

$$H\_o: \mu\_{(1)} - \mu\_{(2)} = \delta = \begin{bmatrix} 1 \\ 1 \end{bmatrix},$$

$$
\bar{X} - \bar{Y} - \delta = \frac{1}{4} \begin{bmatrix} 0 \\ 2 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 5 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = - \begin{bmatrix} 2.0 \\ 0.9 \end{bmatrix};\ n_1 = 4, \ n_2 = 5.
$$

Thus, the test statistic is $u \sim F_{p,\,n_1+n_2-1-p}$ where

$$\begin{split} u &= \frac{(n\_1 + n\_2 - 1 - p)}{p} \frac{(n\_1 + n\_2)}{n\_1 n\_2} (\bar{X} - \bar{Y} - \delta)' S^{-1} (\bar{X} - \bar{Y} - \delta) \\ &= \frac{(4 + 5 - 1 - 2)}{2} \frac{(4 + 5)}{(4)(5)} \frac{1}{45.2} \begin{bmatrix} -2.0 & -0.9 \end{bmatrix} \begin{bmatrix} 8.2 & -2.0 \\ -2.0 & 6.0 \end{bmatrix} \begin{bmatrix} -2.0 \\ -0.9 \end{bmatrix} \\ &\approx 0.91. \end{split}$$

Let us test *Ho* at the 5% significance level. Since the required critical value is *Fp, n*1+*n*2−1−*p, α* = *F*2*,* <sup>6</sup>*,* <sup>0</sup>*.*<sup>05</sup> = 5*.*14 and 0*.*91 *<* 5*.*14, the null hypothesis is not rejected.
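The two-sample computation of Example 6.3.3 can be verified numerically; the following sketch implements the statistic in the form printed in the example, using the same data:

```python
import numpy as np

# Sample 1 (n1 = 4 columns) and sample 2 (n2 = 5 columns) of Example 6.3.3.
X = np.array([[1, -1, 1, -1],
              [0, 1, 2, -1]], dtype=float)
Y = np.array([[0, 1, 1, 1, 2],
              [1, 0, -1, 1, 1]], dtype=float)
delta = np.array([1.0, 1.0])          # hypothesized mean difference under H_o

n1, n2, p = X.shape[1], Y.shape[1], X.shape[0]
xbar, ybar = X.mean(axis=1), Y.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T \
  + (Y - ybar[:, None]) @ (Y - ybar[:, None]).T   # pooled S = S1 + S2

d = xbar - ybar - delta
w = (n1 + n2) / (n1 * n2) * d @ np.linalg.solve(S, d)
u = (n1 + n2 - 1 - p) / p * w         # observed F_{p, n1+n2-1-p} value

print(round(u, 2))                    # ~0.91 < F_{2,6,0.05} = 5.14: do not reject H_o
```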

#### **6.3.4. Testing** $\mu_1 = \cdots = \mu_p$ **when** $\Sigma$ **is unknown in the real Gaussian case**

Let the $p\times1$ vectors $X_j \sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j = 1,\ldots,n$, be independently distributed. Let the $p\times1$ vector of unities be denoted by $J$, $J' = (1,\ldots,1)$, and let $A$ be a vector that is orthogonal to $J$ so that $A'J = 0$. For example, we can take

$$A = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ 1 \\ -1 \end{bmatrix} \text{ when } p \text{ is even, } A = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ 1 \\ -2 \\ 1 \end{bmatrix} \text{ or } A = \begin{bmatrix} 1 \\ -1 \\ \vdots \\ -1 \\ 0 \end{bmatrix} \text{ when } p \text{ is odd.}$$

If the last component of $A$ is zero, we are then ignoring the last component of $X_j$. Let $y_j = A'X_j$, $j = 1,\ldots,n$, the $y_j$'s being independently distributed. Then $y_j \sim N_1(A'\mu, A'\Sigma A)$, $A'\Sigma A > 0$, is a univariate normal variable with mean value $A'\mu$ and variance $A'\Sigma A$. Consider the $p\times n$ sample matrix comprising the $X_j$'s, that is, $\mathbf{X} = (X_1,\ldots,X_n)$. Let the sample average of the $X_j$'s be $\bar X = \frac1n(X_1+\cdots+X_n)$ and $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$. Then, the sample sum of products matrix is $S = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$. Consider the $1\times n$ vector $Y = (y_1,\ldots,y_n) = (A'X_1,\ldots,A'X_n) = A'\mathbf{X}$, with $\bar y = \frac1n(y_1+\cdots+y_n) = A'\bar X$ and $\sum_{j=1}^n(y_j-\bar y)^2 = A'(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'A = A'SA$. Let the null hypothesis be $H_o: \mu_1 = \cdots = \mu_p = \nu$, where $\nu$ is unknown and $\mu' = (\mu_1,\ldots,\mu_p)$. Thus, $H_o$ is $A'\mu = \nu A'J = 0$. The joint density of $y_1,\ldots,y_n$, denoted by $L$, is then


$$\begin{split} L &= \prod\_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A} (\mathbf{y}\_j - A' \mu)^2}}{(2\pi)^{\frac{1}{2}} [A' \Sigma A]^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2(A'\Sigma A)} \sum\_{j=1}^{n} (\mathbf{y}\_j - A' \mu)^2}}{(2\pi)^{\frac{n}{2}} [A' \Sigma A]^{\frac{n}{2}}} \\ &= \frac{\mathbf{e}^{-\frac{1}{2A'\Sigma A} [A'SA + nA'(\bar{X} - \mu)(\bar{X} - \mu)'A]}}{(2\pi)^{\frac{n}{2}} [A' \Sigma A]^{\frac{n}{2}}} \\ \end{split} \tag{i)$$

where

$$\begin{split} \sum\_{j=1}^{n} (\mathbf{y}\_{j} - A^{\prime}\boldsymbol{\mu})^{2} &= \sum\_{j=1}^{n} (\mathbf{y}\_{j} - \bar{\mathbf{y}} + \bar{\mathbf{y}} - A^{\prime}\boldsymbol{\mu})^{2} = \sum\_{j=1}^{n} (\mathbf{y}\_{j} - \bar{\mathbf{y}})^{2} + n(\bar{\mathbf{y}} - A^{\prime}\boldsymbol{\mu})^{2} \\ &= \sum\_{j=1}^{n} A^{\prime} (\boldsymbol{X}\_{j} - \bar{\mathbf{X}})(\boldsymbol{X}\_{j} - \bar{\mathbf{X}})^{\prime} \boldsymbol{A} + nA^{\prime}(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})^{\prime} \boldsymbol{A} \\ &= A^{\prime} \boldsymbol{S} \boldsymbol{A} + nA^{\prime}(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})^{\prime} \boldsymbol{A}. \end{split}$$

Let us determine the MLE's of *μ* and *Σ*. We have

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln A'\Sigma A - \frac{1}{2A'\Sigma A}[A'SA + n(A'(\bar{X} - \mu)(\bar{X} - \mu)'A)].$$

On differentiating ln*L* with respect to *μ*<sup>1</sup> and equating the result to zero, we have

$$\begin{aligned} \frac{\partial}{\partial \mu\_1} \ln L &= 0 \Rightarrow nA'[\frac{\partial}{\partial \mu\_1} \{\bar{X}\bar{X}' - \bar{X}\mu' - \mu \bar{X}' + \mu \mu'\}]A = O\\ \Rightarrow nA'[-\bar{X}[1,0,\dots,0] - \begin{bmatrix} 1\\0\\\vdots\\0 \end{bmatrix} \bar{X}' + \begin{bmatrix} 1\\0\\\vdots\\0 \end{bmatrix} \mu' + \mu[1,0,\dots,0]]A = O\\ \Rightarrow 2a\_1A'(\bar{X} - \mu) = 0 \Rightarrow \hat{\mu} = \bar{X} \end{aligned} \tag{ii}$$

since the equation holds for each $\mu_j$, $j = 1,\ldots,p$, and $A' = (a_1,\ldots,a_p)$, $a_j \ne 0$, $j = 1,\ldots,p$, $A$ being fixed. As well, $(\bar X - \mu)(\bar X - \mu)' = \bar X\bar X' - \bar X\mu' - \mu\bar X' + \mu\mu'$. Now, consider differentiating $\ln L$ with respect to an element of $\Sigma$, say $\sigma_{11}$, at $\hat\mu = \bar X$:

$$\begin{aligned} \frac{\partial}{\partial \sigma\_{11}} \ln L &= 0\\ \Rightarrow -\frac{2n \, a\_1^2 \sigma\_{11}}{2A' \Sigma A} + \frac{A'SA}{2(A'\Sigma A)^2} (2a\_1^2 \sigma\_{11}) &= 0\\ \Rightarrow A'\hat{\Sigma}A &= \frac{1}{n}A'SA \end{aligned}$$

for each element of $\Sigma$, and hence $\hat\Sigma = \frac1n S$. Thus,

$$\max_{\Omega} L = \frac{\mathbf{e}^{-\frac{n}{2}} n^{\frac{n}{2}}}{(A'SA)^{\frac{n}{2}}}.\tag{iii}$$

Under *Ho, A*- *μ* = 0 and consequently the maximum under *Ho* is the following:

$$\max_{H_o} L = \frac{\mathbf{e}^{-\frac{n}{2}} n^{\frac{n}{2}}}{[A'(S + n\bar{X}\bar{X}')A]^{\frac{n}{2}}}.\tag{iv}$$

Accordingly, the *λ*-criterion is

$$\lambda = \frac{(A^\prime S A)^{\frac{n}{2}}}{[A^\prime (S + n\bar{X}\bar{X}^\prime)A]^{\frac{n}{2}}} = \frac{1}{[1 + n\frac{A^\prime \bar{X}\bar{X}^\prime A}{A^\prime S A}]^{\frac{n}{2}}}.\tag{6.3.10}$$

We would reject $H_o$ for small values of $\lambda$ or, equivalently, for large values of $u \equiv nA'\bar X\bar X'A/A'SA$, where $\bar X$ and $S$ are independently distributed. Observing that $S \sim W_p(n-1,\Sigma)$, $\Sigma>O$, and $\bar X \sim N_p(\mu, \frac1n\Sigma)$, $\Sigma>O$, we have

$$\frac{n}{A'\Sigma A}A'\bar{X}\bar{X}'A \sim \chi\_1^2 \text{ and } \frac{A'SA}{A'\Sigma A} \sim \chi\_{n-1}^2.$$

Hence, $(n-1)u$ is an $F$ statistic with 1 and $n-1$ degrees of freedom, and the null hypothesis is to be rejected whenever

$$v \equiv n(n-1)\frac{A'\bar{X}\bar{X}'A}{A'SA} \ge F_{1,\,n-1,\ \alpha}, \text{ with } Pr\{F_{1,\,n-1} \ge F_{1,\,n-1,\ \alpha}\} = \alpha. \tag{6.3.11}$$

**Example 6.3.4.** Consider a real bivariate Gaussian $N_2(\mu,\Sigma)$ population where $\Sigma>O$ is unknown. We would like to test the hypothesis $H_o: \mu_1 = \mu_2$, $\mu' = (\mu_1,\mu_2)$, so that $\mu_1 - \mu_2 = 0$ under this null hypothesis. Let the sample be $X_1, X_2, X_3, X_4$, as specified in Example 6.3.3. Let $A' = (1, -1)$ so that $A'\mu = O$ under $H_o$. With the same observation vectors as those comprising the first sample in Example 6.3.3, $A'X_1 = (1)$, $A'X_2 = (-2)$, $A'X_3 = (-1)$, $A'X_4 = (0)$. Letting $y_j = A'X_j$, the observations on the $y_j$'s are $(1, -2, -1, 0)$, or $A'\mathbf{X} = A'[X_1, X_2, X_3, X_4] = [1, -2, -1, 0]$. The sample sum of products matrix as evaluated in the first part of Example 6.3.3 is

$$S_1 = \frac{1}{16} \begin{bmatrix} 64 & 32 \\ 32 & 80 \end{bmatrix} \Rightarrow A'S_1A = \frac{1}{16} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 64 & 32 \\ 32 & 80 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{80}{16} = 5.$$

Our test statistic is

$$v = n(n-1)\frac{A'\bar{X}\bar{X}'A}{A'S_1A} \sim F_{1,\,n-1}, \ n = 4.$$

Let the significance level be $\alpha = 0.05$. The observed values of $A'\bar X\bar X'A$, $A'S_1A$, $v$, and the tabulated critical value of $F_{1,\,n-1,\,\alpha}$ are the following:

$$\begin{aligned} A'\bar{X}\bar{X}'A &= \frac{1}{4}; \ A'S_1A = \frac{80}{16} = 5;\\ v &= 4(3)\left(\frac{1}{5 \times 4}\right) = 0.6; \ F_{1,\,n-1,\,\alpha} = F_{1,3,\,0.05} = 10.13. \end{aligned}$$

As 0*.*6 *<* 10*.*13, *Ho* is not rejected.
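The contrast-based computation of Example 6.3.4 can likewise be checked in a few lines; this is a sketch using the same data and contrast vector $A' = (1, -1)$:

```python
import numpy as np

# Data of Example 6.3.4 (same as the first sample of Example 6.3.3).
X = np.array([[1, -1, 1, -1],
              [0, 1, 2, -1]], dtype=float)
A = np.array([1.0, -1.0])             # contrast orthogonal to J = (1, 1)'

n = X.shape[1]
xbar = X.mean(axis=1)
D = X - xbar[:, None]
S = D @ D.T                           # sample sum of products matrix

num = n * (A @ xbar) ** 2             # n A' Xbar Xbar' A
den = A @ S @ A                       # A' S A
v = (n - 1) * num / den               # observed F_{1, n-1} value

print(v)                              # 0.6 < F_{1,3,0.05} = 10.13: do not reject H_o
```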

#### **6.3.5. Likelihood ratio test for testing** $H_o: \mu_1 = \cdots = \mu_p$ **when** $\Sigma$ **is unknown**

In the entire parameter space of a *Np(μ, Σ)* population, *μ* is estimated by the sample average *X*¯ and, as previously determined, the maximum of the likelihood function is

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{np}{2}} n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |S|^{\frac{n}{2}}} \tag{i}$$

where $S$ is the sample sum of products matrix and $n$ is the sample size. Under the hypothesis $H_o: \mu_1 = \cdots = \mu_p = \nu$, where $\nu$ is unknown, this $\nu$ is estimated by $\hat\nu = \frac{1}{np}\sum_{i,j}x_{ij} = \frac1p J'\bar X$, $J' = (1,\ldots,1)$, the $p\times1$ sample vectors $X_j$, $X_j' = (x_{1j},\ldots,x_{pj})$, $j = 1,\ldots,n$, being independently distributed. Thus, under the null hypothesis $H_o$, the population covariance matrix is estimated by $\frac1n(S + n(\bar X - \hat\mu)(\bar X - \hat\mu)')$ where $\hat\mu = \hat\nu J$, and, proceeding as was done to obtain Eq. (6.3.3), the $\lambda$-criterion reduces to

$$\lambda = \frac{|S|^{\frac{n}{2}}}{|S + n(\bar{X} - \hat{\mu})(\bar{X} - \hat{\mu})'|^{\frac{n}{2}}} \tag{6.3.12}$$

$$=\frac{1}{(1+u)^{\frac{n}{2}}},\ u = n(\bar{X} - \hat{\mu})'S^{-1}(\bar{X} - \hat{\mu}).\tag{6.3.13}$$

Given the structure of *u* in (6.3.13), we can take the Gaussian population covariance matrix *Σ* to be the identity matrix *I* , as was explained in Sect. 6.3.1. Observe that

$$(\bar{X} - \hat{\mu})' = (\bar{X} - \frac{1}{p}JJ^\prime \bar{X})' = \bar{X}'[I - \frac{1}{p}JJ^\prime] \tag{ii}$$

where $I - \frac1p JJ'$ is idempotent of rank $p-1$; hence there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that

$$\begin{aligned} I - \frac{1}{p} JJ' &= P \begin{bmatrix} I_{p-1} & O \\ O' & 0 \end{bmatrix} P' \Rightarrow \\ \sqrt{n}(\bar{X} - \hat{\mu})' &= \sqrt{n}\,\bar{X}'\Big(I - \frac{1}{p}JJ'\Big) = \sqrt{n}\,\bar{X}'P \begin{bmatrix} I_{p-1} & O \\ O' & 0 \end{bmatrix} = [V_1', 0], \ V' = \sqrt{n}\,\bar{X}'P, \end{aligned}$$

where *V*<sup>1</sup> is the subvector of the first *p* − 1 components of *V* . Then the quadratic form *u*, which is our test statistic, reduces to the following:

$$\begin{aligned} u &= n(\bar{X} - \hat{\mu})'S^{-1}(\bar{X} - \hat{\mu}) = [V_1', 0]\,S^{-1} \begin{bmatrix} V_1 \\ 0 \end{bmatrix} = V_1'S^{11}V_1, \\ S^{-1} &= \begin{bmatrix} S^{11} & S^{12} \\ S^{21} & S^{22} \end{bmatrix}. \end{aligned}$$

We note that the test statistic $u$ has the same structure as that of $u$ in Theorem 6.3.1 with $p$ replaced by $p-1$. Accordingly, $u = n(\bar X - \hat\mu)'S^{-1}(\bar X - \hat\mu)$ is distributed as a real scalar type-2 beta variable with the parameters $\frac{p-1}{2}$ and $\frac{n-(p-1)}{2}$, so that $\frac{n-p+1}{p-1}\,u \sim F_{p-1,\,n-p+1}$. Thus, the test criterion consists of

$$\begin{aligned} \text{rejecting } H_o \text{ if the observed value of } \frac{n-p+1}{p-1}\,u &\ge F_{p-1,\,n-p+1,\,\alpha}, \\ \text{with } Pr\{F_{p-1,\,n-p+1} \ge F_{p-1,\,n-p+1,\,\alpha}\} &= \alpha. \end{aligned} \tag{6.3.14}$$

**Example 6.3.5.** Let the population be *N*2*(μ, Σ), Σ > O, μ*- = *(μ*1*, μ*2*)* and the null hypothesis be *Ho* : *μ*<sup>1</sup> = *μ*<sup>2</sup> = *ν* where *ν* and *Σ* are unknown. The sample values, as specified in Example 6.3.3, are

$$X\_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ X\_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\ X\_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\ X\_4 = \begin{bmatrix} -1 \\ -1 \end{bmatrix} \Rightarrow \bar{X} = \frac{1}{4} \begin{bmatrix} 0 \\ 2 \end{bmatrix}.$$

The maximum likelihood estimate of $\mu$ under $H_o$ is

$$
\hat{\mu} = \frac{1}{p} J J' \bar{X}, \ J = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
$$

and

$$(\bar{X} - \hat{\mu})' = \bar{X}'[I - \frac{1}{p}JJ'] = \bar{X}'[I - \frac{1}{2}\begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}] = \frac{1}{4} \begin{bmatrix} -1 & 1 \end{bmatrix}.$$

As previously calculated, the sample sum of products matrix is

$$S\_1 = \frac{1}{4^2} \begin{bmatrix} 64 & 32 \\ 32 & 80 \end{bmatrix} \Rightarrow S\_1^{-1} = \frac{16}{4096} \begin{bmatrix} 80 & -32 \\ -32 & 64 \end{bmatrix} = \frac{1}{256} \begin{bmatrix} 80 & -32 \\ -32 & 64 \end{bmatrix};\ n = 4, \ p = 2.$$

The test statistic *v* and its observed value are

$$\begin{split} v &= \frac{(n-p+1)}{(p-1)} n (\bar{X} - \hat{\mu})' S\_1^{-1} (\bar{X} - \hat{\mu}) \sim F\_{p-1, n-p+1} = F\_{1,3} \\ &= \frac{(4-2+1)}{(2-1)} (4) \frac{1}{4^2} \frac{1}{256} \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 80 & -32 \\ -32 & 64 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} \\ &= \frac{(3)(4)}{(4^2)(256)} [(-1)^2 (80) + (1)^2 (64) - 2(32)(-1)(1)] \\ &= 0.61. \end{split}$$

At significance level *α* = 0*.*05, the tabulated critical value *F*1*,* <sup>3</sup>*,* <sup>0</sup>*.*<sup>05</sup> is 10*.*13, and since the observed value 0*.*61 is less than 10*.*13, *Ho* is not rejected.
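The likelihood ratio computation of Example 6.3.5 can also be verified numerically; the sketch below uses the same four observation vectors and the common-mean estimate $\hat\mu = \frac1p JJ'\bar X$:

```python
import numpy as np

# Data of Example 6.3.5; H_o: mu_1 = mu_2 = nu, nu unknown.
X = np.array([[1, -1, 1, -1],
              [0, 1, 2, -1]], dtype=float)

n, p = X.shape[1], X.shape[0]
xbar = X.mean(axis=1)
D = X - xbar[:, None]
S = D @ D.T                           # sample sum of products matrix

J = np.ones(p)
mu_hat = (J @ xbar / p) * J           # common-mean estimate (1/p) J J' Xbar
d = xbar - mu_hat

u = n * d @ np.linalg.solve(S, d)     # u = n (Xbar - mu_hat)' S^{-1} (Xbar - mu_hat)
v = (n - p + 1) / (p - 1) * u         # observed F_{p-1, n-p+1} value

print(round(v, 2))                    # ~0.61 < F_{1,3,0.05} = 10.13: do not reject H_o
```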

#### **6.4. Testing Hypotheses on the Population Covariance Matrix**

Let the $p\times1$ independent vectors $X_j$, $j = 1,\ldots,n$, have a $p$-variate real nonsingular $N_p(\mu,\Sigma)$ distribution and let the $p\times n$ matrix $\mathbf{X} = (X_1,\ldots,X_n)$ be the sample matrix. Denoting the sample average by $\bar X = \frac1n(X_1+\cdots+X_n)$ and letting $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$, each column of $\bar{\mathbf{X}}$ being equal to $\bar X$, the sample sum of products matrix is $S = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$. We have already established that $S$ is Wishart distributed with $m = n-1$ degrees of freedom, that is, $S \sim W_p(m,\Sigma)$, $\Sigma>O$. Letting $S_\mu = (\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})'$ where $\mathbf{M} = (\mu,\ldots,\mu)$, each of its columns being the $p\times1$ vector $\mu$, we have $S_\mu \sim W_p(n,\Sigma)$, $\Sigma>O$, where the number of degrees of freedom is $n$ itself, whereas the number of degrees of freedom associated with $S$ is $m = n-1$. Let us consider the hypothesis $H_o: \Sigma = \Sigma_o$ where $\Sigma_o$ is a given known matrix and $\mu$ is unspecified. Then, the MLEs of $\mu$ and $\Sigma$ in the entire parameter space are $\hat\mu = \bar X$ and $\hat\Sigma = \frac1nS$, and the joint density of the sample values $X_1,\ldots,X_n$, denoted by $L$, is given by

$$L = \frac{1}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1}S) - \frac{n}{2} (\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}. \tag{6.4.1}$$

Thus, as previously determined, the maximum of *L* in the parameter space $\Omega=\{(\mu,\Sigma)\,|\,\Sigma>O\}$ is

$$\max\_{\Omega} L = \frac{n^{\frac{np}{2}} \mathbf{e}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |S|^{\frac{n}{2}}},\tag{i}$$

the maximum of *L* under the null hypothesis *Ho* : *Σ* = *Σo* being given by

$$\max\_{H\_o} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\boldsymbol{\Sigma}\_o^{-1}\boldsymbol{S})}}{(2\pi)^{\frac{np}{2}}|\boldsymbol{\Sigma}\_o|^{\frac{n}{2}}}\,. \tag{ii}$$

Then, the *λ*-criterion is the following:

$$\lambda = \frac{\mathbf{e}^{\frac{np}{2}}}{n^{\frac{np}{2}}} |\boldsymbol{\Sigma}\_o^{-1}\boldsymbol{\mathcal{S}}|^{\frac{n}{2}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\boldsymbol{\Sigma}\_o^{-1}\boldsymbol{\mathcal{S}})}.\tag{6.4.2}$$

Letting $u=\lambda^{2/n}$,

$$u = \frac{\mathbf{e}^p}{n^p} \left| \Sigma\_o^{-1} S \right| \mathbf{e}^{-\frac{1}{n} \text{tr}(\Sigma\_o^{-1} S)},\tag{6.4.3}$$

and we would reject *Ho* for small values of *u* since *u* is a monotonically increasing function of *λ*, which means that the null hypothesis ought to be rejected for large values of $\text{tr}(\Sigma\_o^{-1}S)$, as the exponential function dominates the polynomial function for large values of the argument. Let us determine the distribution of $w=\text{tr}(\Sigma\_o^{-1}S)$, whose Laplace transform with parameter *s* is

$$L\_w(\mathbf{s}) = E[\mathbf{e}^{-sw}] = E[\mathbf{e}^{-s \,\text{tr}(\Sigma\_o^{-1} S)}].\tag{iii}$$

This can be evaluated by integrating out over the density of *S* which has a Wishart distribution with *m* = *n* − 1 degrees of freedom when *μ* is estimated:

$$L\_w(\mathbf{s}) = \frac{1}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})|\boldsymbol{\Sigma}|^{\frac{m}{2}}} \int\_{S > O} |\mathbf{S}|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}) - s \, \text{tr}(\boldsymbol{\Sigma}\_o^{-1}\mathbf{S})} \text{d}S. \tag{\text{iv}}$$

The exponential part is $-\frac{1}{2}\text{tr}(\Sigma^{-1}S) - s\,\text{tr}(\Sigma\_o^{-1}S) = -\frac{1}{2}\text{tr}[(\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}})(I + 2s\Sigma^{\frac{1}{2}}\Sigma\_o^{-1}\Sigma^{\frac{1}{2}})]$ and hence,

$$L\_w(\mathbf{s}) = |I + 2\mathbf{s}\Sigma^{\frac{1}{2}}\Sigma\_o^{-1}\Sigma^{\frac{1}{2}}|^{-\frac{m}{2}}.\tag{6.4.4}$$

**The null case:** $\Sigma = \Sigma\_o$

In this case, $\Sigma^{\frac{1}{2}}\Sigma\_o^{-1}\Sigma^{\frac{1}{2}} = I$, so that

$$L\_w(\mathbf{s}) = |I + 2\mathbf{s}I|^{-\frac{m}{2}} = (1 + 2\mathbf{s})^{-\frac{mp}{2}} \Rightarrow w \sim \chi^2\_{mp}.\tag{6.4.5}$$

Thus, the test criterion is the following:

$$\text{Reject } H\_o \text{ if } w \ge \chi^2\_{mp,\alpha}, \text{ with } Pr\{\chi^2\_{mp} \ge \chi^2\_{mp,\alpha}\} = \alpha. \tag{6.4.6}$$

When *μ* is known, it is used instead of its MLE to determine $S\_\mu$, and the resulting criterion consists of rejecting *Ho* whenever the observed $w\_\mu = \text{tr}(\Sigma\_o^{-1}S\_\mu) \ge \chi^2\_{np,\,\alpha}$, where *n* is the sample size. These results are summarized in the following theorem.

**Theorem 6.4.1.** *Let the null hypothesis be $H\_o:\Sigma=\Sigma\_o$ (given) and $w=\text{tr}(\Sigma\_o^{-1}S)$ where S is the sample sum of products matrix. Then, the null distribution of $w=\text{tr}(\Sigma\_o^{-1}S)$ is a real scalar chisquare distribution with $(n-1)p$ degrees of freedom when the estimate of μ, namely $\hat{\mu}=\bar{X}$, is utilized to compute S; when μ is specified, w has a chisquare distribution having np degrees of freedom where n is the sample size.*

#### **The non-null density of** *w*

The non-null density of *w* is available from (6.4.4). Let $\lambda\_1,\ldots,\lambda\_p$ be the eigenvalues of $\Sigma^{\frac{1}{2}}\Sigma\_o^{-1}\Sigma^{\frac{1}{2}}$. Then $L\_w(s)$ in (6.4.4) can be re-expressed as follows:

$$L\_w(\mathbf{s}) = \prod\_{j=1}^p (1 + 2\lambda\_j \mathbf{s})^{-\frac{m}{2}}.\tag{6.4.7}$$

This is the Laplace transform of a variable of the form $w = \lambda\_1 w\_1+\cdots+\lambda\_p w\_p$ where $w\_1,\ldots,w\_p$ are independently distributed real scalar chisquare random variables, each having $m=n-1$ degrees of freedom, and $\lambda\_j>0,\ j=1,\ldots,p$. The distribution of such linear combinations of chisquare random variables is that of a quadratic form in normal variables; the reader may refer to Mathai and Provost (1992) for explicit representations of the associated density functions.
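A small simulation illustrates (6.4.7): the empirical Laplace transform of $w=\lambda\_1 w\_1+\cdots+\lambda\_p w\_p$ matches the product form. This is an illustrative sketch, not from the text; the values of *m*, the $\lambda\_j$'s and *s* are arbitrary choices.

```python
import math, random

# Monte Carlo check of (6.4.7): E[exp(-s*w)] for w = sum_j lambda_j * w_j,
# with w_j iid chi-square(m), versus prod_j (1 + 2*lambda_j*s)^(-m/2).
random.seed(0)
m, lam, s = 4, [0.5, 1.5], 0.1     # illustrative values

def chisq(df):
    # chi-square variate as a sum of df squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

N = 20000
mc = sum(math.exp(-s * sum(l * chisq(m) for l in lam)) for _ in range(N)) / N
exact = math.prod((1 + 2 * l * s) ** (-m / 2) for l in lam)
```

With the seed fixed as above, the Monte Carlo estimate agrees with the exact value to about two decimal places.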

**Note 6.4.1.** If the population mean value *μ* is known, then one can proceed by making use of *μ* instead of the sample mean to determine *Sμ*, in which case *n*, the sample size, ought to be used instead of *m* = *n* − 1 in the above discussion.

#### **6.4.1. Arbitrary moments of** *λ*

From (6.4.2), the *h*-th moment of the *λ*-criterion for testing *Ho* : *Σ* = *Σo* (given) in a real nonsingular *Np(μ, Σ)* population, is obtained as follows:

$$\lambda^h = \frac{\mathbf{e}^{\frac{nph}{2}}}{n^{\frac{nph}{2}}} |\Sigma\_o^{-1}|^{\frac{nh}{2}} |\mathbf{S}|^{\frac{nh}{2}} \mathbf{e}^{-\frac{h}{2} \text{tr}(\Sigma\_o^{-1}\mathbf{S})} \implies$$

$$\begin{split} E[\lambda^{h}] &= \frac{\mathbf{e}^{\frac{nph}{2}}}{n^{\frac{nph}{2}}2^{\frac{(n-1)p}{2}}|\Sigma\_{o}|^{\frac{nh}{2}}|\Sigma|^{\frac{n-1}{2}}\Gamma\_{p}(\frac{n-1}{2})} \int\_{S>O} |S|^{\frac{nh}{2}+\frac{n-1}{2}-\frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S)-\frac{h}{2}\text{tr}(\Sigma\_{o}^{-1}S)} \mathrm{d}S \\ &= \frac{\mathbf{e}^{\frac{nph}{2}}\, 2^{p(\frac{nh}{2}+\frac{n-1}{2})}\Gamma\_{p}(\frac{nh}{2}+\frac{n-1}{2})}{n^{\frac{nph}{2}}2^{\frac{(n-1)p}{2}}|\Sigma\_{o}|^{\frac{nh}{2}}|\Sigma|^{\frac{n-1}{2}}\Gamma\_{p}(\frac{n-1}{2})} |\Sigma^{-1}+h\Sigma\_{o}^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})} \\ &= \mathbf{e}^{\frac{nph}{2}} \left(\frac{2}{n}\right)^{\frac{nph}{2}} \frac{|\Sigma|^{\frac{nh}{2}}}{|\Sigma\_{o}|^{\frac{nh}{2}}} \frac{\Gamma\_{p}(\frac{nh}{2}+\frac{n-1}{2})}{\Gamma\_{p}(\frac{n-1}{2})}|I+h\Sigma\,\Sigma\_{o}^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})} \end{split} \tag{6.4.8}$$

for $\frac{nh}{2}+\frac{n-1}{2}>\frac{p-1}{2}$ and $I+h\Sigma\Sigma\_o^{-1}>O$. Under $H\_o:\Sigma=\Sigma\_o$, we have $|I+h\Sigma\Sigma\_o^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})}=(1+h)^{-p(\frac{nh}{2}+\frac{n-1}{2})}$. Thus, the *h*-th null moment is given by

$$E[\lambda^h|H\_o] = \mathbf{e}^{\frac{nph}{2}} \left(\frac{2}{n}\right)^{\frac{nph}{2}} \frac{\Gamma\_p(\frac{nh}{2} + \frac{n-1}{2})}{\Gamma\_p(\frac{n-1}{2})} (1+h)^{-p(\frac{nh}{2} + \frac{n-1}{2})} \tag{6.4.9}$$

for $\frac{nh}{2}+\frac{n-1}{2}>\frac{p-1}{2}$.
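The null moment can be verified numerically in the univariate case *p* = 1, where $u=S/\sigma\_o^2\sim\chi^2\_{n-1}$ and $\lambda=(\mathbf{e}/n)^{n/2}u^{n/2}\mathbf{e}^{-u/2}$. The following sketch (with illustrative *n* and *h*, not taken from the text) compares the closed form against direct numerical integration:

```python
import math

# Check of the p = 1 null moment: E[lambda^h] computed by integrating
# lambda(u)^h against the chi-square(n-1) density, versus the closed form.
n, h = 5, 2     # illustrative choices

def integrand(u):
    lam = (math.e / n) ** (n / 2) * u ** (n / 2) * math.exp(-u / 2)
    chi_density = (u ** ((n - 1) / 2 - 1) * math.exp(-u / 2)
                   / (2 ** ((n - 1) / 2) * math.gamma((n - 1) / 2)))
    return lam ** h * chi_density

step = 0.001
numeric = sum(integrand(step * k) for k in range(1, 80001)) * step   # over (0, 80]

exact = (math.e ** (n * h / 2) * (2 / n) ** (n * h / 2)
         * math.gamma(n * h / 2 + (n - 1) / 2) / math.gamma((n - 1) / 2)
         * (1 + h) ** (-(n * h / 2 + (n - 1) / 2)))
```

Both evaluations agree to several decimal places (about 0.5003 for these choices).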

#### **6.4.2. The asymptotic distribution of** −2 ln *λ* **when testing** *Ho* : *Σ* = *Σo*

Let us determine the asymptotic distribution of −2 ln *λ* where *λ* is the likelihood ratio statistic for testing *Ho* : *Σ* = *Σo* (specified) in a real nonsingular $N\_p(\mu,\Sigma)$ population, as *n* → ∞, *n* being the sample size. This distribution can be determined by expanding both real matrix-variate gamma functions appearing in (6.4.9) and applying Stirling's approximation formula as given in (6.5.14), letting $\frac{n}{2}(1+h)\to\infty$ in the numerator gamma functions and $\frac{n}{2}\to\infty$ in the denominator gamma functions. Then, we have

$$\begin{split} \frac{\Gamma\_{p}(\frac{nh}{2}+\frac{n-1}{2})}{\Gamma\_{p}(\frac{n-1}{2})} &= \prod\_{j=1}^{p} \frac{\Gamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2})}{\Gamma(\frac{n}{2}-\frac{1}{2}-\frac{j-1}{2})} \\ &\to \prod\_{j=1}^{p} \frac{(2\pi)^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}}} \frac{\left(\frac{n}{2}(1+h)\right)^{\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j}{2}}}{\left(\frac{n}{2}\right)^{\frac{n}{2}-\frac{1}{2}-\frac{j}{2}}} \frac{\mathbf{e}^{-\frac{n}{2}(1+h)}}{\mathbf{e}^{-\frac{n}{2}}} \\ &= \left(\frac{n}{2}\right)^{\frac{nph}{2}} \mathbf{e}^{-\frac{nph}{2}} (1+h)^{\frac{np}{2}(1+h)-\frac{p}{2}-\frac{p(p+1)}{4}}. \end{split}$$

Hence, from (6.4.9)

$$E[\lambda^h|H\_o] \to (1+h)^{-\frac{p(p+1)}{4}} \text{ as } n \to \infty,\tag{6.4.10}$$

where $(1+h)^{-\frac{p(p+1)}{4}}$ is the *h*-th moment of the distribution of $\mathbf{e}^{-y/2}$ when $y\sim\chi^2\_{p(p+1)/2}$. Thus, under *Ho*, $-2\ln\lambda\to\chi^2\_{p(p+1)/2}$ as *n* → ∞. For general procedures leading to asymptotic normality, see Mathai (1982).

**Theorem 6.4.2.** *Letting λ be the likelihood ratio statistic for testing $H\_o:\Sigma=\Sigma\_o$ (given) on the covariance matrix of a real nonsingular $N\_p(\mu,\Sigma)$ distribution, the null distribution of −2 ln λ is asymptotically, as the sample size n tends to ∞, that of a real scalar chisquare random variable having $\frac{p(p+1)}{2}$ degrees of freedom. This number of degrees of freedom is also equal to the number of parameters restricted by the null hypothesis.*

**Note 6.4.2.** Sugiura and Nagao (1968) have shown that the test based on the statistic *λ* as specified in (6.4.2) is biased, whereas it becomes unbiased upon replacing *n*, the sample size, by the number of degrees of freedom *n* − 1 in (6.4.2). Accordingly, percentage points are then computed for −2 ln *λ*1, where *λ*1 is the statistic *λ* given in (6.4.2) wherein *n* − 1 is substituted for *n*. Korin (1968), Davis (1971), and Nagarsenker and Pillai (1973) computed 5% and 1% percentage points for this test statistic. Davis and Field (1971) evaluated the percentage points for *p* = 2(1)10 and *n* = 6(1)30(5)50, 60, 120 and Korin (1968), for *p* = 2(1)10.

**Example 6.4.1.** Let us take the same 3-variate real Gaussian population $N\_3(\mu,\Sigma),\ \Sigma>O$, and the same data as in Example 6.3.1, so that intermediate calculations can be utilized. The sample size is 5 and the sample values are the following:

$$X\_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \ X\_3 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \ X\_4 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix},$$

the sample average and the sample sum of products matrix being

$$\bar{X} = \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \ S = \frac{1}{5^2} \begin{bmatrix} 270 & -110 & -170 \\ -110 & 80 & 85 \\ -170 & 85 & 170 \end{bmatrix}.$$

Let us consider the hypothesis *Σ* = *Σo* where

$$\begin{split} \Sigma\_{o} &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix} \Rightarrow |\Sigma\_{o}| = 10, \ \text{Cof}(\Sigma\_{o}) = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix}; \\ \Sigma\_{o}^{-1} &= \frac{[\text{Cof}(\Sigma\_{o})]'}{|\Sigma\_{o}|} = \frac{1}{10} \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix}; \\ \text{tr}(\Sigma\_{o}^{-1}S) &= \frac{1}{(10)(5^{2})} \text{tr} \left\{ \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix} \begin{bmatrix} 270 & -110 & -170 \\ -110 & 80 & 85 \\ -170 & 85 & 170 \end{bmatrix} \right\} \\ &= \frac{1}{(10)(5^{2})} [5(270) + (4(80) + 2(85)) + (2(85) + 6(170))] \\ &= 12.12; \ n = 5, \ p = 3. \end{split}$$

Let us test the null hypothesis at the significance level *α* = 0*.*05. The distribution of the test statistic *w* and the tabulated critical value are as follows:

$$w = \text{tr}(\Sigma\_o^{-1}S) \sim \chi^2\_{(n-1)p} = \chi^2\_{12}; \ \ \chi^2\_{12,\ 0.05} = 21.03.$$

As the observed value 12.12 < 21.03, *Ho* is not rejected. The asymptotic distribution of −2 ln *λ*, as *n* → ∞, is $\chi^2\_{p(p+1)/2}=\chi^2\_6$, where *λ* is the likelihood ratio criterion statistic. Since $\chi^2\_{6,\,0.05} = 12.59$ and 12.59 > 12.12, we still do not reject *Ho* as *n* → ∞.
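The arithmetic of this example is easily rechecked; the following lines (an illustrative sketch) recompute $w=\text{tr}(\Sigma\_o^{-1}S)$ from the matrices displayed above.

```python
# Rechecking Example 6.4.1: Sigma_o^{-1} and S entered exactly as displayed,
# S carrying the factor 1/25 (i.e., 1/5^2).
Sigma_o_inv = [[0.5, 0.0, 0.0],
               [0.0, 0.4, 0.2],
               [0.0, 0.2, 0.6]]
S25 = [[270, -110, -170],
       [-110, 80, 85],
       [-170, 85, 170]]            # 25 * S
w = sum(Sigma_o_inv[i][k] * S25[k][i]
        for i in range(3) for k in range(3)) / 25    # tr(Sigma_o^{-1} S)
print(round(w, 2))                 # 12.12, below the 5% critical value 21.03
```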

#### **6.4.3. Tests on Wilks' concept of generalized variance**

The concept of *generalized variance* was explained in Chap. 5. The sample generalized variance is simply the determinant of *S*, the sample sum of products matrix. When the population is *p*-variate Gaussian, it has already been shown in Chap. 5 that *S* is Wishart distributed with *m* = *n* − 1 degrees of freedom, *n* being the sample size, and parameter matrix *Σ>O*, which is the population covariance matrix. When the population is multivariate normal, several types of tests of hypotheses involve the sample generalized variance. The first author has given the exact distributions of such tests, see Mathai (1972a,b) and Mathai and Rathie (1971).

#### **6.5. The Sphericity Test, or Testing** $H\_o:\Sigma=\sigma^2 I$ **Given a** $N\_p(\mu,\Sigma)$ **Sample**

When the covariance matrix $\Sigma=\sigma^2 I$, where $\sigma^2>0$ is a real scalar quantity, the ellipsoid $(X-\mu)'\Sigma^{-1}(X-\mu)=c>0$, which represents a specific contour of constant density for a nonsingular $N\_p(\mu,\Sigma)$ distribution, becomes the sphere defined by the equation $\frac{1}{\sigma^2}(X-\mu)'(X-\mu)=c$, that is, $\frac{1}{\sigma^2}((x\_1-\mu\_1)^2+\cdots+(x\_p-\mu\_p)^2)=c>0$, whose center is located at the point *μ*; hence the test's name, the sphericity test. Given a $N\_p(\mu,\Sigma)$ sample of size *n*, the maximum of the likelihood function in the entire parameter space is

$$\sup\_{\Omega} L = \frac{n^{\frac{np}{2}} \mathbf{e}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |S|^{\frac{n}{2}}},$$

as was previously established. However, under the null hypothesis $H\_o:\Sigma=\sigma^2 I$, $\text{tr}(\Sigma^{-1}S)=(\sigma^2)^{-1}\text{tr}(S)$ and $|\Sigma|=(\sigma^2)^p$. Thus, if we let $\theta=\sigma^2$ and substitute $\hat{\mu}=\bar{X}$ in *L*, under *Ho* the loglikelihood function will be $\ln L\_\omega=-\frac{np}{2}\ln(2\pi)-\frac{np}{2}\ln\theta-\frac{1}{2\theta}\text{tr}(S)$. Differentiating this function with respect to *θ* and equating the result to zero produces the following estimator for *θ*:

$$
\hat{\theta} = \hat{\sigma}^2 = \frac{\text{tr}(\mathcal{S})}{np}. \tag{6.5.1}
$$
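One can confirm numerically that $\hat\theta$ maximizes the restricted log-likelihood; the values of *n*, *p* and tr(*S*) in this sketch are hypothetical.

```python
import math

# Check that theta_hat = tr(S)/(np) maximizes
# l(theta) = -(np/2)ln(2*pi) - (np/2)ln(theta) - tr(S)/(2*theta).
n, p, trS = 5, 3, 12.0     # hypothetical sample size, dimension, trace

def loglik(theta):
    return (-(n * p / 2) * math.log(2 * math.pi)
            - (n * p / 2) * math.log(theta) - trS / (2 * theta))

theta_hat = trS / (n * p)  # 0.8 for these values
```

Evaluating `loglik` at `theta_hat` and at slightly perturbed values shows that the estimator indeed attains the maximum.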

Accordingly, the maximum of the likelihood function under *Ho* is the following:

$$\max\_{\omega} L = \frac{n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}} \frac{\mathbf{e}^{-\frac{np}{2}}}{[\frac{\text{tr}(S)}{p}]^{\frac{np}{2}}}.$$

Thus, the *λ*-criterion for testing

$$H\_o: \Sigma = \sigma^2 I, \ \sigma^2 > 0 \ \text{(unknown)}$$

is

$$\lambda = \frac{\sup\_{\omega} L}{\sup\_{\Omega} L} = \frac{|S|^{\frac{n}{2}}}{[\frac{\text{tr}(S)}{p}]^{\frac{np}{2}}}.\tag{6.5.2}$$

In the complex Gaussian case, $\tilde{X}\_j\sim\tilde{N}\_p(\tilde\mu,\Sigma),\ \Sigma=\Sigma^{\*}>O$, where an asterisk indicates the conjugate transpose, and $\tilde{X}\_j=X\_{j1}+iX\_{j2}$ where $X\_{j1}$ and $X\_{j2}$ are real $p\times1$ vectors and $i=\sqrt{-1}$. The covariance matrix associated with $\tilde{X}\_j$ is then defined as

$$\begin{split} \text{Cov}(\tilde{X}\_{j}) &= E[(\tilde{X}\_{j} - \tilde{\mu})(\tilde{X}\_{j} - \tilde{\mu})^{\*}] \\ &= E[(X\_{j1} - \mu\_{(1)}) + i(X\_{j2} - \mu\_{(2)})][(X\_{j1} - \mu\_{(1)})' - i(X\_{j2} - \mu\_{(2)})'] \\ &= \Sigma\_{11} + \Sigma\_{22} + i(\Sigma\_{21} - \Sigma\_{12}) \equiv \Sigma, \text{ with } \tilde{\mu} = \mu\_{(1)} + i\mu\_{(2)}, \end{split}$$

where *Σ* is assumed to be Hermitian positive definite, with $\Sigma\_{11}=\text{Cov}(X\_{j1})$, $\Sigma\_{22}=\text{Cov}(X\_{j2})$, $\Sigma\_{12}=\text{Cov}(X\_{j1},X\_{j2})$ and $\Sigma\_{21}=\text{Cov}(X\_{j2},X\_{j1})$. Thus, the hypothesis of sphericity in the complex Gaussian case is $\Sigma=\sigma^2 I$ where *σ* is real and positive. Then, under the null hypothesis $\tilde{H}\_o:\Sigma=\sigma^2 I$, the Hermitian form $\tilde{Y}^{\*}\Sigma\tilde{Y}=c>0$, *c* real and positive, becomes $\sigma^2\tilde{Y}^{\*}\tilde{Y}=c\Rightarrow|\tilde{y}\_1|^2+\cdots+|\tilde{y}\_p|^2=\frac{c}{\sigma^2}>0$, which defines a sphere in the complex space, $|\tilde{y}\_j|$ denoting the absolute value or modulus of $\tilde{y}\_j$. If $\tilde{y}\_j=y\_{j1}+iy\_{j2}$ with $i=\sqrt{-1}$, $y\_{j1},y\_{j2}$ being real, then $|\tilde{y}\_j|^2=y\_{j1}^2+y\_{j2}^2$.

The joint density of the sample values in the real Gaussian case is the following:

$$L = \prod\_{j=1}^{n} \frac{\mathbf{e}^{-\frac{1}{2}(X\_j - \mu)'\Sigma^{-1}(X\_j - \mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}$$

where $\bar{X}=\frac{1}{n}(X\_1+\cdots+X\_n)$ and the $X\_j,\ j=1,\ldots,n$, are iid $N\_p(\mu,\Sigma),\ \Sigma>O$. We have already derived the maximum of *L* in the entire parameter space *Ω*, which, in the real case, is

$$\sup\_{\Omega} L = \frac{\mathbf{e}^{-\frac{np}{2}} n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |\mathcal{S}|^{\frac{n}{2}}},\tag{6.5.3}$$

where *S* is the sample sum of products matrix. Under *Ho*, $|\Sigma|^{\frac{n}{2}}=(\sigma^2)^{\frac{np}{2}}$ and $\text{tr}(\Sigma^{-1}S)=\frac{1}{\sigma^2}(s\_{11}+\cdots+s\_{pp})=\frac{1}{\sigma^2}\text{tr}(S)$. Thus, the maximum likelihood estimator of $\sigma^2$ is $\frac{1}{np}\text{tr}(S)$. Accordingly, the *λ*-criterion is

$$\lambda = |S|^{\frac{n}{2}} \Big/ \left(\frac{\text{tr}(S)}{p}\right)^{\frac{np}{2}} \Rightarrow u\_1 = \lambda^{\frac{2}{n}} = \frac{p^p |S|}{[\text{tr}(S)]^p},\tag{6.5.4}$$

in the real case. Interestingly, $(u\_1)^{1/p}$ is the ratio of the geometric mean of the eigenvalues of *S* to their arithmetic mean. The structure remains the same in the complex domain, in which case $\det(\tilde{S})$ is replaced by the absolute value $|\det(\tilde{S})|$, so that

$$\tilde{u}\_1 = \frac{p^p |\det(\tilde{S})|}{[\text{tr}(\tilde{S})]^p}. \tag{6.5a.1}$$
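The geometric-mean/arithmetic-mean interpretation of $u\_1$ is easily illustrated numerically; the sketch below uses a hypothetical diagonal *S*, whose eigenvalues are simply its diagonal entries.

```python
import math

# For a diagonal S, |S| is the product and tr(S) the sum of the eigenvalues;
# u1 = p^p |S| / (tr S)^p from (6.5.4), and u1^(1/p) = GM/AM of the eigenvalues.
eig = [2.0, 3.0, 7.0]                    # eigenvalues of a hypothetical 3x3 diagonal S
p = len(eig)
det_S, tr_S = math.prod(eig), sum(eig)
u1 = p ** p * det_S / tr_S ** p
gm, am = det_S ** (1 / p), tr_S / p      # geometric and arithmetic means
```

Since the geometric mean never exceeds the arithmetic mean, $0<u\_1\le1$, with equality only when all eigenvalues coincide.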

For arbitrary *h*, the *h*-th moment of $u\_1$ in the real case can be obtained by integrating over the density of *S*, which, as explained in Sects. 5.5 and 5.5a, is a Wishart density with $n-1=m$ degrees of freedom and parameter matrix $\Sigma>O$. However, when the null hypothesis *Ho* holds, $\Sigma=\sigma^2 I\_p$, so that the *h*-th moment in the real case is

$$E[\boldsymbol{u}\_1^h | \boldsymbol{H}\_o] = \frac{p^{ph}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2}) (\sigma^2)^{\frac{mp}{2}}} \int\_{S > O} |\boldsymbol{S}|^{\frac{m}{2} + h - \frac{p+1}{2}} \mathrm{e}^{-\frac{1}{2\sigma^2} \mathrm{tr}(\boldsymbol{S})}(\mathrm{tr}(\boldsymbol{S}))^{-ph} \mathrm{d}\mathcal{S}. \tag{i}$$

In order to evaluate this integral, we replace $[\text{tr}(S)]^{-ph}$ by an equivalent integral:

$$[\text{tr}(S)]^{-ph} = \frac{1}{\Gamma(ph)} \int\_{x=0}^{\infty} x^{ph-1} \mathbf{e}^{-x\,\text{tr}(S)}\, \mathrm{d}x,\ \Re(h) > 0. \tag{ii}$$

Then, substituting *(ii)* in *(i)*, the exponent becomes $-\frac{1}{2\sigma^2}(1+2\sigma^2 x)\,\text{tr}(S)$. Now, letting $S\_1=\frac{1}{2\sigma^2}(1+2\sigma^2 x)S \Rightarrow \mathrm{d}S=(2\sigma^2)^{\frac{p(p+1)}{2}}(1+2\sigma^2 x)^{-\frac{p(p+1)}{2}}\mathrm{d}S\_1$, we have

$$E[u\_1^h|H\_0] = \frac{(2\sigma^2)^{ph}}{\Gamma\_p(\frac{m}{2})} \frac{p^{ph}}{\Gamma(ph)} \int\_0^\infty x^{ph-1} (1+2\sigma^2 x)^{-(\frac{m}{2}+h)p} \mathrm{d}x$$

$$\begin{split} &\qquad \times \int\_{S\_1 > O} |S\_1|^{\frac{m}{2}+h-\frac{p+1}{2}} \mathbf{e}^{-\text{tr}(S\_1)} \mathrm{d}S\_1\\ &= \frac{\Gamma\_p(\frac{m}{2}+h)}{\Gamma\_p(\frac{m}{2})} \frac{p^{ph}}{\Gamma(ph)} \int\_0^\infty y^{ph-1} (1+y)^{-(\frac{m}{2}+h)p} \mathrm{d}y, \ y = 2\sigma^2 x \\ &= \frac{\Gamma\_p(\frac{m}{2}+h)}{\Gamma\_p(\frac{m}{2})}\, p^{ph}\, \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)}, \ \Re(h) > 0, \ m = n-1. \end{split} \tag{6.5.5}$$

The corresponding *h*-th moment in the complex case is the following:

$$E[\tilde{u}\_1^h|H\_o] = \frac{\tilde{\Gamma}\_p(m+h)}{\tilde{\Gamma}\_p(m)}\, p^{ph}\, \frac{\Gamma(mp)}{\Gamma(mp+ph)}, \ \Re(h) > 0, \ m = n-1. \tag{6.5a.2}$$

By making use of the multiplication formula for gamma functions, one can expand the real gamma function *Γ (mz)* as follows:

$$\Gamma(mz) = (2\pi)^{\frac{1-m}{2}} m^{mz-\frac{1}{2}} \Gamma(z) \Gamma(z+\frac{1}{m}) \cdots \Gamma(z+\frac{m-1}{m}), \ m = 1,2,\ldots,\quad(6.5.6)$$

and for *m* = 2*,* we have the duplication formula

$$
\Gamma(2z) = \pi^{-\frac{1}{2}} 2^{2z-1} \Gamma(z) \Gamma\left(z + \frac{1}{2}\right). \tag{6.5.7}
$$
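Both formulas can be checked numerically at an arbitrary point; *m* and *z* in this sketch are illustrative choices.

```python
import math

# Check of the Gauss multiplication formula (6.5.6) and of the m = 2
# duplication formula (6.5.7).
m, z = 3, 2.4
lhs = math.gamma(m * z)
rhs = ((2 * math.pi) ** ((1 - m) / 2) * m ** (m * z - 0.5)
       * math.prod(math.gamma(z + j / m) for j in range(m)))     # (6.5.6)

lhs2 = math.gamma(2 * z)
rhs2 = (math.pi ** (-0.5) * 2 ** (2 * z - 1)
        * math.gamma(z) * math.gamma(z + 0.5))                   # (6.5.7)
```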

Then on applying (6.5.6),

$$\frac{p^{ph}\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2} + ph)} = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2} + h)} \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2} + \frac{j}{p})}{\Gamma(\frac{m}{2} + h + \frac{j}{p})}.\tag{iii}$$

Moreover, it follows from the definition of the real matrix-variate gamma functions that

$$\frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma\_p(\frac{m}{2})} = \prod\_{j=1}^p \frac{\Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}{\Gamma(\frac{m}{2} - \frac{j-1}{2})}.\tag{iv}$$

On canceling $\Gamma(\frac{m}{2}+h)/\Gamma(\frac{m}{2})$ when multiplying *(iii)* by *(iv)*, we are left with

$$E[u\_1^h|H\_o] = \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2} + \frac{j}{p})}{\Gamma(\frac{m}{2} - \frac{j}{2})} \right\} \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2} - \frac{j}{2} + h)}{\Gamma(\frac{m}{2} + \frac{j}{p} + h)} \right\}, \ m = n - 1. \tag{6.5.8}$$

The corresponding *h*-th moment in the complex case is the following:

$$E[\tilde{u}\_1^h|H\_o] = \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(m + \frac{j}{p})}{\Gamma(m - j)} \right\} \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(m - j + h)}{\Gamma(m + \frac{j}{p} + h)} \right\}, \ m = n - 1. \tag{6.5a.3}$$
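The gamma-function reduction leading from (6.5.5) to (6.5.8) in the real case can be checked numerically; *m*, *p* and *h* below are illustrative choices, and the $\Gamma\_p$ ratio is expanded by means of *(iv)*, the *π* factors cancelling.

```python
import math

# Numerical check that the moment expression (6.5.5) agrees with the product
# form (6.5.8) in the real case.
m, p, h = 6, 3, 2

gamma_p_ratio = math.prod(
    math.gamma(m / 2 - (j - 1) / 2 + h) / math.gamma(m / 2 - (j - 1) / 2)
    for j in range(1, p + 1))                                    # expansion (iv)
lhs = (gamma_p_ratio * p ** (p * h)
       * math.gamma(m * p / 2) / math.gamma(m * p / 2 + p * h))  # (6.5.5)

rhs = math.prod(
    math.gamma(m / 2 + j / p) / math.gamma(m / 2 - j / 2)
    * math.gamma(m / 2 - j / 2 + h) / math.gamma(m / 2 + j / p + h)
    for j in range(1, p))                                        # (6.5.8)
```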

For $h=s-1$, one can treat $E[u\_1^{s-1}|H\_o]$ as the Mellin transform of the density of $u\_1$ in the real case. Letting this density be denoted by $f\_{u\_1}(u\_1)$, it can be expressed in terms of a G-function as follows:

$$f\_{u\_1}(u\_1|H\_o) = c\_1 G\_{p-1,p-1}^{p-1,0}\left[u\_1 \Big|^{\frac{m}{2}+\frac{j}{p}-1,\ j=1,\ldots,p-1}\_{\frac{m}{2}-\frac{j}{2}-1,\ j=1,\ldots,p-1}\right], \ 0 \le u\_1 \le 1,\tag{6.5.9}$$

and $f\_{u\_1}(u\_1|H\_o) = 0$ elsewhere, where

$$c\_1 = \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2} + \frac{j}{p})}{\Gamma(\frac{m}{2} - \frac{j}{2})} \right\},$$

the corresponding density in the complex case being the following:

$$\tilde{f}\_{\tilde{u}\_1}(\tilde{u}\_1|H\_o) = \tilde{c}\_1 \tilde{G}\_{p-1, p-1}^{p-1, 0} \left[ \tilde{u}\_1 \Big|^{m+\frac{j}{p}-1, \ j=1, \ldots, p-1}\_{m-j-1, \ j=1, \ldots, p-1} \right], \ 0 \le \tilde{u}\_1 \le 1,\tag{6.5a.4}$$

and $\tilde{f}\_{\tilde{u}\_1}(\tilde{u}\_1|H\_o) = 0$ elsewhere, where $\tilde{G}$ is a real G-function whose parameters differ from those appearing in (6.5.9), and

$$\tilde{c}\_1 = \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(m + \frac{j}{p})}{\Gamma(m - j)} \right\}.$$

For computable series representations of a G-function with general parameters, the reader may refer to Mathai (1970a, 1993). Observe that $u\_1$ in the real case is structurally a product of $p-1$ mutually independently distributed real scalar type-1 beta random variables with the parameters $(\alpha\_j=\frac{m}{2}-\frac{j}{2},\ \beta\_j=\frac{j}{2}+\frac{j}{p}),\ j=1,\ldots,p-1$. In the complex case, $\tilde{u}\_1$ is structurally a product of $p-1$ mutually independently distributed real scalar type-1 beta random variables with the parameters $(\alpha\_j=m-j,\ \beta\_j=j+\frac{j}{p}),\ j=1,\ldots,p-1$. This observation is stated as a result.

**Theorem 6.5.1.** *Consider the sphericity test for the hypothesis $H\_o:\Sigma=\sigma^2 I$ where $\sigma^2>0$ is an unknown real scalar. Let $u\_1$ and the corresponding complex quantity $\tilde{u}\_1$ be as defined in (6.5.4) and (6.5a.1), respectively. Then, in the real case, $u\_1$ is structurally a product of $p-1$ independently distributed real scalar type-1 beta random variables with the parameters $(\alpha\_j=\frac{m}{2}-\frac{j}{2},\ \beta\_j=\frac{j}{2}+\frac{j}{p}),\ j=1,\ldots,p-1$, and, in the complex case, $\tilde{u}\_1$ is structurally a product of $p-1$ independently distributed real scalar type-1 beta random variables with the parameters $(\alpha\_j=m-j,\ \beta\_j=j+\frac{j}{p}),\ j=1,\ldots,p-1$, where $m=n-1$, n being the sample size.*

For certain special cases, one can represent (6.5.9) and (6.5a.4) in terms of known elementary functions. Some such cases are now considered.

#### **Real case:** *p* = 2

In the real case, for *p* = 2

$$E[\boldsymbol{u}\_1^h | \boldsymbol{H}\_o] = \frac{\Gamma(\frac{m}{2} + \frac{1}{2})}{\Gamma(\frac{m}{2} - \frac{1}{2})} \frac{\Gamma(\frac{m}{2} - \frac{1}{2} + h)}{\Gamma(\frac{m}{2} + \frac{1}{2} + h)} = \frac{\frac{m}{2} - \frac{1}{2}}{\frac{m}{2} - \frac{1}{2} + h}.$$

This means that $u\_1$ is a real type-1 beta variable with the parameters $(\alpha=\frac{m}{2}-\frac{1}{2},\ \beta=1)$. The corresponding result in the complex case is that $\tilde{u}\_1$ is a real type-1 beta variable with the parameters $(\alpha=m-1,\ \beta=1)$.
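This *p* = 2 result lends itself to a quick Monte Carlo check: simulating under $H\_o$ with $\sigma^2=1$ (*n* = 5 is an illustrative choice, not from the text), the average of $u\_1=4|S|/(\text{tr}\,S)^2$ should be close to the type-1 beta mean $\alpha/(\alpha+1)$.

```python
import random

# Simulate bivariate standard normal samples of size n, form S, and compare
# the empirical mean of u1 with alpha/(alpha+1), alpha = m/2 - 1/2, m = n - 1.
random.seed(1)
n, reps = 5, 10000
total = 0.0
for _ in range(reps):
    xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    mx = sum(x for x, _ in xs) / n
    my = sum(y for _, y in xs) / n
    s11 = sum((x - mx) ** 2 for x, _ in xs)
    s22 = sum((y - my) ** 2 for _, y in xs)
    s12 = sum((x - mx) * (y - my) for x, y in xs)
    total += 4 * (s11 * s22 - s12 ** 2) / (s11 + s22) ** 2   # u1 = 4|S|/(tr S)^2
mean_u1 = total / reps

alpha = (n - 1) / 2 - 1 / 2
beta_mean = alpha / (alpha + 1)          # 0.6 for n = 5
```

With the seed fixed as above, the simulated mean falls well within Monte Carlo error of the theoretical value.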

#### **Real case:** *p* = 3

In the real case

$$E[u\_1^h|H\_o] = \frac{\Gamma(\frac{m}{2} + \frac{1}{3})\Gamma(\frac{m}{2} + \frac{2}{3})}{\Gamma(\frac{m}{2} - \frac{1}{2})\Gamma(\frac{m}{2} - 1)} \frac{\Gamma(\frac{m}{2} - \frac{1}{2} + h)\Gamma(\frac{m}{2} - 1 + h)}{\Gamma(\frac{m}{2} + \frac{1}{3} + h)\Gamma(\frac{m}{2} + \frac{2}{3} + h)},$$

so that $u\_1$ is equivalent to the product of two independently distributed real type-1 beta random variables with the parameters $(\alpha\_j,\beta\_j)=(\frac{m}{2}-\frac{j}{2},\ \frac{j}{2}+\frac{j}{3}),\ j=1,2$. This density can be obtained by treating $E[u\_1^h|H\_o]$ for $h=s-1$ as the Mellin transform of the density of $u\_1$. The density is then available by taking the inverse Mellin transform. Thus, again denoting it by $f\_{u\_1}(u\_1)$, we have

$$\begin{split} f\_{\mu\_{1}}(\mu\_{1}|H\_{o}) &= c\_{3} \frac{1}{2\pi i} \int\_{c-i\infty}^{c+i\infty} \phi\_{3}(s) \text{ds}, \ c > \frac{1}{2} \\ &= c\_{3} \Big[ \sum\_{\nu=0}^{\infty} \mathcal{R}\_{\nu} + \sum\_{\nu=0}^{\infty} \mathcal{R}\_{\nu}^{\prime} \Big], \\ c\_{3} &= \frac{\Gamma(\frac{m}{2} + \frac{1}{3}) \Gamma(\frac{m}{2} + \frac{2}{3})}{\Gamma(\frac{m}{2} - \frac{1}{2}) \Gamma(\frac{m}{2} - 1)}, \\ \phi\_{3}(s) &= \frac{\Gamma(\frac{m}{2} - \frac{1}{2} - 1 + s) \Gamma(\frac{m}{2} - 1 - 1 + s)}{\Gamma(\frac{m}{2} + \frac{1}{3} - 1 + s) \Gamma(\frac{m}{2} + \frac{2}{3} - 1 + s)} u\_{1}^{-s}, \end{split}$$

where $R\_\nu$ is the residue of the integrand $\phi\_3(s)$ at the poles of $\Gamma(\frac{m}{2}-\frac{3}{2}+s)$ and $R'\_\nu$ is the residue of $\phi\_3(s)$ at the poles of $\Gamma(\frac{m}{2}-2+s)$. Letting $s\_1=\frac{m}{2}-\frac{3}{2}+s$,

$$\begin{split} R\_{\nu} &= \lim\_{s\_1 \to -\nu} (s\_1 + \nu)\, u\_1^{\frac{m}{2} - \frac{3}{2}}\, \frac{\Gamma(s\_1)\,\Gamma(-\frac{1}{2} + s\_1)\, u\_1^{-s\_1}}{\Gamma(\frac{1}{3} + \frac{1}{2} + s\_1)\,\Gamma(\frac{2}{3} + \frac{1}{2} + s\_1)} \\ &= u\_1^{\frac{m}{2} - \frac{3}{2}}\, \frac{(-1)^{\nu}}{\nu!}\, \frac{\Gamma(-\frac{1}{2} - \nu)}{\Gamma(\frac{1}{3} + \frac{1}{2} - \nu)\, \Gamma(\frac{2}{3} + \frac{1}{2} - \nu)}\, u\_1^{\nu}. \end{split}$$

We can replace negative *ν* in the arguments of the gamma functions with positive *ν* by making use of the following formula:

$$\Gamma(a-\nu) = \frac{(-1)^{\nu}\Gamma(a)}{(-a+1)\_{\nu}},\ a \neq 1,2,\ldots,\ \nu = 0,1,\ldots,\tag{6.5.10}$$

where $(b)\_\nu$ denotes the Pochhammer symbol

$$(b)\_\nu = b(b+1)\cdots(b+\nu-1),\ b \neq 0,\ (b)\_0 = 1,\tag{6.5.11}$$

so that

$$\begin{aligned} \Gamma\left(-\frac{1}{2} - \nu\right) &= \frac{(-1)^{\nu} \Gamma\left(-\frac{1}{2}\right)}{(\frac{3}{2})\_{\nu}}, & \Gamma\left(\frac{1}{3} + \frac{1}{2} - \nu\right) &= \frac{(-1)^{\nu} \Gamma\left(\frac{1}{3} + \frac{1}{2}\right)}{(-\frac{1}{3} + \frac{1}{2})\_{\nu}}, \\ \Gamma\left(\frac{2}{3} + \frac{1}{2} - \nu\right) &= \frac{(-1)^{\nu} \Gamma\left(\frac{2}{3} + \frac{1}{2}\right)}{(-\frac{2}{3} + \frac{1}{2})\_{\nu}}. \end{aligned}$$

The sum of the residues then becomes

$$\sum\_{\nu=0}^{\infty} R\_{\nu} = \frac{\Gamma(-\frac{1}{2})}{\Gamma(\frac{1}{3} + \frac{1}{2})\Gamma(\frac{2}{3} + \frac{1}{2})} u\_1^{\frac{m}{2} - \frac{3}{2}} \,\_2F\_1\left(-\frac{1}{3} + \frac{1}{2}, -\frac{2}{3} + \frac{1}{2}; \frac{3}{2}; u\_1\right), \ 0 \le u\_1 \le 1.$$

It can be similarly shown that

$$\sum\_{\nu=0}^{\infty} R'\_{\nu} = \frac{\Gamma(\frac{1}{2})}{\Gamma(\frac{1}{3} + 1)\Gamma(\frac{2}{3} + 1)} u\_1^{\frac{m}{2} - 2} \,\_2F\_1\left(-\frac{1}{3}, -\frac{2}{3}; \frac{1}{2}; u\_1\right), \; 0 \le u\_1 \le 1.$$

Accordingly, the density of $u_1$ for $p = 3$ is the following:

$$\begin{split} f\_1(u\_1|H\_o) &= c\_3 \Big\{ \frac{\Gamma(-\frac{1}{2})}{\Gamma(\frac{5}{6})\Gamma(\frac{7}{6})} u\_1^{\frac{m}{2}-\frac{3}{2}} \,\_2F\_1(\frac{1}{6}, -\frac{1}{6}; \frac{3}{2}; u\_1) \\ &+ \frac{\Gamma(\frac{1}{2})}{\Gamma(\frac{4}{3})\Gamma(\frac{5}{3})} u\_1^{\frac{m}{2}-2} \,\_2F\_1(-\frac{1}{3}, -\frac{2}{3}; \frac{1}{2}; u\_1) \Big\}, \ 0 \le u\_1 \le 1 \end{split} \tag{6.5.12}$$

and $f_1(u_1|H_o) = 0$ elsewhere.
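Formula (6.5.10) can be verified numerically; the following sketch (in Python, with a small helper `poch` for the Pochhammer symbol (6.5.11)) checks it at $a = -\frac{1}{2}$, the argument occurring in the residue sums above:

```python
import math

def poch(b, nu):
    """Pochhammer symbol (b)_nu = b(b+1)...(b+nu-1), with (b)_0 = 1."""
    r = 1.0
    for k in range(nu):
        r *= b + k
    return r

# Verify (6.5.10): Gamma(a - nu) = (-1)^nu * Gamma(a) / (-a + 1)_nu,
# here with a = -1/2, so that (-a + 1)_nu = (3/2)_nu as in the text.
a = -0.5
pairs = []
for nu in range(6):
    lhs = math.gamma(a - nu)
    rhs = (-1) ** nu * math.gamma(a) / poch(-a + 1, nu)
    pairs.append((lhs, rhs))
    print(nu, lhs, rhs)
```

The same check works for $a = \frac{1}{3}+\frac{1}{2}$ and $a = \frac{2}{3}+\frac{1}{2}$, the other arguments converted above.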

# **Real case:** *p* = 4

In this case,

$$E[u\_1^h|H\_o] = c\_4 \frac{\Gamma(\frac{m}{2} - \frac{3}{2} + s)\Gamma(\frac{m}{2} - 2 + s)\Gamma(\frac{m}{2} - \frac{5}{2} + s)}{\Gamma(\frac{m}{2} - \frac{3}{4} + s)\Gamma(\frac{m}{2} - \frac{2}{4} + s)\Gamma(\frac{m}{2} - \frac{1}{4} + s)},$$

where *c*<sup>4</sup> is the normalizing constant. However, noting that

$$\frac{\Gamma(\frac{m}{2} - \frac{3}{2} + s)}{\Gamma(\frac{m}{2} - \frac{1}{2} + s)} = \frac{1}{\frac{m}{2} - \frac{3}{2} + s},$$

there is one pole at $s = -\frac{m}{2}+\frac{3}{2}$. The poles of $\Gamma(\frac{m}{2}-\frac{5}{2}+s)$ occur at $s = -\frac{m}{2}+\frac{5}{2}-\nu,\ \nu = 0,1,\ldots,$ and hence, at $\nu = 1$, the pole coincides with the earlier one, so that there is a pole of order 2 at $s = -\frac{m}{2}+\frac{3}{2}$. Each of the other poles of the integrand is simple, that is, of order 1. The second order pole brings in a logarithmic function. As all the cases $p \ge 4$ involve poles of higher orders, they will not be discussed here. The general expansion of a G-function of the type $G^{m,0}_{m,m}(\cdot)$ is provided in Mathai (1970a, 1993). In the complex case, poles of higher orders appear starting from $p \ge 3$, so that the densities can only be written in terms of logarithms, psi and zeta functions; hence, these cases will not be considered either. Observe that $\tilde{u}_1$ corresponds to a product of independently distributed real type-1 beta random variables, even though the densities are only available in terms of logarithms, psi and zeta functions for $p \ge 3$. The null and non-null densities of the $\lambda$-criterion in the general case were derived by the first author, and some results obtained under the null distribution can also be found in Mathai and Saxena (1973). Several researchers have contributed to various aspects of the sphericity and multisample sphericity tests; for some of the first author's contributions, the reader may refer to Mathai and Rathie (1970) and Mathai (1977, 1984, 1986).

Gamma products such as those appearing in (6.5.8) and (6.5a.3) are frequently encountered when considering various types of tests on the parameters of a real or complex Gaussian or certain other types of distributions. Structural representations in the form of products of independently distributed real scalar type-1 beta random variables occur in numerous situations. Thus, a general asymptotic result on the $h$-th moment of such products of type-1 beta random variables will now be derived.

**Theorem 6.5.2.** *Let u be a real scalar random variable whose h-th moment is of the form*

$$E[u^h] = \frac{\Gamma\_p(\alpha + \alpha h + \gamma)}{\Gamma\_p(\alpha + \gamma)} \frac{\Gamma\_p(\alpha + \gamma + \delta)}{\Gamma\_p(\alpha + \alpha h + \gamma + \delta)}\tag{6.5.13}$$

*where $\Gamma_p(\cdot)$ is the real matrix-variate gamma function on $p \times p$ real positive definite matrices, $\alpha$ is real, $\gamma$ is bounded, $\delta$ is real, $0 < \delta < \infty$, and $h$ is arbitrary. Then, as $\alpha \to \infty$, $-2\ln u \to \chi^2_{2p\delta}$, a real chisquare random variable having $2p\delta$ degrees of freedom, that is, a real gamma random variable with the parameters $(\alpha = p\delta,\ \beta = 2)$.*

*Proof***:** On expanding the real matrix-variate gamma functions, we have the following:

$$\frac{\Gamma_p(\alpha+\gamma+\delta)}{\Gamma_p(\alpha+\gamma)} = \prod_{j=1}^p \frac{\Gamma(\alpha+\gamma+\delta-\frac{j-1}{2})}{\Gamma(\alpha+\gamma-\frac{j-1}{2})}.\tag{i}$$


$$\frac{\Gamma\_p(\alpha(1+h)+\gamma)}{\Gamma\_p(\alpha(1+h)+\gamma+\delta)} = \prod\_{j=1}^p \frac{\Gamma(\alpha(1+h)+\gamma-\frac{j-1}{2})}{\Gamma(\alpha(1+h)+\gamma+\delta-\frac{j-1}{2})}.\tag{ii}$$

Consider the following form of Stirling's asymptotic approximation formula for gamma functions, namely,

$$
\Gamma(z+\eta) \approx \sqrt{2\pi} z^{z+\eta-\frac{1}{2}} \mathbf{e}^{-z} \text{ for } |z| \to \infty \text{ and } \eta \text{ bounded.} \tag{6.5.14}
$$

On applying this asymptotic formula to the gamma functions appearing in *(i)* and *(ii)* for *α* → ∞, we have

$$\prod_{j=1}^p \frac{\Gamma(\alpha+\gamma+\delta-\frac{j-1}{2})}{\Gamma(\alpha+\gamma-\frac{j-1}{2})} \to \alpha^{p\,\delta}$$

and

$$\prod\_{j=1}^{p} \frac{\Gamma(\alpha(1+h) + \gamma - \frac{j-1}{2})}{\Gamma(\alpha(1+h) + \gamma + \delta - \frac{j-1}{2})} \to [\alpha(1+h)]^{-p\,\delta},\tag{iii}$$

so that

$$E[u^h] \to (1+h)^{-p\,\delta}.\tag{iv}$$

On noting that $E[u^h] = E[{\rm e}^{h\ln u}] \to (1+h)^{-p\delta}$, it is seen that $\ln u$ has the mgf $(1+h)^{-p\delta}$ for $1+h > 0$, or $-2\ln u$ has the mgf $(1-2h)^{-p\delta}$ for $1-2h > 0$, which happens to be the mgf of a real scalar chisquare variable with $2p\delta$ degrees of freedom when $2p\delta$ is a positive integer, or that of a real gamma variable with the parameters $(\alpha = p\delta,\ \beta = 2)$. Hence the following result.
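The limiting behaviour established in this proof can be illustrated numerically. The sketch below (Python; the parameter values $p = 3$, $\gamma = 0.4$, $\delta = 1.5$, $h = 0.7$ are arbitrary illustrative choices) evaluates $\log E[u^h]$ from (6.5.13), expanded as in $(i)$ and $(ii)$, via `math.lgamma` and compares it with $\log(1+h)^{-p\delta} = -p\delta\ln(1+h)$ as $\alpha$ grows:

```python
import math

def log_moment(alpha, gamma, delta, p, h):
    """log E[u^h] from (6.5.13), expanded over j = 1,...,p as in (i) and (ii)."""
    total = 0.0
    for j in range(1, p + 1):
        c = (j - 1) / 2
        total += (math.lgamma(alpha * (1 + h) + gamma - c)
                  - math.lgamma(alpha * (1 + h) + gamma + delta - c)
                  + math.lgamma(alpha + gamma + delta - c)
                  - math.lgamma(alpha + gamma - c))
    return total

# hypothetical parameter values, chosen only for illustration
p, gamma, delta, h = 3, 0.4, 1.5, 0.7
limit = -p * delta * math.log(1 + h)        # log of (1 + h)^(-p*delta)
for alpha in (10.0, 1e2, 1e4, 1e6):
    print(alpha, log_moment(alpha, gamma, delta, p, h), limit)
```

As $\alpha$ increases, the computed log-moment approaches the stated limit.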

**Corollary 6.5.1.** *Consider a slightly more general case than that considered in Theorem 6.5.2. Let the h-th moment of u be of the form*

$$E[u^h] = \left\{\prod_{j=1}^p \frac{\Gamma(\alpha(1+h)+\gamma_j)}{\Gamma(\alpha+\gamma_j)}\right\}\left\{\prod_{j=1}^p \frac{\Gamma(\alpha+\gamma_j+\delta_j)}{\Gamma(\alpha(1+h)+\gamma_j+\delta_j)}\right\}.\tag{6.5.15}$$

*Then, as $\alpha \to \infty$, $E[u^h] \to (1+h)^{-(\delta_1+\cdots+\delta_p)}$, which implies that $-2\ln u \to \chi^2_{2(\delta_1+\cdots+\delta_p)}$ whenever $2(\delta_1+\cdots+\delta_p)$ is a positive integer or, equivalently, that $-2\ln u$ tends to a real gamma variable with the parameters $(\alpha = \delta_1+\cdots+\delta_p,\ \beta = 2)$.*

Let us examine the asymptotic distribution of the test statistic for the sphericity test in the light of Theorem 6.5.2. It is seen from (6.5.4) that $\lambda^h = u^{\frac{n}{2}h}$. Thus, on replacing $h$ by $\frac{n}{2}h$ in (6.5.8) with $m = n-1$, we have

$$E[\lambda^h|H\_o] = \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{n-1}{2} + \frac{j}{p})}{\Gamma(\frac{n-1}{2} - \frac{j}{2})} \right\} \left\{ \prod\_{j=1}^{p-1} \frac{\Gamma(\frac{n}{2}(1+h) - \frac{1}{2} - \frac{j}{2})}{\Gamma(\frac{n}{2}(1+h) - \frac{1}{2} + \frac{j}{p})} \right\}.\tag{6.5.16}$$

Then, it follows from Corollary 6.5.1 that $-2\ln\lambda \to \chi^2_{2\sum_{j=1}^{p-1}(\frac{j}{2}+\frac{j}{p})}$, a chisquare random variable having $2\sum_{j=1}^{p-1}\big(\frac{j}{2}+\frac{j}{p}\big) = \frac{p(p-1)}{2}+(p-1) = \frac{(p-1)(p+2)}{2}$ degrees of freedom. Hence the following result:

**Theorem 6.5.3.** *Consider the $\lambda$-criterion for testing the hypothesis of sphericity. Then, under the null hypothesis, $-2\ln\lambda \to \chi^2_{\frac{(p-1)(p+2)}{2}}$ as the sample size $n \to \infty$. In the complex case, as $n \to \infty$, $-2\ln\lambda \to \chi^2_{(p-1)(p+1)}$, a real scalar chisquare variable with $2[\frac{p(p-1)}{2}+\frac{p-1}{2}] = p(p-1)+(p-1) = (p-1)(p+1)$ degrees of freedom.*

**Note 6.5.1.** We observe that the number of degrees of freedom of the chisquare variable in the real case is $\frac{(p-1)(p+2)}{2}$, which is also equal to the number of parameters restricted by the null hypothesis. Indeed, when $\Sigma = \sigma^2 I$, we have $\sigma_{ij} = 0,\ i \neq j$, which produces $\frac{p(p-1)}{2}$ restrictions and, since $\sigma^2$ is unknown, requiring that the diagonal elements be such that $\sigma_{11} = \cdots = \sigma_{pp}$ produces $p-1$ additional restrictions, for a total of $\frac{(p-1)(p+2)}{2}$ restrictions being imposed. Thus, the number of degrees of freedom of the asymptotic chisquare variable corresponds to the number of restrictions imposed by $H_o$, which is actually a general result.
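Theorem 6.5.3 can be checked by simulation. A minimal sketch (NumPy assumed; the dimension, sample size and seed are illustrative choices) generates samples under $H_o$, computes $u_1 = |S|/(\mathrm{tr}(S)/p)^p$, and compares the empirical mean of $-2\ln\lambda = -n\ln u_1$ with the chisquare mean $\frac{(p-1)(p+2)}{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 3, 400, 2000
df = (p - 1) * (p + 2) // 2              # (p-1)(p+2)/2 = 5 for p = 3

stats = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, p))      # H_o holds: Sigma = sigma^2 I with sigma = 1
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc                        # sample sum of products matrix
    u1 = np.linalg.det(S) / (np.trace(S) / p) ** p
    stats[i] = -n * np.log(u1)           # -2 ln(lambda) = -n ln(u1)

print(stats.mean(), df)                  # empirical mean vs chi-square mean
```

For moderate $n$ the empirical mean slightly exceeds the asymptotic value, which is consistent with the usual Bartlett-type finite-sample correction.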

#### **6.6. Testing the Hypothesis that the Covariance Matrix is Diagonal**

Consider the null hypothesis that *Σ*, the nonsingular covariance matrix of a *p*-variate real normal distribution, is diagonal, that is,

$$H\_o: \Sigma = \text{diag}(\sigma\_{11}, \dots, \sigma\_{pp}).$$

Since the population is assumed to be normally distributed, this implies that the components of the $p$-variate Gaussian vector are mutually independently distributed as univariate normal random variables whose respective variances are $\sigma_{jj},\ j = 1,\ldots,p$. Consider a simple random sample of size $n$ from a nonsingular $N_p(\mu,\Sigma)$ population or, equivalently, let $X_1,\ldots,X_n$ be independently distributed $N_p(\mu,\Sigma)$ vectors, $\Sigma > O$. Under $H_o$, $\sigma_{jj}$ is estimated by its MLE $\hat{\sigma}_{jj} = \frac{1}{n}s_{jj}$ where $s_{jj}$ is the $j$-th diagonal element of $S = (s_{ij})$, the sample sum of products matrix. The maximum of the likelihood function under the null hypothesis is then

$$\max\_{H\_o} L = \prod\_{j=1}^{p} \max\_{H\_o} L\_j = \frac{1}{(2\pi)^{\frac{np}{2}} \prod\_{j=1}^{p} [s\_{jj}]^{\frac{n}{2}}} n^{\frac{np}{2}} e^{-\frac{1}{2}(np)},$$

the likelihood function being the joint density evaluated at an observed value of the sample. Observe that the overall maximum or the maximum in the entire parameter space remains the same as that given in (6.1.1). Thus, the *λ*-criterion is given by


$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{|S|^{\frac{n}{2}}}{\prod_{j=1}^{p} s_{jj}^{\frac{n}{2}}} \ \Rightarrow\ u_2 = \lambda^{\frac{2}{n}} = \frac{|S|}{\prod_{j=1}^{p} s_{jj}},\tag{6.6.1}$$

where $S \sim W_p(m,\Sigma),\ \Sigma > O$, and $m = n-1$, $n$ being the sample size. Under $H_o$, $\Sigma = {\rm diag}(\sigma_{11},\ldots,\sigma_{pp})$. Then, for an arbitrary $h$, the $h$-th moment of $u_2$ is available by taking the expected value of $\lambda^{\frac{2h}{n}}$ with respect to the density of $S$, that is,

$$E[\boldsymbol{u}\_2^h | \boldsymbol{H}\_o] = \int\_{S > O} \frac{|\boldsymbol{S}|^{\frac{m}{2} + h - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{S})} (\prod\_{j=1}^p \mathbf{s}\_{jj})^{-h}}{2^{\frac{mp}{2}} |\boldsymbol{\Sigma}|^{\frac{m}{2}} \boldsymbol{\Gamma}\_p(\frac{m}{2})} \text{d}S \tag{6.6.2}$$

where, under $H_o$, $|\Sigma| = \sigma_{11}\cdots\sigma_{pp}$. As was done in Sect. 6.1.1, we may replace $s_{jj}^{-h}$ by the equivalent integral

$$\mathbf{s}\_{jj}^{-h} = \frac{1}{\Gamma(h)} \int\_0^\infty \mathbf{x}\_j^{h-1} \mathbf{e}^{-\mathbf{x}\_j(s\_{jj})} \mathbf{d} \mathbf{x}\_j,\ \Re(h) > 0.$$

Thus,

$$\prod\_{j=1}^{p} s\_{jj}^{-h} = \frac{1}{[\Gamma(h)]^p} \int\_0^\infty \cdots \int\_0^\infty x\_1^{h-1} \cdots x\_p^{h-1} \mathbf{e}^{-\text{tr}(YS)} \text{dx}\_1 \wedge \ldots \wedge \text{dx}\_p \tag{i}$$

where $Y = {\rm diag}(x_1,\ldots,x_p)$, so that ${\rm tr}(YS) = x_1 s_{11}+\cdots+x_p s_{pp}$. Then, (6.6.2) can be re-expressed as follows:

$$\begin{split} E[u_2^h|H_o] &= \frac{1}{[\Gamma(h)]^p} \int_0^\infty\cdots\int_0^\infty \frac{x_1^{h-1}\cdots x_p^{h-1}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,(\prod_{j=1}^p \sigma_{jj})^{\frac{m}{2}}} \int_{S>O} |S|^{\frac{m}{2}+h-\frac{p+1}{2}}\, {\rm e}^{-\frac{1}{2}{\rm tr}((\Sigma^{-1}+2Y)S)}\, {\rm d}S\ {\rm d}x_1\wedge\ldots\wedge{\rm d}x_p \\ &= \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})}\, \frac{1}{[\Gamma(h)]^p} \int_0^\infty\cdots\int_0^\infty \frac{x_1^{h-1}\cdots x_p^{h-1}}{2^{\frac{mp}{2}}\,(\prod_{j=1}^p \sigma_{jj})^{\frac{m}{2}}} \,\Big|\frac{\Sigma^{-1}+2Y}{2}\Big|^{-(\frac{m}{2}+h)} {\rm d}x_1\wedge\ldots\wedge{\rm d}x_p, \end{split}$$

and observing that, under *Ho*,

$$\begin{aligned} \Big|\frac{\Sigma^{-1}+2Y}{2}\Big| &= \Big|\frac{\Sigma^{-1}}{2}\Big|\, |I+2\Sigma Y| \ \ {\rm with} \\ |I+2\Sigma Y| &= (1+2\sigma_{11}x_1)\cdots(1+2\sigma_{pp}x_p), \end{aligned}$$


$$E[\boldsymbol{u}\_2^h | \boldsymbol{H}\_0] = \frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma\_p(\frac{m}{2})} \prod\_{j=1}^p \frac{1}{\Gamma(h)} \int\_0^\infty \mathbf{y}\_j^{h-1} (1 + \mathbf{y}\_j)^{-(\frac{m}{2} + h)} \mathbf{d} \mathbf{y}\_j, \ y\_j = 2 \mathbf{x}\_j \sigma\_{jj}$$

$$= \frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma\_p(\frac{m}{2})} \left[ \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2} + h)} \right]^p, \ \Re(\frac{m}{2} + h) > \frac{p-1}{2}. \tag{6.6.3}$$

Thus,

$$\begin{split} E[\boldsymbol{u}\_{2}^{h}|\boldsymbol{H}\_{o}] &= \left[\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)}\right]^{p} \prod\_{j=1}^{p} \frac{\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\Gamma(\frac{m}{2}-\frac{j-1}{2})} \\ &= \frac{[\Gamma(\frac{m}{2})]^{p-1}}{\{\prod\_{j=1}^{p-1}\Gamma(\frac{m}{2}-\frac{j}{2})\}} \frac{\{\prod\_{j=1}^{p-1}\Gamma(\frac{m}{2}-\frac{j}{2}+h)\}}{[\Gamma(\frac{m}{2}+h)]^{p-1}}. \end{split}$$

Denoting the density of $u_2$ by $f_{u_2}(u_2|H_o)$, we can express it as an inverse Mellin transform by taking $h = s-1$. Then,

$$f_{u_2}(u_2|H_o) = c_{2,p-1}\, G^{p-1,0}_{p-1,p-1}\left[u_2\,\Big|^{\frac{m}{2}-1,\,\ldots,\,\frac{m}{2}-1}_{\frac{m}{2}-\frac{j}{2}-1,\ j=1,\ldots,p-1}\right],\ 0 \le u_2 \le 1,\tag{6.6.4}$$

and zero elsewhere, where

$$c\_{2,p-1} = \frac{[\varGamma(\frac{m}{2})]^{p-1}}{\{\prod\_{j=1}^{p-1} \varGamma(\frac{m}{2} - \frac{j}{2})\}}.$$

Some special cases of this density are expounded below.

# **Real and complex cases:** *p* = 2

When $p = 2$, $u_2$ has a real type-1 beta density with the parameters $(\alpha = \frac{m}{2}-\frac{1}{2},\ \beta = \frac{1}{2})$ in the real case. In the complex case, it has a real type-1 beta density with the parameters $(\alpha = m-1,\ \beta = 1)$.
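The real-case result for $p = 2$ can be verified by simulation. The sketch below (NumPy assumed; $m$, the number of replications and the seed are illustrative) generates $S \sim W_2(m, I)$ and compares the empirical mean of $u_2 = |S|/(s_{11}s_{22})$ with the exact beta mean $\alpha/(\alpha+\beta) = (m-1)/m$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, reps = 10, 20000                    # m = n - 1 degrees of freedom of S
u2 = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((m, 2))    # rows iid N_2(0, I): H_o holds
    S = X.T @ X                        # S ~ W_2(m, I)
    u2[i] = np.linalg.det(S) / (S[0, 0] * S[1, 1])

a, b = m / 2 - 1 / 2, 1 / 2            # type-1 beta parameters in the real case
print(u2.mean(), a / (a + b))          # exact mean is (m - 1)/m = 0.9
```

Note that $u_2 \le 1$ always holds, by Hadamard's inequality for positive definite matrices.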

# **Real and complex cases:** *p* = 3

In this case, $f_{u_2}(u_2|H_o)$ is given by

$$f_{u_2}(u_2|H_o) = c_{2,2}\, \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{\Gamma(\frac{m}{2}-\frac{3}{2}+s)\,\Gamma(\frac{m}{2}-2+s)}{[\Gamma(\frac{m}{2}-1+s)]^2}\, u_2^{-s}\, {\rm d}s.$$

The poles of the integrand are simple. Those coming from $\Gamma(\frac{m}{2}-\frac{3}{2}+s)$ occur at $s = -\frac{m}{2}+\frac{3}{2}-\nu,\ \nu = 0,1,\ldots$ . The residue $R_{\nu}$ is the following:

$$R_{\nu} = u_2^{\frac{m}{2}-\frac{3}{2}+\nu}\, \frac{(-1)^{\nu}}{\nu!}\, \frac{\Gamma(-\frac{1}{2}-\nu)}{[\Gamma(\frac{1}{2}-\nu)]^2} = \frac{\Gamma(-\frac{1}{2})}{[\Gamma(\frac{1}{2})]^2}\, \frac{(\frac{1}{2})_{\nu}(\frac{1}{2})_{\nu}}{(\frac{3}{2})_{\nu}}\, \frac{u_2^{\frac{m}{2}-\frac{3}{2}+\nu}}{\nu!}.$$

Summing the residues, we have

$$\sum_{\nu=0}^{\infty} R_{\nu} = \frac{\Gamma(-\frac{1}{2})}{[\Gamma(\frac{1}{2})]^2}\, u_2^{\frac{m}{2}-\frac{3}{2}}\, {}_2F_1\Big(\frac{1}{2},\frac{1}{2};\frac{3}{2};u_2\Big),\ 0 \le u_2 \le 1.$$

Now, consider the residues at the poles of $\Gamma(\frac{m}{2}-2+s)$. Observing that $\Gamma(\frac{m}{2}-2+s)$ cancels out with one of the gamma functions in the denominator, since $\Gamma(\frac{m}{2}-1+s) = (\frac{m}{2}-2+s)\,\Gamma(\frac{m}{2}-2+s)$, the integrand becomes

$$\frac{\Gamma(\frac{m}{2} - \frac{3}{2} + s)\mu\_2^{-s}}{(\frac{m}{2} - 2 + s)\Gamma(\frac{m}{2} - 1 + s)},$$

the residue at the pole $s = -\frac{m}{2}+2$ being $\Gamma(\frac{1}{2})\,u_2^{\frac{m}{2}-2}/\Gamma(1)$. Then, noting that $\Gamma(-\frac{1}{2}) = -2\,\Gamma(\frac{1}{2}) = -2\sqrt{\pi}$, the density is the following:

$$f_{u_2}(u_2|H_o) = c_{2,2}\left\{\sqrt{\pi}\, u_2^{\frac{m}{2}-2} - \frac{2}{\sqrt{\pi}}\, u_2^{\frac{m}{2}-\frac{3}{2}}\, {}_2F_1\Big(\frac{1}{2},\frac{1}{2};\frac{3}{2};u_2\Big)\right\},\ 0 \le u_2 \le 1,\tag{6.6.5}$$

and zero elsewhere.

In the complex case, the integrand is

$$\frac{\Gamma(m-2+s)\,\Gamma(m-3+s)}{[\Gamma(m-1+s)]^2}\, u_2^{-s} = \frac{1}{(m-2+s)^2(m-3+s)}\, u_2^{-s},$$

and hence there is a pole of order 1 at $s = -m+3$ and a pole of order 2 at $s = -m+2$. The residue at $s = -m+3$ is $\frac{u_2^{m-3}}{(1)^2} = u_2^{m-3}$, and the residue at $s = -m+2$ is given by

$$\lim_{s\to -m+2}\frac{\partial}{\partial s}(m-2+s)^2\Big[\frac{1}{(m-2+s)^2(m-3+s)}\, u_2^{-s}\Big] = \lim_{s\to -m+2}\frac{\partial}{\partial s}\Big[\frac{u_2^{-s}}{m-3+s}\Big],$$

which gives the residue as $u_2^{m-2}\ln u_2 - u_2^{m-2}$. Thus, the sum of the residues is $u_2^{m-3} + u_2^{m-2}\ln u_2 - u_2^{m-2}$, and the constant part is

$$\frac{[\varGamma(m)]^2}{\varGamma(m-1)\varGamma(m-2)} = (m-1)^2(m-2), \ m > 2,$$

so that the density is

$$f_{u_2}(u_2) = (m-1)^2(m-2)\,[\,u_2^{m-3} + u_2^{m-2}\ln u_2 - u_2^{m-2}\,],\ 0 < u_2 \le 1,\ m \ge 3,$$

and zero elsewhere. Note that as $u_2 \to 0$, the limit of $\frac{u_2^{m-1}}{m-1}\ln u_2$ is zero. By integrating over $0 < u_2 \le 1$ with $m \ge 3$, it can be verified that $f_{u_2}(\cdot)$ is indeed a density function.
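That verification can be carried out directly with the closed-form antiderivatives; the short check below confirms that the complex-case density integrates to 1 for several values of $m$:

```python
# Check that f(u2) = (m-1)^2 (m-2) [u2^(m-3) + u2^(m-2) ln u2 - u2^(m-2)]
# integrates to 1 over (0, 1], using the closed-form integrals
#   int_0^1 u^(m-3) du = 1/(m-2),
#   int_0^1 u^(m-2) ln(u) du = -1/(m-1)^2,
#   int_0^1 u^(m-2) du = 1/(m-1).
totals = []
for m in range(3, 10):
    total = (m - 1) ** 2 * (m - 2) * (1 / (m - 2) - 1 / (m - 1) ** 2 - 1 / (m - 1))
    totals.append(total)
    print(m, total)   # each total equals 1, up to rounding
```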

# **Real and complex cases:** *p* ≥ 4

As poles of higher orders are present when $p \ge 4$, both in the real and complex cases, the exact density function of the test statistic will not be explicitly given here for those cases. Actually, the resulting densities would involve G-functions, for which general expansions are provided, for instance, in Mathai (1993). The exact null and non-null densities of $u = \lambda^{\frac{2}{n}}$ have been previously derived by the first author. Percentage points accurate to the 11th decimal place are available from Mathai and Katiyar (1979a, 1980) for the null case; as well, various aspects of the distribution of the test statistic are discussed in Mathai and Rathie (1971) and Mathai (1973, 1984, 1985).

Let us now consider the asymptotic distribution of the *λ*-criterion under the null hypothesis,

$$H_o: \Sigma = {\rm diag}(\sigma_{11},\ldots,\sigma_{pp}).$$

Given the representation of the $h$-th moment of $u_2$ provided in (6.6.3) and referring to Corollary 6.5.1, it is seen that the sum of the $\delta_j$'s is $\sum_{j=1}^{p-1}\delta_j = \sum_{j=1}^{p-1}\frac{j}{2} = \frac{p(p-1)}{4}$, so that the number of degrees of freedom of the asymptotic chisquare distribution is $2[\frac{p(p-1)}{4}] = \frac{p(p-1)}{2}$ which, as it should be, is the number of restrictions imposed by $H_o$: when $\Sigma$ is diagonal, $\sigma_{ij} = 0,\ i \neq j$, which produces $\frac{p(p-1)}{2}$ restrictions. Hence, the following result:

**Theorem 6.6.1.** *Let $\lambda$ be the likelihood ratio criterion for testing the hypothesis that the covariance matrix $\Sigma$ of a nonsingular $N_p(\mu,\Sigma)$ distribution is diagonal. Then, as $n \to \infty$, $-2\ln\lambda \to \chi^2_{\frac{p(p-1)}{2}}$ in the real case. In the corresponding complex case, as $n \to \infty$, $-2\ln\lambda \to \chi^2_{p(p-1)}$, a real scalar chisquare variable having $p(p-1)$ degrees of freedom.*
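Theorem 6.6.1 can likewise be checked by simulation (NumPy assumed; $p$, $n$, the number of replications and the seed are illustrative choices), using $u_2 = |S|/\prod_j s_{jj}$ from (6.6.1):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 4, 500, 2000
df = p * (p - 1) // 2                 # p(p-1)/2 = 6 restrictions for p = 4

stats = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, p))   # H_o holds: Sigma = I is diagonal
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    u2 = np.linalg.det(S) / np.prod(np.diag(S))
    stats[i] = -n * np.log(u2)        # -2 ln(lambda) from (6.6.1)

print(stats.mean(), df)
```

The empirical mean of $-2\ln\lambda$ is close to $\frac{p(p-1)}{2}$, the mean of the limiting chisquare distribution.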

#### **6.7. Equality of Diagonal Elements, Given that** *Σ* **is Diagonal, Real Case**

In the case of a $p$-variate real nonsingular $N_p(\mu,\Sigma)$ population, whenever $\Sigma$ is diagonal, the individual components are independently distributed as univariate normal random variables. Consider a simple random sample of size $n$, that is, a set of $p \times 1$ vectors $X_1,\ldots,X_n$ that are iid as $X_j \sim N_p(\mu,\Sigma)$ where it is assumed that $\Sigma = {\rm diag}(\sigma_1^2,\ldots,\sigma_p^2)$. Letting $X_j' = (x_{1j},\ldots,x_{pj})$, the joint density of the $x_{rj}$'s, $j = 1,\ldots,n$, in the above sample, which is denoted by $L_r$, is given by

$$L\_r = \prod\_{j=1}^n \frac{\mathbf{e}^{-\frac{1}{2\sigma\_r^2}(\mathbf{x}\_{rj} - \mu\_r)^2}}{(2\pi)^{\frac{1}{2}}(\sigma\_r^2)^{\frac{1}{2}}} = \frac{\mathbf{e}^{-\frac{1}{2\sigma\_r^2}\sum\_{j=1}^n (\mathbf{x}\_{rj} - \mu\_r)^2}}{(2\pi)^{\frac{n}{2}}(\sigma\_r^2)^{\frac{n}{2}}}.$$

Then, on substituting the maximum likelihood estimators of $\mu_r$ and $\sigma_r^2$ in $L_r$, its maximum is

$$\max L\_r = \frac{n^{\frac{n}{2}} \mathbf{e}^{-\frac{n}{2}}}{(2\pi)^{\frac{n}{2}} (s\_{rr})^{\frac{n}{2}}}, \ s\_{rr} = \sum\_{j=1}^n (\mathbf{x}\_{rj} - \bar{\mathbf{x}}\_r)^2.$$

Under the null hypothesis $H_o$, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2 \equiv \sigma^2$, and the MLE of $\sigma^2$ is a pooled estimate equal to $\frac{1}{np}(s_{11}+\cdots+s_{pp})$. Thus, the $\lambda$-criterion is the following in this case:

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{[s_{11}s_{22}\cdots s_{pp}]^{\frac{n}{2}}}{(\frac{s_{11}+\cdots+s_{pp}}{p})^{\frac{np}{2}}}.\tag{6.7.1}$$

If we let

$$u_3 = \lambda^{\frac{2}{n}} = \frac{p^p(\prod_{j=1}^p s_{jj})}{(\sum_{j=1}^p s_{jj})^p},\tag{6.7.2}$$

then, for arbitrary *h*, the *h*-th moment of *u*<sup>3</sup> is the following:

$$E[\boldsymbol{u}\_3^h | \boldsymbol{H}\_o] = E\left[\frac{p^{ph}(\prod\_{j=1}^p s\_{jj}^h)}{(s\_{11} + \dots + s\_{pp})^{ph}}\right] = E\left[p^{ph}\left(\prod\_{j=1}^p s\_{jj}^h\right)(s\_{11} + \dots + s\_{pp})^{-ph}\right].\tag{6.7.3}$$

Observe that $\frac{s_{jj}}{\sigma^2} \overset{iid}{\sim} \chi^2_{n-1} = \chi^2_m,\ m = n-1$, for $j = 1,\ldots,p$, the density of $s_{jj}$ being of the form

$$f\_{s\_{jj}}(s\_{jj}) = \frac{s\_{jj}^{\frac{m}{2}-1} \operatorname{e}^{-\frac{s\_{jj}}{2\sigma^2}}}{(2\sigma^2)^{\frac{m}{2}} \Gamma(\frac{m}{2})}, \ 0 \le s\_{jj} < \infty, \ m = n - 1 = 1, 2, \ldots, \tag{i}$$

under $H_o$. Note that $(s_{11}+\cdots+s_{pp})^{-ph}$ can be replaced by an equivalent integral as

$$(s_{11}+\cdots+s_{pp})^{-ph} = \frac{1}{\Gamma(ph)}\int_0^\infty x^{ph-1}\, {\rm e}^{-x(s_{11}+\cdots+s_{pp})}\, {\rm d}x,\ \Re(h) > 0.\tag{ii}$$


Due to the independence of the $s_{jj}$'s, the joint density of $s_{11},\ldots,s_{pp}$ is the product of the densities appearing in $(i)$; on integrating out $s_{11},\ldots,s_{pp}$, we end up with the following:

$$\frac{1}{(2\sigma^2)^{\frac{mp}{2}}}\left\{\prod_{j=1}^p \frac{\Gamma(\frac{m}{2}+h)}{\Gamma(\frac{m}{2})}\right\}\left\{\prod_{j=1}^p\Big[\frac{1+2\sigma^2 x}{2\sigma^2}\Big]^{-(\frac{m}{2}+h)}\right\} = \frac{[\Gamma(\frac{m}{2}+h)]^p}{[\Gamma(\frac{m}{2})]^p}\,\frac{(1+2\sigma^2 x)^{-p(\frac{m}{2}+h)}}{(2\sigma^2)^{-ph}}.$$

Now, the integral over *x* can be evaluated as follows:

$$\frac{(2\sigma^2)^{ph}}{\Gamma(ph)} \int\_0^\infty x^{ph-1} (1 + 2\sigma^2 x)^{-p(\frac{m}{2} + h)} \mathrm{d}x = \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2} + ph)}, \; \Re(h) > 0.$$

Thus,

$$E[u\_3^h|H\_o] = p^{ph} \frac{\Gamma^p(\frac{m}{2} + h)}{\Gamma^p(\frac{m}{2})} \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2} + ph)}, \; \Re(h) > 0. \tag{6.7.4}$$
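The moment formula (6.7.4) can be checked by simulation, since under $H_o$ the $s_{jj}/\sigma^2$ are iid $\chi^2_m$ variables and $u_3$ is scale-invariant. The sketch below (NumPy assumed; $p$, $m$ and the seed are illustrative) compares the empirical first moment of $u_3$ with (6.7.4) evaluated at $h = 1$:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
p, m, reps = 3, 7, 200000
s = rng.chisquare(m, size=(reps, p))              # s_jj / sigma^2 ~ chi-square_m, iid under H_o
u3 = p ** p * s.prod(axis=1) / s.sum(axis=1) ** p

# exact first moment: set h = 1 in (6.7.4), using Gamma(m/2 + 1)/Gamma(m/2) = m/2
exact = p ** p * (m / 2) ** p * math.gamma(m * p / 2) / math.gamma(m * p / 2 + p)
print(u3.mean(), exact)
```

By the arithmetic-geometric mean inequality, $u_3 \le 1$ always holds, with equality when all $s_{jj}$ coincide.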

The density of $u_3$ can be written in terms of an H-function. Since $p$ is a positive integer, we can expand one of the gamma ratios by means of Gauss' multiplication formula:

$$\frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2} + ph)} = \frac{(2\pi)^{\frac{1-p}{2}} p^{\frac{pm}{2} - \frac{1}{2}} \Gamma(\frac{m}{2}) \Gamma(\frac{m}{2} + \frac{1}{p}) \cdots \Gamma(\frac{m}{2} + \frac{p-1}{p})}{(2\pi)^{\frac{1-p}{2}} p^{\frac{mp}{2} - \frac{1}{2} + ph} \Gamma(\frac{m}{2} + h) \cdots \Gamma(\frac{m}{2} + \frac{p-1}{p} + h)}$$

for *p* = 1*,* 2*,..., m* ≥ *p*. Accordingly,

$$\begin{split} \operatorname{E}[\boldsymbol{u}\_{3}^{h}|\boldsymbol{H}\_{o}] &= \frac{[\varGamma(\frac{m}{2}+h)]^{p}}{[\varGamma(\frac{m}{2})]^{p}} \prod\_{j=0}^{p-1} \frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}+\frac{j}{p}+h)} \\ &= \frac{[\varGamma(\frac{m}{2}+h)]^{p-1}}{[\varGamma(\frac{m}{2})]^{p-1}} \prod\_{j=1}^{p-1} \frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}+\frac{j}{p}+h)} = c\_{3,p-1} \frac{[\varGamma(\frac{m}{2}+h)]^{p-1}}{\prod\_{j=1}^{p-1} \varGamma(\frac{m}{2}+\frac{j}{p}+h)}, \end{split} \tag{6.7.5}$$

$$c_{3,p-1} = \frac{\prod_{j=1}^{p-1}\Gamma(\frac{m}{2}+\frac{j}{p})}{[\Gamma(\frac{m}{2})]^{p-1}},\ \Re(\tfrac{m}{2}+h) > 0.\tag{6.7.6}$$

Hence, for $h = s-1$, (6.7.5) is the Mellin transform of the density of $u_3$. Thus, denoting the density by $f_{u_3}(u_3)$, we have

$$f_{u_3}(u_3|H_o) = c_{3,p-1}\, G^{p-1,0}_{p-1,p-1}\left[u_3\,\Big|^{\frac{m}{2}+\frac{j}{p}-1,\ j=1,\ldots,p-1}_{\frac{m}{2}-1,\,\ldots,\,\frac{m}{2}-1}\right],\ 0 \le u_3 \le 1,\tag{6.7.7}$$

and zero elsewhere.
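The Gauss multiplication formula used above, $\Gamma(pz) = (2\pi)^{\frac{1-p}{2}}\, p^{pz-\frac{1}{2}} \prod_{j=0}^{p-1}\Gamma(z+\frac{j}{p})$, can be verified numerically on the log scale (the value $z = 4.25$ is an arbitrary illustrative choice for $\frac{m}{2}$):

```python
import math

def log_rhs(z, p):
    """log of (2*pi)^((1-p)/2) * p^(p*z - 1/2) * prod_{j=0}^{p-1} Gamma(z + j/p)."""
    s = (1 - p) / 2 * math.log(2 * math.pi) + (p * z - 0.5) * math.log(p)
    return s + sum(math.lgamma(z + j / p) for j in range(p))

z = 4.25
for p in (2, 3, 5):
    print(p, math.lgamma(p * z), log_rhs(z, p))
```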

In the complex case, the *h*-th moment is the following:

$$E[\tilde{\mu}\_3^h|H\_o] = \tilde{c}\_{3, p-1} \frac{[\tilde{\Gamma}(m+h)]^{p-1}}{\prod\_{j=1}^{p-1} \tilde{\Gamma}(m + \frac{j}{p} + h)},\tag{6.7a.1}$$

$$\tilde{c}_{3,p-1} = \frac{\prod_{j=1}^{p-1}\tilde{\Gamma}(m+\frac{j}{p})}{[\tilde{\Gamma}(m)]^{p-1}},\tag{6.7a.2}$$

and the corresponding density is given by

$$\tilde{f}\_{\tilde{\mu}\_3}(\tilde{\mu}\_3|H\_o) = \tilde{c}\_{3,p-1} G\_{p-1,p-1}^{p-1,0} \left[ \tilde{\mu}\_3 \Big|\_{m-1,\ldots,m-1}^{m-1+\frac{j}{p},\,j=1,\ldots,p-1} \right], \ 0 \le |\tilde{\mu}\_3| \le 1,\qquad(6.7a.3)$$

and zero elsewhere, G denoting a real G-function.

# **Real and complex cases:** *p* = 2

It is seen from (6.7.5) that for $p = 2$, $u_3$ is a real type-1 beta random variable with the parameters $(\alpha = \frac{m}{2},\ \beta = \frac{1}{2})$ in the real case. Whenever $p \ge 3$, poles of order 2 or higher occur, and the resulting density functions, which are expressible in terms of generalized hypergeometric functions, will not be explicitly provided. For a general series expansion of the G-function, the reader may refer to Mathai (1970a, 1993).

In the complex case, when $p = 2$, $\tilde{u}_3$ has a real type-1 beta density with the parameters $(\alpha = m,\ \beta = \frac{1}{2})$. In this instance as well, poles of higher orders will be present when $p \ge 3$, and hence explicit forms of the corresponding densities will not be provided here. The exact null and non-null distributions of the test statistic are derived for the general case in Mathai and Saxena (1973), and highly accurate percentage points are provided in Mathai (1979a,b).

An asymptotic result can also be obtained as $n \to \infty$. Consider the $h$-th moment of $\lambda$, which is available from (6.7.5) in the real case and from (6.7a.1) in the complex case. Then, referring to Corollary 6.5.1, $\delta_j = \frac{j}{p}$ in both the real and the complex situations. Hence, $2[\sum_{j=1}^{p-1}\delta_j] = 2\sum_{j=1}^{p-1}\frac{j}{p} = p-1$ in both the real and the complex cases. As well, observe that in the complex case, the diagonal elements are real since $\tilde{\Sigma}$ is Hermitian positive definite. Accordingly, the number of restrictions imposed by $H_o$ in either the real or the complex case is $p-1$. Thus, the following result:

**Theorem 6.7.1.** *Consider the $\lambda$-criterion for testing the equality of the diagonal elements, given that the covariance matrix is already diagonal. Then, as $n \to \infty$, the null distribution of $-2\ln\lambda$ converges to that of a $\chi^2_{p-1}$ in both the real and the complex cases.*
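A quick simulation of Theorem 6.7.1 in the real case (NumPy assumed; the parameters and seed are illustrative) draws the $s_{jj}$ directly as $\chi^2_m$ variables, which suffices since $u_3$ is scale-invariant:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 5, 400, 4000
m = n - 1

s = rng.chisquare(m, size=(reps, p))              # s_jj under H_o (common sigma^2 = 1)
u3 = p ** p * s.prod(axis=1) / s.sum(axis=1) ** p
stats = -n * np.log(u3)                           # -2 ln(lambda), with lambda = u3^(n/2)
print(stats.mean(), p - 1)                        # asymptotic chi-square mean is p - 1
```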

#### **6.8. Hypothesis that the Covariance Matrix is Block Diagonal, Real Case**

We will discuss a generalization of the problem examined in Sect. 6.6, considering again the case of real Gaussian vectors. Let *X*1*,...,Xn* be iid as *Xj* ∼ *Np(μ, Σ), Σ > O*, and

$$\begin{aligned} X_j &= \begin{bmatrix} x_{1j}\\ \vdots\\ x_{pj} \end{bmatrix} = \begin{bmatrix} X_{(1j)}\\ \vdots\\ X_{(kj)} \end{bmatrix},\ X_{(1j)} = \begin{bmatrix} x_{1j}\\ \vdots\\ x_{p_1 j} \end{bmatrix},\ X_{(2j)} = \begin{bmatrix} x_{p_1+1,j}\\ \vdots\\ x_{p_1+p_2,j} \end{bmatrix},\ldots,\\ \Sigma &= \begin{bmatrix} \Sigma_{11} & O & \cdots & O\\ O & \Sigma_{22} & \cdots & O\\ \vdots & \vdots & \ddots & \vdots\\ O & O & \cdots & \Sigma_{kk} \end{bmatrix},\ \Sigma_{jj}\ {\rm being}\ p_j\times p_j,\ j = 1,\ldots,k. \end{aligned}$$

In this case, the $p \times 1$ real Gaussian vector is subdivided into subvectors of orders $p_1,\ldots,p_k$, so that $p_1+\cdots+p_k = p$, and, under the null hypothesis $H_o$, $\Sigma$ is assumed to be a block diagonal matrix, which means that the subvectors are mutually independently distributed $p_j$-variate real Gaussian vectors with corresponding mean value vectors $\mu_{(j)}$ and covariance matrices $\Sigma_{jj},\ j = 1,\ldots,k$. Then, the joint density of the sample values under the null hypothesis can be written as $L = \prod_{r=1}^k L_r$ where $L_r$ is the joint density of the sample values corresponding to the subvectors $X_{(rj)},\ j = 1,\ldots,n,\ r = 1,\ldots,k$. Letting the $p \times n$ general sample matrix be $\mathbf{X} = (X_1,\ldots,X_n)$, we note that the first $p_1$ rows of $\mathbf{X}$ correspond to the sample from the first subvector, $X_{(1j)} \overset{iid}{\sim} N_{p_1}(\mu_{(1)},\Sigma_{11}),\ \Sigma_{11} > O,\ j = 1,\ldots,n$. The MLEs of $\mu_{(r)}$ and $\Sigma_{rr}$ are the corresponding sample mean vector and sample covariance matrix. Thus, the maximum of $L_r$ is available as

$$\max L\_r = \frac{\mathbf{e}^{-\frac{np\_r}{2}} n^{\frac{np\_r}{2}}}{(2\pi)^{\frac{np\_r}{2}} |S\_{rr}|^{\frac{n}{2}}} \stackrel{ind}{\Rightarrow} \prod\_{r=1}^k \max L\_r = \frac{\mathbf{e}^{-\frac{np}{2}} n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} \prod\_{r=1}^k |S\_{rr}|^{\frac{n}{2}}}.$$

Hence,

$$\lambda = \frac{\sup\_{\omega} L}{\sup\_{\Omega} L} = \frac{|S|^{\frac{n}{2}}}{\prod\_{r=1}^{k} |S\_{rr}|^{\frac{n}{2}}},\tag{6.8.1}$$

and

$$
u\_4 \equiv \lambda^{\frac{2}{n}} = \frac{|S|}{\prod\_{r=1}^k |S\_{rr}|}. \tag{6.8.2}
$$
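As a numerical illustration not taken from the text, the statistic in (6.8.2) can be computed directly from a simulated sample; the helper name `u4_statistic` and the chosen dimensions are our own. By Fischer's inequality, $|S| \le \prod\_r |S\_{rr}|$ for a positive definite $S$, so $u\_4$ always lies in $(0, 1]$:

```python
import numpy as np

def u4_statistic(X, block_sizes):
    """Compute u_4 = |S| / prod_r |S_rr| from a p x n sample matrix X,
    where S is the sample sum-of-products matrix and the S_rr are its
    diagonal blocks of the given sizes (illustrative helper)."""
    Xbar = X.mean(axis=1, keepdims=True)
    S = (X - Xbar) @ (X - Xbar).T          # sample sum of products matrix
    dets, start = [], 0
    for pr in block_sizes:
        dets.append(np.linalg.det(S[start:start + pr, start:start + pr]))
        start += pr
    return np.linalg.det(S) / np.prod(dets)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50))           # p = 4, n = 50, Sigma = I (H_o true)
u4 = u4_statistic(X, [1, 3])
print(0.0 < u4 <= 1.0)                     # u_4 lies in (0, 1]
```

With a single block covering all of $S$, the statistic reduces to $|S|/|S| = 1$, which is a convenient sanity check on the helper.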

Observe that the covariance matrix $\Sigma = (\sigma\_{ij})$ can be written in terms of the matrix of population correlations. Let $D = \mathrm{diag}(\sigma\_1,\ldots,\sigma\_p)$ where $\sigma\_t^2 = \sigma\_{tt}$ denotes the variance associated with the component $x\_{tj}$ in $X\_j' = (x\_{1j},\ldots,x\_{pj})$, $\mathrm{Cov}(X\_j) = \Sigma$, and let $R = (\rho\_{rs})$ be the population correlation matrix, where $\rho\_{rs}$ is the population correlation between the components $x\_{rj}$ and $x\_{sj}$; then $\Sigma = DRD$. Consider a partitioning of $\Sigma$ into $k \times k$ blocks as well as the corresponding partitionings of $D$ and $R$:

$$
\boldsymbol{\Sigma} = \begin{bmatrix}
\boldsymbol{\Sigma}\_{11} & \boldsymbol{\Sigma}\_{12} & \cdots & \boldsymbol{\Sigma}\_{1k} \\
\boldsymbol{\Sigma}\_{21} & \boldsymbol{\Sigma}\_{22} & \cdots & \boldsymbol{\Sigma}\_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{\Sigma}\_{k1} & \boldsymbol{\Sigma}\_{k2} & \cdots & \boldsymbol{\Sigma}\_{kk}
\end{bmatrix}, \boldsymbol{\ D} = \begin{bmatrix}
\boldsymbol{D}\_{1} & \boldsymbol{O} & \cdots & \boldsymbol{O} \\
\boldsymbol{O} & \boldsymbol{D}\_{2} & \cdots & \boldsymbol{O} \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{O} & \boldsymbol{O} & \cdots & \boldsymbol{D}\_{k}
\end{bmatrix}, \boldsymbol{R} = \begin{bmatrix}
\boldsymbol{R}\_{11} & \boldsymbol{R}\_{12} & \cdots & \boldsymbol{R}\_{1k} \\
\boldsymbol{R}\_{21} & \boldsymbol{R}\_{22} & \cdots & \boldsymbol{R}\_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{R}\_{k1} & \boldsymbol{R}\_{k2} & \cdots & \boldsymbol{R}\_{kk}
\end{bmatrix},
$$

where, for example, $\Sigma\_{jj}$ is $p\_j \times p\_j$, $p\_1+\cdots+p\_k = p$, and $D$ and $R$ are partitioned correspondingly. Consider a corresponding partitioning of the sample sum of products matrix $S = (S\_{ij})$, of $D^{(s)} = \mathrm{diag}(\sqrt{s\_{11}},\ldots,\sqrt{s\_{pp}})$ and of the sample correlation matrix $R^{(s)}$, so that $S\_{jj}$, $D^{(s)}\_{j}$ and $R^{(s)}\_{jj}$ are $p\_j \times p\_j$, $p\_1+\cdots+p\_k = p$. Then,

$$\frac{|\Sigma|}{\prod\_{j=1}^{k}|\Sigma\_{jj}|} = \frac{|\mathcal{R}|}{\prod\_{j=1}^{k}|\mathcal{R}\_{jj}|} \tag{6.8.3}$$

and

$$u\_4 \equiv \lambda^{\frac{2}{n}} = \frac{|S|}{\prod\_{j=1}^k |S\_{jj}|} = \frac{|\mathcal{R}^{(s)}|}{\prod\_{j=1}^k |\mathcal{R}\_{jj}^{(s)}|}. \tag{6.8.4}$$
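Since $S = D^{(s)} R^{(s)} D^{(s)}$ with $D^{(s)}$ diagonal, the determinant ratios in (6.8.4) computed from $S$ and from $R^{(s)}$ must agree. A quick numerical sketch, with variable names and dimensions of our own choosing ($p = 5$, blocks of sizes 2 and 3):

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 5, 30
X = rng.standard_normal((p, n))
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T                               # sample sum of products matrix
Dinv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = Dinv @ S @ Dinv                         # sample correlation matrix R^(s)

def block_ratio(M):
    # |M| / (|M_11| |M_22|) for diagonal blocks of sizes 2 and 3
    return np.linalg.det(M) / (np.linalg.det(M[:2, :2]) * np.linalg.det(M[2:, 2:]))

print(np.isclose(block_ratio(S), block_ratio(R)))   # the two ratios coincide
```

The agreement holds because the diagonal factors $D^{(s)}\_j$ cancel between the numerator and the block determinants, exactly as in the passage from (6.8.3) to (6.8.4).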

An additional interesting property is now pointed out. Consider a linear function of the original $p \times 1$ vector $X\_j \sim N\_p(\mu, \Sigma)$, $\Sigma > O$, in the form $CX\_j$ where $C$ is the diagonal matrix $\mathrm{diag}(c\_1,\ldots,c\_p)$. In this case, the product $CX\_j$ is such that the $r$-th component of $X\_j$ is weighted or multiplied by $c\_r$. Let $C$ be a block diagonal matrix that is partitioned similarly to $D$, so that its $j$-th diagonal block is the $p\_j \times p\_j$ diagonal submatrix $C\_j$. Then,

$$u\_c = \frac{|CSC'|}{\prod\_{j=1}^k |C\_j S\_{jj} C'\_j|} = \frac{|S|}{\prod\_{j=1}^k |S\_{jj}|} = u\_4. \tag{6.8.5}$$

In other words, $u\_4$ is invariant under linear transformations on $X\_j \stackrel{iid}{\sim} N\_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$. That is, if $Y\_j = CX\_j + d$ where $d$ is a constant column vector, then, letting $\mathbf{Y} = (Y\_1,\ldots,Y\_n) = (CX\_1 + d,\ldots,CX\_n + d)$ denote the $p \times n$ sample matrix on the $Y\_j$'s,

$$\mathbf{Y} - \bar{\mathbf{Y}} = C(\mathbf{X} - \bar{\mathbf{X}}) \Rightarrow S\_{y} = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})' = C(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' C' = CSC'.$$

Letting *Sy* be partitioned as *S* into *k* × *k* blocks and *Sy* = *(Sijy)*, we have

$$u\_{y} \equiv \frac{|S\_{y}|}{\prod\_{j=1}^{k} |S\_{jjy}|} = \frac{|CSC'|}{\prod\_{j=1}^{k} |C\_j S\_{jj} C'\_j|} = \frac{|S|}{\prod\_{j=1}^{k} |S\_{jj}|} = u\_4. \tag{6.8.6}$$
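The invariance property (6.8.6) is easy to verify numerically: transforming the sample by $Y\_j = CX\_j + d$ with a diagonal (hence block-diagonal) $C$ and an arbitrary shift $d$ leaves the statistic unchanged. A sketch under made-up dimensions, with all names our own:

```python
import numpy as np

def u4(S, block_sizes):
    """u_4 = |S| / prod_j |S_jj| for diagonal blocks of the given sizes."""
    dets, start = [], 0
    for pj in block_sizes:
        dets.append(np.linalg.det(S[start:start + pj, start:start + pj]))
        start += pj
    return np.linalg.det(S) / np.prod(dets)

rng = np.random.default_rng(1)
p, n, blocks = 5, 30, [2, 3]
X = rng.standard_normal((p, n))
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T

C = np.diag(rng.uniform(0.5, 2.0, size=p))   # diagonal weighting matrix C
d = rng.standard_normal((p, 1))              # constant shift vector d
Y = C @ X + d                                # Y_j = C X_j + d
Yc = Y - Y.mean(axis=1, keepdims=True)
Sy = Yc @ Yc.T                               # equals C S C'

print(np.isclose(u4(S, blocks), u4(Sy, blocks)))   # invariance: u_y = u_4
```

The shift $d$ drops out when the sample mean is subtracted, and the determinant factors $|C\_j|^2$ cancel between numerator and denominator, mirroring the algebra in (6.8.5) and (6.8.6).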

Arbitrary moments of $u\_4$ can be derived by proceeding as in Sect. 6.6. The $h$-th null moment, that is, the $h$-th moment under the null hypothesis $H\_o$, is then

$$E[u\_4^h|H\_o] = \frac{1}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})|\Sigma\_o|^{\frac{m}{2}}} \int\_{S>O} |S|^{\frac{m}{2}+h-\frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\Sigma\_o^{-1}S)} \left\{ \prod\_{r=1}^k |S\_{rr}|^{-h} \right\} \text{d}S \qquad (i)$$

where *m* = *n* − 1*, n* being the sample size, and

$$
\Sigma\_o = \begin{bmatrix}
\Sigma\_{11} & O & \cdots & O \\
O & \Sigma\_{22} & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & \Sigma\_{kk}
\end{bmatrix},
\ \operatorname{tr}(\Sigma\_o^{-1}S) = \operatorname{tr}(\Sigma\_{11}^{-1}S\_{11}) + \cdots + \operatorname{tr}(\Sigma\_{kk}^{-1}S\_{kk}),
\tag{ii}
$$

where $S\_{rr}$ is the $r$-th diagonal block of $S$, corresponding to the block $\Sigma\_{rr}$ of $\Sigma$, whose order is $p\_r \times p\_r$, $r = 1,\ldots,k$, $p\_1+\cdots+p\_k = p$. On noting that

$$|S\_{rr}|^{-h} = \frac{1}{\Gamma\_{p\_r}(h)} \int\_{Y\_r > O} |Y\_r|^{h - \frac{p\_r+1}{2}} \mathbf{e}^{-\text{tr}(Y\_r S\_{rr})} \text{d}Y\_r, \ r = 1, \ldots, k,\tag{iii}$$

where *Yr > O* is a *pr* × *pr* real positive definite matrix, and replacing each |*Srr*| <sup>−</sup>*<sup>h</sup>* by its integral representation as given in *(iii)*, the exponent of e in *(i)* becomes

$$-\frac{1}{2}[\text{tr}(\Sigma\_o^{-1}S) + 2\text{tr}(YS)], \ Y = \begin{bmatrix} Y\_1 & O & \cdots & O \\ O & Y\_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & Y\_k \end{bmatrix}, \ \text{tr}(YS) = \text{tr}(Y\_1S\_{11}) + \dots + \text{tr}(Y\_kS\_{kk}).$$

The right-hand side of equation *(i)* then becomes

$$\begin{split} E[u\_4^h | H\_o] &= \frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma\_p(\frac{m}{2})\, |\Sigma\_o|^{\frac{m}{2}}}\, 2^{ph} \left\{ \prod\_{r=1}^k \frac{1}{\Gamma\_{p\_r}(h)} \int\_{Y\_r > O} |Y\_r|^{h - \frac{p\_r + 1}{2}} \right\} \\ &\quad \times |\Sigma\_o^{-1} + 2Y|^{-(\frac{m}{2} + h)}\, \text{d}Y\_1 \wedge \ldots \wedge \text{d}Y\_k, \ \Re(h) > -\frac{m}{2} + \frac{p - 1}{2}. \end{split} \tag{iv}$$

It should be pointed out that the non-null moments of $u\_4$ can be obtained by substituting a general $\Sigma$ for $\Sigma\_o$ in *(iv)*. Note that if we replace $2Y$ by $Y$, the factor containing 2, namely $2^{ph}$, will disappear. Further, under $H\_o$,

$$|\Sigma\_o^{-1} + 2Y|^{-(\frac{m}{2} + h)} = \left\{ \prod\_{r=1}^k |\Sigma\_{rr}|^{\frac{m}{2} + h} \right\} \left\{ \prod\_{r=1}^k |I + 2\Sigma\_{rr}Y\_r|^{-(\frac{m}{2} + h)} \right\}.\tag{v}$$

Then, each *Yr*-integral can be evaluated as follows:

$$\begin{split} \frac{1}{\Gamma\_{p\_r}(h)} & \int\_{Y\_r>O} |Y\_r|^{h-\frac{p\_r+1}{2}} |I + 2\Sigma\_{rr}Y\_r|^{-(\frac{m}{2}+h)} \text{d}Y\_r \\ &= 2^{-p\_r h} |\Sigma\_{rr}|^{-h} \frac{1}{\Gamma\_{p\_r}(h)} \int\_{Z\_r>O} |Z\_r|^{h-\frac{p\_r+1}{2}} |I + Z\_r|^{-(\frac{m}{2}+h)} \text{d}Z\_r, \ Z\_r = 2\Sigma\_{rr}^{\frac{1}{2}} Y\_r \Sigma\_{rr}^{\frac{1}{2}} \\ &= 2^{-p\_r h} |\Sigma\_{rr}|^{-h} \frac{\Gamma\_{p\_r}(h)}{\Gamma\_{p\_r}(h)} \frac{\Gamma\_{p\_r}(\frac{m}{2})}{\Gamma\_{p\_r}(\frac{m}{2}+h)}. \end{split} \tag{vi}$$

On combining equations *(i)* to *(vi)*, we have

$$E[u\_4^h|H\_o] = \frac{\Gamma\_p(\frac{m}{2} + h)}{\Gamma\_p(\frac{m}{2})} \prod\_{r=1}^k \frac{\Gamma\_{p\_r}(\frac{m}{2})}{\Gamma\_{p\_r}(\frac{m}{2} + h)}, \ \Re(\frac{m}{2} + h) > \frac{p-1}{2},\tag{6.8.7}$$

$$E[u\_4^h|H\_o] = c\_{4,p} \frac{\Gamma\_p(\frac{m}{2} + h)}{\prod\_{r=1}^k \Gamma\_{p\_r}(\frac{m}{2} + h)} = c\_{4,p}\, c^\* \frac{\prod\_{j=1}^p \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}{\prod\_{r=1}^k \left[\prod\_{i=1}^{p\_r} \Gamma(\frac{m}{2} - \frac{i-1}{2} + h)\right]}, \tag{6.8.8}$$

$$\begin{split} c\_{4,p} &= \frac{\prod\_{r=1}^{k} \Gamma\_{p\_r}(\frac{m}{2})}{\Gamma\_p(\frac{m}{2})} = \frac{\prod\_{r=1}^{k} \pi^{\frac{p\_r(p\_r-1)}{4}}}{\pi^{\frac{p(p-1)}{4}}}\, \frac{\prod\_{r=1}^{k} \prod\_{i=1}^{p\_r} \Gamma(\frac{m}{2} - \frac{i-1}{2})}{\prod\_{j=1}^{p} \Gamma(\frac{m}{2} - \frac{j-1}{2})}, \\ c^\* &= \frac{\pi^{\frac{p(p-1)}{4}}}{\prod\_{r=1}^{k} \pi^{\frac{p\_r(p\_r-1)}{4}}}, \end{split} \tag{6.8.9}$$

so that when $h = 0$, $E[u\_4^h|H\_o] = 1$. Observe that one set of gamma products can be canceled in (6.8.8) and (6.8.9). When that set is the product of the first $p\_1$ gamma functions, the $h$-th moment of $u\_4$ is given by

$$E[u\_4^h|H\_o] = c\_{4, p-p\_1} \frac{\prod\_{j=p\_1+1}^p \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}{\prod\_{r=2}^k \prod\_{i=1}^{p\_r} \Gamma(\frac{m}{2} - \frac{i-1}{2} + h)},\tag{6.8.10}$$

where $c\_{4,p-p\_1}$ is such that $E[u\_4^h|H\_o] = 1$ when $h = 0$. Since the structure of the expression given in (6.8.10) is that of the $h$-th moment of a product of $p - p\_1$ independently distributed real scalar type-1 beta random variables, it can be inferred that the distribution of $u\_4|H\_o$ is also that of a product of $p - p\_1$ independently distributed real scalar type-1 beta random variables whose parameters can be determined from the arguments of the gamma functions appearing in (6.8.10).
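The null moment formula (6.8.7) can also be checked by simulation, since under $H\_o$ (taking $\Sigma = I$ without loss of generality) values of $u\_4$ are readily generated from Gaussian samples. The sketch below evaluates (6.8.7) through `scipy.special.multigammaln`, the logarithm of the real multivariate gamma $\Gamma\_p(\cdot)$; the $\pi$ factors cancel between numerator and denominator. The setup ($p = 4$, $n = 20$, two blocks of size 2) is an arbitrary illustration:

```python
import numpy as np
from scipy.special import multigammaln

def u4_null_moment(h, m, block_sizes):
    """h-th null moment of u_4 from (6.8.7), via log multivariate gammas."""
    p = sum(block_sizes)
    logE = multigammaln(m / 2 + h, p) - multigammaln(m / 2, p)
    for pr in block_sizes:
        logE += multigammaln(m / 2, pr) - multigammaln(m / 2 + h, pr)
    return float(np.exp(logE))

# Monte Carlo check under H_o (Sigma = I): compare E[u_4] with the formula
rng = np.random.default_rng(2)
p, n, blocks = 4, 20, [2, 2]
m = n - 1
vals = []
for _ in range(20000):
    X = rng.standard_normal((p, n))
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ Xc.T
    vals.append(np.linalg.det(S)
                / (np.linalg.det(S[:2, :2]) * np.linalg.det(S[2:, 2:])))
print(abs(np.mean(vals) - u4_null_moment(1, m, blocks)) < 0.01)  # close agreement
```

Setting $h = 0$ returns exactly 1, matching the normalization noted after (6.8.9).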

Some of the gamma functions appearing in (6.8.10) will cancel out for certain values of $p\_1,\ldots,p\_k$, thereby simplifying the representation of the moments and enabling one to express the density of $u\_4$ in terms of elementary functions in such instances. The exact null density in the general case was derived by the first author. For interesting representations of the exact density, the reader is referred to Mathai and Rathie (1971) and Mathai and Saxena (1973), some exact percentage points of the null distribution being included in Mathai and Katiyar (1979a). As it turns out, explicit forms are available in terms of elementary functions for the following special cases, see also Anderson (2003): $p\_1 = p\_2 = p\_3 = 1$; $p\_1 = p\_2 = p\_3 = 2$; $p\_1 = p\_2 = 1, p\_3 = p-2$; $p\_1 = 1, p\_2 = p\_3 = 2$; $p\_1 = 1, p\_2 = 2, p\_3 = 3$; $p\_1 = 2, p\_2 = 2, p\_3 = 4$; $p\_1 = p\_2 = 2, p\_3 = 3$; $p\_1 = 2, p\_2 = 3$, $p$ even.

#### **6.8.1. Special case:** *k* = 2

Let us consider a certain $2 \times 2$ partitioning of $S$, which corresponds to the special case $k = 2$. When $p\_1 = 1$ and $p\_2 = p-1$ so that $p\_1 + p\_2 = p$, the test statistic is

$$\begin{split} u\_{4} &= \frac{|S|}{|S\_{11}| \, |S\_{22}|} = \frac{|S\_{11} - S\_{12}S\_{22}^{-1}S\_{21}|}{|S\_{11}|} \\ &= \frac{s\_{11} - S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}} = 1 - r\_{1.(2\dots p)}^{2} \end{split} \tag{6.8.11}$$

where $r\_{1.(2\ldots p)}$ is the multiple correlation between $x\_1$ and $(x\_2,\ldots,x\_p)$. As stated in Theorem 5.6.3, $1 - r\_{1.(2\ldots p)}^2$ is distributed as a real scalar type-1 beta variable with the parameters $(\frac{n-1}{2} - \frac{p-1}{2}, \frac{p-1}{2})$. The simplifications in (6.8.11) are achieved by making use of the properties of determinants of partitioned matrices, which are discussed in Sect. 1.3. Since $s\_{11}$ is $1 \times 1$ in this case, the numerator determinant is a real scalar quantity. Thus, $w = \frac{u\_4}{1-u\_4}$ has a type-2 beta distribution and thereby $\frac{p-1}{n-p}\,w$ has an $F$-distribution, so that the test can be based on an $F$ statistic having $(n-1)-(p-1) = n-p$ and $p-1$ degrees of freedom.
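This null distribution can be verified by simulation: under $H\_o$ (taking $\Sigma = I$), the statistic $\frac{p-1}{n-p}\frac{u\_4}{1-u\_4}$ should follow an $F\_{n-p,\,p-1}$ distribution, so applying the $F$ cdf to simulated values should produce approximately uniform numbers. The dimensions and replication count below are arbitrary choices for the sketch:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
p, n = 4, 40
stats = []
for _ in range(5000):
    X = rng.standard_normal((p, n))        # H_o true: x_1 independent of X_(2)
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ Xc.T
    # squared multiple correlation r^2_{1.(2...p)} = S_12 S_22^{-1} S_21 / s_11
    r2 = (S[0:1, 1:] @ np.linalg.solve(S[1:, 1:], S[0:1, 1:].T)).item() / S[0, 0]
    u = 1.0 - r2                            # u_4 = 1 - r^2
    stats.append(((p - 1) / (n - p)) * (u / (1.0 - u)))
# probability-integral transform: the F-cdf of the statistic should be ~Uniform(0,1)
pit = f_dist.cdf(stats, n - p, p - 1)
print(abs(pit.mean() - 0.5) < 0.02)
```

A mean of the transformed values close to 0.5 (and values confined to $[0, 1]$) is consistent with the claimed $F\_{n-p,\,p-1}$ null distribution.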

#### **6.8.2. General case:** *k* = 2

Suppose that, in a $2 \times 2$ partitioning of $S$, $S\_{11}$ is of order $p\_1 \times p\_1$ and $S\_{22}$ is of order $p\_2 \times p\_2$ with $p\_2 = p - p\_1$. Then $u\_4$ can be expressed as

$$\begin{split} u\_{4} &= \frac{|S|}{|S\_{11}| \, |S\_{22}|} = \frac{|S\_{11} - S\_{12}S\_{22}^{-1}S\_{21}|}{|S\_{11}|} \\ &= |I - S\_{11}^{-\frac{1}{2}}S\_{12}S\_{22}^{-1}S\_{21}S\_{11}^{-\frac{1}{2}}| = |I - U|, \ U = S\_{11}^{-\frac{1}{2}}S\_{12}S\_{22}^{-1}S\_{21}S\_{11}^{-\frac{1}{2}} \end{split} \tag{6.8.12}$$

where $U$ is called the *multiple correlation matrix*. It will be shown that $U$ has a real matrix-variate type-1 beta distribution when $S\_{11}$ is of general order rather than being a scalar.

**Theorem 6.8.1.** *Consider $u\_4$ for $k = 2$. Let $S\_{11}$ be $p\_1 \times p\_1$ and $S\_{22}$ be $p\_2 \times p\_2$ so that $p\_1 + p\_2 = p$. Without any loss of generality, let us assume that $p\_1 \le p\_2$. Then, under $H\_o: \Sigma\_{12} = O$, the multiple correlation matrix $U$ has a real matrix-variate type-1 beta distribution with the parameters $(\frac{p\_2}{2}, \frac{m}{2} - \frac{p\_2}{2})$, with $m = n-1$, $n$ being the sample size, and thereby $(I - U) \sim$ type-1 beta $(\frac{m}{2} - \frac{p\_2}{2}, \frac{p\_2}{2})$, the determinant of $I - U$ being $u\_4$ under the null hypothesis when $k = 2$.*

*Proof***:** Since $\Sigma$ under $H\_o$ can readily be eliminated from a structure such as $u\_4$, we will take a Wishart matrix $S$ having $m = n-1$ degrees of freedom, $n$ denoting the sample size, and parameter matrix $I$, the identity matrix. At first, assume that $\Sigma$ is a block diagonal matrix and make the transformation $S\_1 = \Sigma^{-\frac{1}{2}} S \Sigma^{-\frac{1}{2}}$. As a result, $u\_4$ will be free of $\Sigma\_{11}$ and $\Sigma\_{22}$, and so, we may take $S \sim W\_p(m, I)$. Now, consider the submatrices $S\_{11}, S\_{22}, S\_{12}$ so that $\text{d}S = \text{d}S\_{11} \wedge \text{d}S\_{22} \wedge \text{d}S\_{12}$. Let $f(S)$ denote the $W\_p(m, I)$ density. Then,

$$f(S)\mathrm{d}S = \frac{|S|^{\frac{m}{2} - \frac{p+1}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})}\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(S)}\mathrm{d}S,\ \ S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix},\ \ S\_{21} = S'\_{12}.$$

However, appealing to a result stated in Sect. 1.3, we have

$$\begin{aligned} |S| &= |S\_{22}| \, |S\_{11} - S\_{12} S\_{22}^{-1} S\_{21}| \\ &= |S\_{22}| \, |S\_{11}| \, |I - S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}}|.\end{aligned}$$

The joint density of *S*11*, S*22*, S*<sup>12</sup> denoted by *f*1*(S*11*, S*22*, S*12*)* is then

$$\begin{split} f\_1(S\_{11}, S\_{22}, S\_{12}) &= |S\_{11}|^{\frac{m}{2} - \frac{p+1}{2}} |S\_{22}|^{\frac{m}{2} - \frac{p+1}{2}} |I - U|^{\frac{m}{2} - \frac{p+1}{2}} \\ &\quad \times \frac{\mathbf{e}^{-\frac{1}{2} \text{tr}(S\_{11}) - \frac{1}{2} \text{tr}(S\_{22})}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2})}, \ U = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}}. \end{split}$$

Letting $Y = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-\frac{1}{2}}$, it follows from a result on Jacobians of matrix transformations, previously established in Chap. 1, that $\text{d}Y = |S\_{11}|^{-\frac{p\_2}{2}} |S\_{22}|^{-\frac{p\_1}{2}} \text{d}S\_{12}$. Thus, the joint density of $S\_{11}, S\_{22}, Y$, denoted by $f\_2(S\_{11}, S\_{22}, Y)$, is given by

$$\begin{split} f\_2(S\_{11}, S\_{22}, Y) &= |S\_{11}|^{\frac{m}{2} + \frac{p\_2}{2} - \frac{p+1}{2}} |S\_{22}|^{\frac{m}{2} + \frac{p\_1}{2} - \frac{p+1}{2}} |I - YY'|^{\frac{m}{2} - \frac{p+1}{2}} \\ &\quad \times \frac{\mathbf{e}^{-\frac{1}{2} \text{tr}(S\_{11}) - \frac{1}{2} \text{tr}(S\_{22})}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2})}. \end{split}$$

Note that $S\_{11}, S\_{22}, Y$ are independently distributed since $f\_2(\cdot)$ can be factorized into functions of $S\_{11}$, $S\_{22}$ and $Y$. Now, letting $U = YY'$, it follows from Theorem 4.2.3 that

$$\mathrm{d}Y = \frac{\pi^{\frac{p\_1 p\_2}{2}}}{\Gamma\_{p\_1}(\frac{p\_2}{2})} |U|^{\frac{p\_2}{2} - \frac{p\_1 + 1}{2}} \mathrm{d}U,$$

and the density of *U*, denoted by *f*3*(U )*, can then be expressed as follows:

$$f\_3(U) = c\, |U|^{\frac{p\_2}{2} - \frac{p\_1 + 1}{2}} |I - U|^{\frac{m}{2} - \frac{p\_2}{2} - \frac{p\_1 + 1}{2}}, \ O < U < I,\tag{6.8.13}$$

which is a real matrix-variate type-1 beta density with the parameters $(\frac{p\_2}{2}, \frac{m}{2} - \frac{p\_2}{2})$, where $c$ is the normalizing constant. As a result, $I - U$ has a real matrix-variate type-1 beta distribution with the parameters $(\frac{m}{2} - \frac{p\_2}{2}, \frac{p\_2}{2})$. Finally, observe that $u\_4$ is the determinant of $I - U$.

**Corollary 6.8.1.** *Consider $u\_4$ as given in (6.8.12) and the determinant $|I - U|$ where $U$ and $I - U$ are as defined in Theorem 6.8.1. Then, for $k = 2$ and an arbitrary $h$, $E[u\_4^h|H\_o] = E[|I - U|^h]$.*

*Proof***:** On letting *k* = 2 in (6.8.8), we obtain the *h*-th moment of *u*4|*Ho* as

$$E[u\_4^h|H\_o] = c\_{4,p} \frac{\prod\_{j=1}^p \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}{\{\prod\_{j=1}^{p\_1} \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)\} \{\prod\_{j=1}^{p\_2} \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)\}}. \tag{i}$$

After canceling *p*<sup>2</sup> of the gamma functions, the remaining gamma product in the numerator of *(i)* is

$$
\Gamma\left(a - \frac{p\_2}{2}\right)\Gamma\left(a - \frac{p\_2 + 1}{2}\right)\cdots\Gamma\left(a - \frac{p - 1}{2}\right) = \Gamma\_{p\_1}\left(a - \frac{p\_2}{2}\right), \ a = \frac{m}{2} + h,
$$

excluding the factor $\pi^{\frac{p\_1(p\_1-1)}{4}}$. The remainder of the gamma product present in the denominator is comprised of the gamma functions coming from $\Gamma\_{p\_1}(\frac{m}{2} + h)$, again excluding $\pi^{\frac{p\_1(p\_1-1)}{4}}$. The normalizing constant will automatically take care of the factors containing $\pi$. Now, the resulting part containing $h$ is $\Gamma\_{p\_1}(\frac{m}{2} - \frac{p\_2}{2} + h)/\Gamma\_{p\_1}(\frac{m}{2} + h)$, which is the gamma ratio in the $h$-th moment of a $p\_1 \times p\_1$ real matrix-variate type-1 beta distribution with the parameters $(\frac{m}{2} - \frac{p\_2}{2}, \frac{p\_2}{2})$.

Since this happens to be $E[|I - U|^h]$ for $I - U$ distributed as specified in Theorem 6.8.1, the Corollary is established.

An asymptotic result can be established from Corollary 6.5.1 and the $\lambda$-criterion for testing block-diagonality or, equivalently, the independence of subvectors in a $p$-variate Gaussian population. The resulting chisquare variable will have $2\sum\_j \delta\_j$ degrees of freedom where $\delta\_j$ is as defined in Corollary 6.5.1 for the second parameter of the real scalar type-1 beta distribution. Referring to (6.8.10), we have

$$\begin{aligned} \sum\_{j} \delta\_{j} &= \sum\_{j=p\_1+1}^{p} \frac{j-1}{2} - \sum\_{j=2}^{k} \sum\_{i=1}^{p\_j} \frac{i-1}{2} = \sum\_{j=p\_1+1}^{p} \frac{j-1}{2} - \sum\_{j=2}^{k} \frac{p\_j(p\_j-1)}{4} \\ &= \sum\_{j=1}^{p} \frac{j-1}{2} - \sum\_{j=1}^{k} \frac{p\_j(p\_j-1)}{4} = \frac{p(p-1)}{4} - \sum\_{j=1}^{k} \frac{p\_j(p\_j-1)}{4} = \sum\_{j=1}^{k} \frac{p\_j(p-p\_j)}{4}. \end{aligned}$$

Accordingly, the number of degrees of freedom of the resulting chisquare is $2[\sum\_{j=1}^k \frac{p\_j(p-p\_j)}{4}] = \sum\_{j=1}^k \frac{p\_j(p-p\_j)}{2}$ in the real case. It can also be observed that the number of restrictions imposed by the null hypothesis $H\_o$ is obtained by first setting all the off-diagonal elements of $\Sigma$ equal to zero and then subtracting the number of off-diagonal elements of the $k$ diagonal blocks, which produces $\frac{p(p-1)}{2} - \sum\_{j=1}^k \frac{p\_j(p\_j-1)}{2} = \sum\_{j=1}^k \frac{p\_j(p-p\_j)}{2}$. In the complex case, the number of degrees of freedom will be twice that obtained for the real case, the chisquare variable remaining a real scalar chisquare random variable. This is now stated as a theorem.

**Theorem 6.8.2.** *Consider the λ-criterion given in (6.8.1) in the real case and let the corresponding λ in the complex case be $\tilde{\lambda}$. Then $-2\ln\lambda \to \chi^2\_{\delta}$ as $n \to \infty$ where $n$ is the sample size and $\delta = \sum\_{j=1}^k \frac{p\_j(p-p\_j)}{2}$, which is also the number of restrictions imposed by $H\_o$. Analogously, in the complex case, $-2\ln\tilde{\lambda} \to \chi^2\_{\tilde{\delta}}$ as $n \to \infty$, where the chisquare variable remains a real scalar chisquare random variable, $\tilde{\delta} = \sum\_{j=1}^k p\_j(p - p\_j)$ and $n$ denotes the sample size.*
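Theorem 6.8.2 suggests a simple Monte Carlo check, which we sketch here with arbitrary dimensions: with $\Sigma = I$ (block diagonal), the rejection rate of the test that refers $-2\ln\lambda = -n\ln u\_4$ to $\chi^2\_{\delta,\,\alpha}$ should be close to $\alpha$ for large $n$. Below $p = 5$ with blocks of sizes 2 and 3, so $\delta = 6$; without a Bartlett-type correction the agreement is only approximate:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
p, n, blocks = 5, 500, [2, 3]
delta = sum(pj * (p - pj) for pj in blocks) // 2    # degrees of freedom, = 6 here
crit = chi2.ppf(0.95, delta)                        # alpha = 0.05 critical value

trials, rejections = 2000, 0
for _ in range(trials):
    X = rng.standard_normal((p, n))                 # H_o: Sigma block diagonal (= I)
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ Xc.T
    u4 = np.linalg.det(S) / (np.linalg.det(S[:2, :2]) * np.linalg.det(S[2:, 2:]))
    rejections += (-n * np.log(u4)) >= crit         # -2 ln(lambda) = -n ln(u_4)
print(rejections / trials)                          # should be near alpha = 0.05
```

Since $\lambda = u\_4^{n/2}$ by (6.8.1), the statistic $-2\ln\lambda$ reduces to $-n\ln u\_4$; the observed rejection rate converges to the nominal level as $n$ grows.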

#### **6.9. Hypothesis that the Mean Value and Covariance Matrix are Given**

Consider a real $p$-variate Gaussian population $X\_j \sim N\_p(\mu, \Sigma)$, $\Sigma > O$, and a simple random sample $X\_1,\ldots,X\_n$ from this population, the $X\_j$'s being iid. Let the sample mean and the sample sum of products matrix be denoted by $\bar{X}$ and $S$, respectively. Consider the hypothesis $H\_o: \mu = \mu\_o, \Sigma = \Sigma\_o$ where $\mu\_o$ and $\Sigma\_o$ are specified. Let us examine the likelihood ratio test for testing $H\_o$ and obtain the resulting $\lambda$-criterion. Let the parameter space be $\Omega = \{(\mu, \Sigma)\,|\,\Sigma > O,\ -\infty < \mu\_j < \infty,\ j = 1,\ldots,p,\ \mu' = (\mu\_1,\ldots,\mu\_p)\}$. Let the joint density of $X\_1,\ldots,X\_n$ be denoted by $L$. Then, as previously obtained, the maximum value of $L$ is

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{np}{2}} n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}} |S|^{\frac{n}{2}}} \tag{6.9.1}$$

and the maximum under *Ho* is

$$\max\_{H\_o} L = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\sum\_{j=1}^{n}(X\_j - \mu\_o)'\Sigma\_o^{-1}(X\_j - \mu\_o))}}{(2\pi)^{\frac{np}{2}}|\Sigma\_o|^{\frac{n}{2}}}.\tag{6.9.2}$$

Thus,

$$\lambda = \frac{\max\_{H\_o} L}{\max\_{\Omega} L} = \frac{\mathbf{e}^{\frac{np}{2}} |S|^{\frac{n}{2}}}{n^{\frac{np}{2}} |\Sigma\_o|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\sum\_{j=1}^{n} (X\_j - \mu\_o)' \Sigma\_o^{-1} (X\_j - \mu\_o))}.\tag{6.9.3}$$

We reject $H\_o$ for small values of $\lambda$. Since the exponential part dominates the polynomial part for large values, we reject for large values of the exponent, excluding the factor $(-1)$, that is, for large values of $\sum\_{j=1}^n (X\_j - \mu\_o)'\Sigma\_o^{-1}(X\_j - \mu\_o) \sim \chi^2\_{np}$, since $(X\_j - \mu\_o)'\Sigma\_o^{-1}(X\_j - \mu\_o) \stackrel{iid}{\sim} \chi^2\_p$ for each $j$. Hence the criterion consists of

$$\text{rejecting } H\_o \text{ if the observed values of } \sum\_{j=1}^{n} (X\_j - \mu\_o)' \Sigma\_o^{-1} (X\_j - \mu\_o) \ge \chi\_{np, \alpha}^2$$

with

$$\Pr\{\chi\_{np}^2 \ge \chi\_{np,\alpha}^2\} = \alpha. \tag{6.9.4}$$

Let us determine the *h*-th moment of *λ* for an arbitrary *h*. Note that

$$\lambda^h = \frac{\mathbf{e}^{\frac{nph}{2}}}{n^{\frac{nph}{2}} |\Sigma\_o|^{\frac{nh}{2}}} |S|^{\frac{nh}{2}} \mathbf{e}^{-\frac{h}{2} \text{tr}(\Sigma\_o^{-1} S) - \frac{hn}{2} (\bar{X} - \mu\_o)' \Sigma\_o^{-1} (\bar{X} - \mu\_o)}. \tag{6.9.5}$$

Since $\lambda$ contains $S$ and $\bar{X}$, and these quantities are independently distributed, we can integrate out the part containing $S$ over a Wishart density having $m = n-1$ degrees of freedom and the part containing $\bar{X}$ over the density of $\bar{X}$. Thus, for $m = n-1$,

$$\begin{split} E\left[ \left( |S|^{\frac{nh}{2}} / |\Sigma\_o|^{\frac{nh}{2}} \right) \mathbf{e}^{-\frac{h}{2} \text{tr}(\Sigma\_o^{-1} S)} \,\Big|\, H\_o \right] &= \frac{1}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2})\, |\Sigma\_o|^{\frac{n}{2}(1+h) - \frac{1}{2}}} \int\_{S > O} |S|^{\frac{m}{2} + \frac{nh}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{(1+h)}{2} \text{tr}(\Sigma\_o^{-1} S)} \text{d}S \\ &= 2^{\frac{nph}{2}} \frac{\Gamma\_p(\frac{n}{2}(1+h) - \frac{1}{2})}{\Gamma\_p(\frac{n}{2} - \frac{1}{2})} (1+h)^{-\left[\frac{n}{2}(1+h) - \frac{1}{2}\right]p} \end{split} \tag{i}$$

Under *Ho*, the integral over *X*¯ gives

$$\int\_{\bar{X}} \frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}|\Sigma\_o|^{\frac{1}{2}}} \mathbf{e}^{-(1+h)\frac{n}{2}(\bar{X}-\mu\_o)'\Sigma\_o^{-1}(\bar{X}-\mu\_o)} \text{d}\bar{X} = (1+h)^{-\frac{p}{2}}.\tag{ii}$$

From *(i)* and *(ii)*, we have

$$E[\lambda^h|H\_o] = \frac{\mathbf{e}^{\frac{nph}{2}} 2^{\frac{nph}{2}}}{n^{\frac{nph}{2}}} \frac{\Gamma\_p(\frac{n}{2}(1+h) - \frac{1}{2})}{\Gamma\_p(\frac{n-1}{2})} (1+h)^{-\left[\frac{n}{2}(1+h)\right]p}.\tag{6.9.6}$$

The inversion of this expression is quite involved due to branch points. Let us examine the asymptotic case as $n \to \infty$. On expanding the gamma functions by making use of the version of Stirling's asymptotic approximation formula for gamma functions given in (6.5.14), namely $\Gamma(z + \eta) \approx \sqrt{2\pi}\, z^{z+\eta-\frac{1}{2}} \mathbf{e}^{-z}$ for $|z| \to \infty$ and $\eta$ bounded, we have

$$\begin{split} \frac{\Gamma\_{p}(\frac{n}{2}(1+h)-\frac{1}{2})}{\Gamma\_{p}(\frac{n-1}{2})} &= \prod\_{j=1}^{p} \frac{\Gamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2})}{\Gamma(\frac{n-1}{2}-\frac{j-1}{2})} \\ &\approx \prod\_{j=1}^{p} \frac{\sqrt{2\pi}\,\{\frac{n}{2}(1+h)\}^{\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2}-\frac{1}{2}}\,\mathbf{e}^{-\frac{n}{2}(1+h)}}{\sqrt{2\pi}\,\{\frac{n}{2}\}^{\frac{n}{2}-\frac{1}{2}-\frac{j-1}{2}-\frac{1}{2}}\,\mathbf{e}^{-\frac{n}{2}}} \\ &= \left[\frac{n}{2}\right]^{\frac{nph}{2}}\mathbf{e}^{-\frac{nph}{2}}(1+h)^{\frac{n}{2}(1+h)p-\frac{p}{2}-\frac{p(p+1)}{4}}. \end{split} \tag{iii}$$

Thus, as *n* → ∞, it follows from (6.9.6) and *(iii)* that

$$E[\lambda^h|H\_o] = (1+h)^{-\frac{1}{2}(p+\frac{p(p+1)}{2})},\tag{6.9.7}$$

which implies that, asymptotically, $-2\ln\lambda$ has a real scalar chisquare distribution with $p + \frac{p(p+1)}{2}$ degrees of freedom in the real Gaussian case. Hence the following result:

**Theorem 6.9.1.** *Given a $N\_p(\mu, \Sigma)$, $\Sigma > O$, population, consider the hypothesis $H\_o: \mu = \mu\_o, \Sigma = \Sigma\_o$ where $\mu\_o$ and $\Sigma\_o$ are specified. Let $\lambda$ denote the λ-criterion for testing this hypothesis. Then, in the real case, $-2\ln\lambda \to \chi^2\_{\delta}$ as $n \to \infty$ where $\delta = p + \frac{p(p+1)}{2}$ and, in the corresponding complex case, $-2\ln\lambda \to \chi^2\_{\delta\_1}$ as $n \to \infty$ where $\delta\_1 = 2p + p(p+1)$, the chisquare variable remaining a real scalar chisquare random variable.*

**Note 6.9.1.** In the real case, observe that the hypothesis $H\_o: \mu = \mu\_o, \Sigma = \Sigma\_o$ imposes $p$ restrictions on the $\mu$ parameters and $\frac{p(p+1)}{2}$ restrictions on the $\Sigma$ parameters, for a total of $p + \frac{p(p+1)}{2}$ restrictions, which corresponds to the degrees of freedom of the asymptotic chisquare distribution in the real case. In the complex case, there are twice as many restrictions.

**Example 6.9.1.** Consider the real trivariate Gaussian distribution *N*3*(μ, Σ), Σ > O* and the hypothesis *Ho* : *μ* = *μo, Σ* = *Σo* where *μo*, *Σo* and an observed sample of size 5 are as follows:

$$
\mu\_o = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \ \Sigma\_o = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & -2 \\ 0 & -2 & 3 \end{bmatrix} \Rightarrow |\Sigma\_o| = 24, \ \text{Cof}(\Sigma\_o) = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 9 & 6 \\ 0 & 6 & 12 \end{bmatrix},
$$

$$
\Sigma\_o^{-1} = \frac{\text{Cof}(\Sigma\_o)}{|\Sigma\_o|} = \frac{1}{24} \begin{bmatrix} 8 & 0 & 0 \\ 0 & 9 & 6 \\ 0 & 6 & 12 \end{bmatrix};
$$

$$
X\_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \ X\_3 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \ X\_4 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}.
$$

Now,

$$\begin{split} & \quad (X\_1 - \mu\_o)' \Sigma\_o^{-1} (X\_1 - \mu\_o) = \frac{36}{24}, \ (X\_2 - \mu\_o)' \Sigma\_o^{-1} (X\_2 - \mu\_o) = \frac{33}{24}, \\ & \quad (X\_3 - \mu\_o)' \Sigma\_o^{-1} (X\_3 - \mu\_o) = \frac{104}{24}, \ (X\_4 - \mu\_o)' \Sigma\_o^{-1} (X\_4 - \mu\_o) = \frac{144}{24}, \\ & \quad (X\_5 - \mu\_o)' \Sigma\_o^{-1} (X\_5 - \mu\_o) = \frac{20}{24}, \end{split}$$

and

$$\sum\_{j=1}^{5} (X\_j - \mu\_o)' \Sigma\_o^{-1} (X\_j - \mu\_o) = \frac{1}{24} [36 + 33 + 104 + 144 + 20] = \frac{337}{24} = 14.04.$$

Note that, in this example, $n = 5$, $p = 3$ and $np = 15$. Letting the significance level of the test be $\alpha = 0.05$, $H\_o$ is not rejected since $14.04 < \chi^2\_{15,\,0.05} \approx 25$.
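The arithmetic of Example 6.9.1 can be reproduced in a few lines; the array names are our own. The statistic comes out to $337/24 \approx 14.04$, to be compared with the $\chi^2\_{15}$ critical value:

```python
import numpy as np
from scipy.stats import chi2

mu_o = np.array([1.0, -1.0, 1.0])
Sigma_o = np.array([[3.0, 0.0, 0.0],
                    [0.0, 4.0, -2.0],
                    [0.0, -2.0, 3.0]])
Sigma_o_inv = np.linalg.inv(Sigma_o)

# the five observed vectors X_1, ..., X_5, one per row
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 2.0],
              [-2.0, 1.0, 2.0],
              [2.0, -1.0, 0.0]])
stat = sum((x - mu_o) @ Sigma_o_inv @ (x - mu_o) for x in X)
n, p = X.shape
crit = chi2.ppf(0.95, n * p)       # chi^2_{15, 0.05}, approximately 25
print(round(stat, 2))              # 14.04 < crit: H_o is not rejected
```

Exact tables give $\chi^2\_{15,\,0.05} \approx 24.996$, so the rounded value 25 used in the example is accurate.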

#### **6.10. Testing Hypotheses on Linear Regression Models or Linear Hypotheses**

Let the $p \times 1$ real vector $X\_j$ have an expected value $\mu$ and a covariance matrix $\Sigma > O$ for $j = 1,\ldots,n$, and let the $X\_j$'s be independently distributed. Let $X\_j, \mu, \Sigma$ be partitioned as follows, where $x\_{1j}$, $\mu\_1$ and $\sigma\_{11}$ are $1 \times 1$, $\mu\_{(2)}$ and $\Sigma\_{21}$ are $(p-1) \times 1$, $\Sigma\_{12} = \Sigma\_{21}'$ and $\Sigma\_{22}$ is $(p-1) \times (p-1)$:

$$X\_j = \begin{bmatrix} x\_{1j} \\ X\_{(2)j} \end{bmatrix}, \ \mu = \begin{bmatrix} \mu\_1 \\ \mu\_{(2)} \end{bmatrix}, \ \Sigma = \begin{bmatrix} \sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}. \tag{i}$$


If the conditional expectation of $x\_{1j}$, given $X\_{(2)j}$, is linear in $X\_{(2)j}$, then, omitting the subscript $j$ since the $X\_j$'s are iid, it was established in Eq. (3.3.5) that

$$E[\mathbf{x}\_1|X\_{(2)}] = \mu\_1 + \Sigma\_{12}\Sigma\_{22}^{-1}(X\_{(2)} - \mu\_{(2)}).\tag{6.10.1}$$

When the regression is linear, the best linear predictor of *x*<sup>1</sup> in terms of *X(*2*)* will be of the form

$$E[\mathbf{x}\_1|X\_{(2)}] - E(\mathbf{x}\_1) = \beta'(X\_{(2)} - E(X\_{(2)})), \ \beta' = (\beta\_2, \dots, \beta\_p). \tag{6.10.2}$$

Then, by appealing to properties of the conditional expectation and conditional variance, it was shown in Chap. 3 that $\beta' = \Sigma\_{12}\Sigma\_{22}^{-1}$. Hypothesizing that $X\_{(2)}$ is not random, or equivalently that the predictor function is a function of the preassigned values of $X\_{(2)}$, amounts to testing whether $\Sigma\_{12}\Sigma\_{22}^{-1} = O$. Noting that $\Sigma\_{22} > O$ since $\Sigma > O$, the null hypothesis thus reduces to $H\_o: \Sigma\_{12} = O$. If the original population $X$ is $p$-variate real Gaussian, this hypothesis is then equivalent to testing the independence of $x\_1$ and $X\_{(2)}$. Actually, this has already been discussed in Sect. 6.8.2 for the case of $k = 2$, and is also tantamount to testing whether the population multiple correlation $\rho\_{1.(2\ldots p)} = 0$. Assuming that the population is Gaussian and letting $u = \lambda^{\frac{2}{n}}$ where $\lambda$ is the lambda criterion, $u \sim$ type-1 beta $(\frac{n-p}{2}, \frac{p-1}{2})$ under the null hypothesis; this was established in Theorem 6.8.1 for $p\_1 = 1$ and $p\_2 = p-1$. Then, $v = \frac{u}{1-u} \sim$ type-2 beta $(\frac{n-p}{2}, \frac{p-1}{2})$, that is, $v \sim \frac{n-p}{p-1}F\_{n-p,\,p-1}$ or $\frac{(p-1)}{(n-p)}\frac{u}{1-u} \sim F\_{n-p,\,p-1}$. Hence, in order to test $H\_o: \beta = O$, reject $H\_o$ if $F\_{n-p,\,p-1} \ge F\_{n-p,\,p-1,\,\alpha}$, with $Pr\{F\_{n-p,\,p-1} \ge F\_{n-p,\,p-1,\,\alpha}\} = \alpha$.

The test statistic *u* is of the form

$$u = \frac{|S|}{s\_{11}|S\_{22}|}, \ S \sim W\_p(n-1, \Sigma), \ \Sigma > O, \ \frac{(p-1)}{(n-p)} \frac{u}{1-u} \sim F\_{n-p, \ p-1},$$

where, among the submatrices of $S$, $s\_{11}$ is $1 \times 1$ and $S\_{22}$ is $(p-1) \times (p-1)$. Observe that the number of parameters restricted by the hypothesis $\Sigma\_{12} = O$ is $p\_1 p\_2 = 1(p-1) = p-1$. Hence, as $n \to \infty$, the null distribution of $-2 \ln \lambda$ is that of a real scalar chi-square random variable having $p-1$ degrees of freedom. Thus, the following result:

**Theorem 6.10.1.** *Let the $p \times 1$ vector $X\_j$ be partitioned into the subvectors $x\_{1j}$ of order 1 and $X\_{(2)j}$ of order $p-1$. Let the regression of $x\_{1j}$ on $X\_{(2)j}$ be linear in $X\_{(2)j}$, that is, $E[x\_{1j}|X\_{(2)j}] - E(x\_{1j}) = \beta'(X\_{(2)j} - E(X\_{(2)j}))$. Consider the hypothesis $H\_o: \beta = O$. Let $X\_j \sim N\_p(\mu, \Sigma), \ \Sigma > O$, for $j = 1, \ldots, n$, the $X\_j$'s being independently distributed, and let $\lambda$ be the $\lambda$-criterion for testing this hypothesis. Then, as $n \to \infty$, $-2\ln\lambda \to \chi^2\_{p-1}$.*

**Example 6.10.1.** Let the population be $N\_3(\mu, \Sigma), \ \Sigma > O$, and the observed sample of size $n = 5$ be

$$X\_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \ X\_3 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \ X\_4 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \ X\_5 = \begin{bmatrix} -1 \\ -1 \\ -2 \end{bmatrix}.$$

The resulting sample average *X*¯ and deviation vectors are then

$$\begin{aligned} \bar{X} &= \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \ X\_1 - \bar{X} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 4 \\ 3 \\ 3 \end{bmatrix}, \\\ X\_2 - \bar{X} &= \frac{1}{5} \begin{bmatrix} 4 \\ -2 \\ 3 \end{bmatrix}, \ X\_3 - \bar{X} = \frac{1}{5} \begin{bmatrix} -6 \\ 3 \\ -2 \end{bmatrix}, \\\ X\_4 - \bar{X} &= \frac{1}{5} \begin{bmatrix} 4 \\ 3 \\ 8 \end{bmatrix}, \ X\_5 - \bar{X} = \frac{1}{5} \begin{bmatrix} -6 \\ -7 \\ -12 \end{bmatrix}. \end{aligned}$$

Letting

$$\mathbf{X} = [X\_1, \dots, X\_5], \ \bar{\mathbf{X}} = [\bar{X}, \dots, \bar{X}] \text{ and } S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})',$$

$$\mathbf{X} - \bar{\mathbf{X}} = \frac{1}{5} \begin{bmatrix} 4 & 4 & -6 & 4 & -6 \\ 3 & -2 & 3 & 3 & -7 \\ 3 & 3 & -2 & 8 & -12 \end{bmatrix}.$$

$$\begin{aligned} S &= \frac{1}{25} \begin{bmatrix} 120 & 40 & 140 \\ 40 & 80 & 105 \\ 140 & 105 & 230 \end{bmatrix} = \begin{bmatrix} s\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix}, \\\ s\_{11} &= \frac{120}{25}, \ S\_{12} = \frac{1}{25} [40, 140], \\\ S\_{22} &= \frac{1}{25} \begin{bmatrix} 80 & 105 \\ 105 & 230 \end{bmatrix}, \ S\_{22}^{-1} = \frac{25}{7375} \begin{bmatrix} 230 & -105 \\ -105 & 80 \end{bmatrix}, \end{aligned}$$

$$S\_{12}S\_{22}^{-1}S\_{21} = \frac{1}{(25)^2}\,\frac{25}{7375}\,[40, 140] \begin{bmatrix} 230 & -105 \\ -105 & 80 \end{bmatrix} \begin{bmatrix} 40 \\ 140 \end{bmatrix} = \frac{760000}{(25)(7375)},$$

so that the test statistic is

$$\begin{aligned} u &= \frac{s\_{11} - S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}} = 1 - \frac{S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}} = 1 - r\_{1.(2,3)}^2\\ &= 1 - \frac{760000}{(120)(7375)} = 1 - 0.859 = 0.141 \ \Rightarrow\\ v &= \frac{(p-1)}{(n-p)} \frac{u}{1-u} = \left(\frac{2}{2}\right) \left(\frac{0.141}{0.859}\right) = 0.164, \ \ v \sim F\_{n-p,\, p-1}. \end{aligned}$$

Let us test $H\_o$ at the significance level $\alpha = 0.05$. In that case, the critical value, which is available from $F$ tables, is $F\_{n-p,\,p-1,\,\alpha} = F\_{2,\,2,\,0.05} = 19$. Since the observed value of $v$ is $0.164 < 19$, $H\_o$ is not rejected.
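The computations of this example can be cross-checked numerically. The sketch below is illustrative only; the scipy call replaces the $F$-table lookup, and the data are exactly the five observations above.

```python
import numpy as np
from scipy.stats import f

# Columns are the five observations X_1, ..., X_5 of Example 6.10.1
X = np.array([[ 1,  1, -1,  1, -1],
              [ 1,  0,  1,  1, -1],
              [ 1,  1,  0,  2, -2]], dtype=float)
p, n = X.shape                                   # p = 3, n = 5
D = X - X.mean(axis=1, keepdims=True)            # deviation matrix
S = D @ D.T                                      # sample sum of products matrix
u = np.linalg.det(S) / (S[0, 0] * np.linalg.det(S[1:, 1:]))
v = ((p - 1) / (n - p)) * u / (1 - u)
crit = f.ppf(1 - 0.05, n - p, p - 1)             # F_{2,2,0.05} = 19
print(round(u, 3), round(v, 3), crit)            # u ≈ 0.141, v ≈ 0.164
```

Since the observed $v$ falls far below the critical value, the code reproduces the conclusion that $H\_o$ is not rejected.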

**Note 6.10.1.** Observe that

$$u = 1 - r\_{1.(2,3)}^2, \quad r\_{1.(2,3)}^2 = \frac{S\_{12}S\_{22}^{-1}S\_{21}}{s\_{11}}$$

where $r\_{1.(2,3)}$ is the sample multiple correlation between the first component of $X\_j \sim N\_3(\mu, \Sigma), \ \Sigma > O$, and the other two components of $X\_j$. If the population covariance matrix $\Sigma$ is similarly partitioned, that is,

$$
\Sigma = \begin{bmatrix}
\sigma\_{11} & \Sigma\_{12} \\
\Sigma\_{21} & \Sigma\_{22}
\end{bmatrix}, \quad \text{where } \sigma\_{11} \text{ is } 1 \times 1, \ \Sigma\_{22} \text{ is } (p-1) \times (p-1),
$$

then, the population multiple correlation coefficient is *ρ*1*.(*2*,*3*)* where

$$
\rho\_{1.(2,3)}^2 = \frac{\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}{\sigma\_{11}}.
$$

Thus, if $\Sigma\_{12} = \Sigma\_{21}' = O$, then $\rho\_{1.(2,3)} = 0$, and conversely, since $\sigma\_{11} > 0$ and $\Sigma\_{22} > O \Rightarrow \Sigma\_{22}^{-1} > O$. The regression coefficient $\beta$ being equal to the transpose of $\Sigma\_{12}\Sigma\_{22}^{-1}$, $\Sigma\_{12} = O$ also implies that the regression coefficient $\beta = O$, and conversely. Accordingly, the hypothesis that the regression coefficient vector $\beta = O$ is equivalent to hypothesizing that the population multiple correlation $\rho\_{1.(2,\ldots,p)} = 0$, which in the multivariate normal case also implies that the two subvectors are independently distributed, or that the covariance matrix $\Sigma\_{12} = O$; the only difference is that the test on the regression coefficients takes place in the conditional space, whereas testing the independence of the subvectors or whether the population multiple correlation equals zero is carried out in the entire space. The numerical example included in this section also illustrates the main result presented in Sect. 6.8.1 in connection with testing whether a population multiple correlation coefficient is equal to zero.
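The null distribution quoted above, $u \sim$ type-1 beta$(\frac{n-p}{2}, \frac{p-1}{2})$, can also be checked by simulation. The following sketch is an illustration only, under the assumption $\Sigma = I$ (so that $H\_o: \Sigma\_{12} = O$ holds); it compares the empirical mean of $u$ with the exact beta mean $(n-p)/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 30, 4, 2000
u = np.empty(reps)
for i in range(reps):
    # Sigma = I, so Sigma_12 = O and the null hypothesis holds
    X = rng.standard_normal((p, n))
    D = X - X.mean(axis=1, keepdims=True)
    S = D @ D.T                                  # S ~ W_p(n - 1, I)
    u[i] = np.linalg.det(S) / (S[0, 0] * np.linalg.det(S[1:, 1:]))
# mean of type-1 beta((n-p)/2, (p-1)/2) is (n-p)/(n-1)
print(u.mean(), (n - p) / (n - 1))
```

The two printed values should agree to about two decimal places for this many replications.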

#### **6.10.1. A simple linear model**

Consider a linear model of the following form where a real scalar variable *y* is estimated by a linear function of pre-assigned real scalar variables *z*1*,...,zq* :

$$\mathbf{y}\_j = \beta\_o + \beta\_1 z\_{1j} + \dots + \beta\_q z\_{qj} + e\_j, \ j = 1, \dots, n \tag{i}$$

where $y\_1, \ldots, y\_n$ are $n$ observations on $y$; $z\_{i1}, z\_{i2}, \ldots, z\_{in}$, $i = 1, \ldots, q$, are preassigned values of $z\_1, \ldots, z\_q$; and $\beta\_o, \beta\_1, \ldots, \beta\_q$ are unknown parameters. The random components $e\_j$, $j = 1, \ldots, n$, represent the combined contributions of all unknown factors. There are two possibilities with respect to this model: $\beta\_o = 0$ or $\beta\_o \ne 0$. If $\beta\_o = 0$, $\beta\_o$ is omitted from model $(i)$ and we let $y\_j = x\_j$. If $\beta\_o \ne 0$, the model is modified by taking $x\_j = y\_j - \bar{y}$ where $\bar{y} = \frac{1}{n}(y\_1 + \cdots + y\_n)$, then becoming

$$\mathbf{x}\_{j} = \mathbf{y}\_{j} - \bar{\mathbf{y}} = \beta\_{1}(z\_{1j} - \bar{z}\_{1}) + \dots + \beta\_{q}(z\_{qj} - \bar{z}\_{q}) + \epsilon\_{j} \tag{ii}$$

for some error term $\epsilon\_j$, where $\bar{z}\_i = \frac{1}{n}(z\_{i1} + \cdots + z\_{in})$, $i = 1, \ldots, q$. Letting $Z\_j' = (z\_{1j}, \ldots, z\_{qj})$ if $\beta\_o = 0$ and $Z\_j' = (z\_{1j} - \bar{z}\_1, \ldots, z\_{qj} - \bar{z}\_q)$ otherwise, equation $(ii)$ can be written in vector/matrix notation as follows:

$$\begin{aligned} \epsilon\_j &= (\mathbf{x}\_j - \beta\_1 \mathbf{z}\_{1j} - \dots - \beta\_q \mathbf{z}\_{qj}) = (\mathbf{x}\_j - \beta' \mathbf{Z}\_j), \ \boldsymbol{\beta}' = (\beta\_1, \dots, \beta\_q), \\\sum\_{j=1}^n \epsilon\_j^2 &= \sum\_{j=1}^n (\mathbf{x}\_j - \beta\_1 \mathbf{z}\_{1j} - \dots - \beta\_q \mathbf{z}\_{qj})^2 = \sum\_{j=1}^n (\mathbf{x}\_j - \boldsymbol{\beta}' \mathbf{Z}\_j)^2. \end{aligned}$$

Letting

$$X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_n \end{bmatrix}, \ \epsilon = \begin{bmatrix} \epsilon\_1 \\ \vdots \\ \epsilon\_n \end{bmatrix} \text{ and } Z = \begin{bmatrix} z\_{11} & \dots & z\_{1n} \\ z\_{21} & \dots & z\_{2n} \\ \vdots & \ddots & \vdots \\ z\_{q1} & \dots & z\_{qn} \end{bmatrix}, \tag{iii}$$

$$\epsilon' = (X' - \beta'Z) \Rightarrow \sum\_{j=1}^n \epsilon\_j^2 = \epsilon'\epsilon = (X' - \beta'Z)(X - Z'\beta). \tag{iv}$$

The least squares minimum is thus obtained by differentiating $\epsilon'\epsilon$ with respect to $\beta$, equating the resulting expression to a null vector and solving, which produces a single critical point that corresponds to the minimum, as the maximum occurs at $+\infty$:

$$\begin{aligned} \frac{\partial}{\partial \boldsymbol{\beta}} \left( \sum\_{j=1}^{n} \epsilon\_j^2 \right) = \boldsymbol{O} &\Rightarrow \sum\_{j=1}^{n} \boldsymbol{x}\_j \boldsymbol{Z}\_j' = \hat{\boldsymbol{\beta}}' \sum\_{j=1}^{n} \mathbf{Z}\_j \mathbf{Z}\_j' \\ &\Rightarrow \hat{\boldsymbol{\beta}} = \left( \sum\_{j=1}^{n} \mathbf{Z}\_j \mathbf{Z}\_j' \right)^{-1} \left( \sum\_{j=1}^{n} \boldsymbol{x}\_j \mathbf{Z}\_j \right); \end{aligned} \tag{\text{v}}$$

$$\frac{\partial}{\partial \beta}(\epsilon' \epsilon) = O \Rightarrow \hat{\beta} = (ZZ')^{-1}ZX,\tag{vi}$$

that is,

$$\hat{\beta} = \left(\sum\_{j=1}^{n} Z\_j Z\_j'\right)^{-1} \left(\sum\_{j=1}^{n} x\_j Z\_j\right). \tag{6.10.4}$$

Since the $z\_{ij}$'s are preassigned quantities, it can be assumed without any loss of generality that $ZZ'$ is nonsingular, and thereby that $(ZZ')^{-1}$ exists, so that the least squares minimum, usually denoted by $s^2$, is obtained by substituting $\hat{\beta}$ for $\beta$ in $\epsilon'\epsilon$. Then, at $\beta = \hat{\beta}$,

$$\begin{aligned} \left. \epsilon' \right|\_{\beta=\hat{\beta}} &= X' - X'Z'(ZZ')^{-1}Z = X'[I - Z'(ZZ')^{-1}Z] \text{ so that} \\ \left. s^2 = \epsilon' \epsilon \right|\_{\beta=\hat{\beta}} &= X'[I - Z'(ZZ')^{-1}Z]X \end{aligned} \tag{6.10.5}$$

where $I - Z'(ZZ')^{-1}Z$ is idempotent and of rank $(n-1) - q$. Observe that if $\beta\_o \ne 0$ in $(i)$ and we had proceeded without eliminating $\beta\_o$, then $\beta$ would have been of order $(q+1) \times 1$ and $I - Z'(ZZ')^{-1}Z$, of rank $n - (q+1) = n - 1 - q$, whereas if $\beta\_o \ne 0$ and we had eliminated $\beta\_o$ from the model, then the rank of $I - Z'(ZZ')^{-1}Z$ would have been $(n-1) - q$, that is, unchanged, since $\sum\_{j=1}^n (x\_j - \bar{x})^2 = X'[I - \frac{1}{n}JJ']X$, $J' = (1, \ldots, 1)$, and the rank of $I - \frac{1}{n}JJ'$ is $n - 1$.

Some distributional assumptions on the $\epsilon\_j$'s are required in order to test hypotheses on $\beta$. Let $\epsilon\_j \sim N\_1(0, \sigma^2), \ \sigma^2 > 0, \ j = 1, \ldots, n$, be independently distributed. Then the $x\_j \sim N\_1(\beta' Z\_j, \sigma^2), \ j = 1, \ldots, n$, are independently distributed but not identically distributed, as the mean value depends on $j$. Under the normality assumption for the $\epsilon\_j$'s, it can readily be seen that the least squares estimators of $\beta$ and $\sigma^2$ coincide with the maximum likelihood estimators. It can also be observed that $\sigma^2$ is estimated by $\frac{s^2}{n}$ where $n$ is the sample size. In this simple linear regression context, the parameter space is $\Omega = \{(\beta, \sigma^2)\,|\,\sigma^2 > 0\}$. Thus, under the normality assumption, the maximum of the likelihood function $L$ is given by

$$\max\_{\Omega} L = \frac{\mathbf{e}^{-\frac{n}{2}} n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}} [s^{2}]^{\frac{n}{2}}}.\tag{\text{vii}}$$

Under the hypothesis $H\_o: \beta = O$, or $\beta\_1 = 0 = \cdots = \beta\_q$, the least squares minimum, usually denoted as $s\_o^2$, is $X'X$ and, assuming normality, the maximum of the likelihood function under $H\_o$ is the following:

$$\max\_{H\_o} L = \frac{\mathbf{e}^{-\frac{n}{2}} n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}} [s\_o^2]^{\frac{n}{2}}}. \tag{\text{viii}}$$

Thus, the *λ*-criterion is

$$\begin{split} \lambda &= \frac{\max\_{H\_o} L}{\max\_{\Omega} L} = \left[ \frac{s^2}{s\_o^2} \right]^{\frac{n}{2}} \\ \Rightarrow \quad u &= \lambda^{\frac{2}{n}} = \frac{X'[I - Z'(ZZ')^{-1}Z]X}{X'X} \\ &= \frac{X'[I - Z'(ZZ')^{-1}Z]X}{X'[I - Z'(ZZ')^{-1}Z]X + X'Z'(ZZ')^{-1}ZX} \\ &= \frac{1}{1 + u\_1} \end{split} \tag{6.10.6}$$

where

$$u\_1 = \frac{X'Z'(ZZ')^{-1}ZX}{X'[I - Z'(ZZ')^{-1}Z]X} = \frac{s\_o^2 - s^2}{s^2} \tag{\text{ix}}$$

with the matrices $Z'(ZZ')^{-1}Z$ and $I - Z'(ZZ')^{-1}Z$ being idempotent, mutually orthogonal, and of ranks $q$ and $(n-1) - q$, respectively. We can interpret $s\_o^2 - s^2$ as the sum of squares due to the hypothesis and $s^2$ as the residual part. Under the normality assumption, $s\_o^2 - s^2$ and $s^2$ are independently distributed in light of the independence of quadratic forms that was discussed in Sect. 3.4.1; moreover, their representations as quadratic forms in idempotent matrices of ranks $q$ and $(n-1) - q$ imply that $\frac{s\_o^2 - s^2}{\sigma^2} \sim \chi^2\_q$ and $\frac{s^2}{\sigma^2} \sim \chi^2\_{(n-1)-q}$. Accordingly, under the null hypothesis,

$$
u\_2 = \frac{(s\_o^2 - s^2)/q}{s^2/[(n-1)-q]} \sim F\_{q,\, n-1-q},\tag{6.10.7}
$$

that is, an $F$-statistic having $q$ and $n - 1 - q$ degrees of freedom. Thus, we reject $H\_o$ for small values of $\lambda$ or, equivalently, for large values of $u\_2$, that is, large values of $F\_{q,\,n-1-q}$. Hence, the following criterion:

Reject $H\_o$ if the observed value of $u\_2 \ge F\_{q,\,n-1-q,\,\alpha}$, with $Pr\{F\_{q,\,n-1-q} \ge F\_{q,\,n-1-q,\,\alpha}\} = \alpha$. (6.10.8)

A detailed discussion of the real scalar variable case is provided in Mathai and Haubold (2017).
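The idempotency and mutual-orthogonality properties of $Z'(ZZ')^{-1}Z$ and $I - Z'(ZZ')^{-1}Z$ invoked in this subsection are easy to confirm numerically; a minimal sketch with arbitrary simulated $z$-values:

```python
import numpy as np

rng = np.random.default_rng(1)
q, n = 2, 8
Z = rng.standard_normal((q, n))                  # arbitrary preassigned z-values
P = Z.T @ np.linalg.solve(Z @ Z.T, Z)            # Z'(ZZ')^{-1}Z
M = np.eye(n) - P                                # I - Z'(ZZ')^{-1}Z
print(np.allclose(P @ P, P),                     # P is idempotent
      np.allclose(M @ M, M),                     # M is idempotent
      np.allclose(P @ M, 0))                     # P and M are mutually orthogonal
```

All three checks print `True`, mirroring the decomposition of $X'X$ into the sum of squares due to the hypothesis and the residual part.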

#### **6.10.2. Hypotheses on individual parameters**

Denoting the expected value of *(*·*)* by *E*[*(*·*)*], it follows from (6.10.4) that

$$\begin{split} E[\hat{\beta}] &= (ZZ')^{-1} Z E(X) = (ZZ')^{-1} ZZ' \beta = \beta, \; E[X] = Z' \beta, \; \text{and} \\ \text{Cov}(\hat{\beta}) &= (ZZ')^{-1} Z [\text{Cov}(X)] Z' (ZZ')^{-1} = (ZZ')^{-1} Z (\sigma^2 I) Z' (ZZ')^{-1} \\ &= \sigma^2 (ZZ')^{-1} . \end{split}$$

Under the normality assumption on the $x\_j$'s, we have $\hat{\beta} \sim N\_q(\beta, \sigma^2 (ZZ')^{-1})$. Letting the $(r,r)$-th diagonal element of $(ZZ')^{-1}$ be $b\_{rr}$, then $\hat{\beta}\_r$, the estimator of the $r$-th component of the parameter vector $\beta$, is distributed as $\hat{\beta}\_r \sim N\_1(\beta\_r, \sigma^2 b\_{rr})$, so that

$$\frac{\hat{\beta}\_r - \beta\_r}{\hat{\sigma}\sqrt{b\_{rr}}} \sim t\_{n-1-q} \tag{6.10.9}$$

where *tn*−1−*<sup>q</sup>* denotes a Student-*t* distribution having *n* − 1 − *q* degrees of freedom and *<sup>σ</sup>*<sup>ˆ</sup> <sup>2</sup> <sup>=</sup> *<sup>s</sup>*<sup>2</sup> *n*−1−*q* is an unbiased estimator for *σ*2. On writing *s*<sup>2</sup> in terms of , it is easily seen that *<sup>E</sup>*[*s*2] = *(n* <sup>−</sup> <sup>1</sup> <sup>−</sup> *q)σ*<sup>2</sup> where *<sup>s</sup>*<sup>2</sup> is the least squares minimum in the entire parameter space . Thus, one can test hypotheses on *βr* and construct confidence intervals for that parameter by means of the Student-*t* statistic specified in (6.10.9) or its square *t*2 *<sup>n</sup>*−1−*<sup>q</sup>* which has an *<sup>F</sup>* distribution having 1 and *<sup>n</sup>* <sup>−</sup> <sup>1</sup> <sup>−</sup> *<sup>q</sup>* degrees of freedom, that is, *t*2 *<sup>n</sup>*−1−*<sup>q</sup>* <sup>∼</sup> *<sup>F</sup>*1*,n*−1−*<sup>q</sup>* .

**Example 6.10.2.** Let us consider a linear model of the following form:

$$\mathbf{y}\_j = \beta\_o + \beta\_1 z\_{1j} + \dots + \beta\_q z\_{qj} + e\_j, \ j = 1, \dots, n,$$

where the $z\_{ij}$'s are preassigned values of the variable $z\_i$. Let us take $n = 5$ and $q = 2$, so that the sample is of size 5 and, excluding $\beta\_o$, the model has two parameters. Let the observations on $y$ and the preassigned values of the $z\_i$'s be the following:

$$\begin{aligned} 1 &= \beta\_o + \beta\_1(0) + \beta\_2(1) + e\_1 \\ 2 &= \beta\_o + \beta\_1(1) + \beta\_2(-1) + e\_2 \\ 4 &= \beta\_o + \beta\_1(-1) + \beta\_2(2) + e\_3 \\ 6 &= \beta\_o + \beta\_1(2) + \beta\_2(-2) + e\_4 \\ 7 &= \beta\_o + \beta\_1(-2) + \beta\_2(5) + e\_5. \end{aligned}$$

The averages on *y, z*<sup>1</sup> and *z*<sup>2</sup> are then

$$\begin{aligned} \bar{\mathbf{y}} &= \frac{1}{5} [1 + 2 + 4 + 6 + 7] = 4, \ \bar{z}\_1 = \frac{1}{5} [0 + 1 + (-1) + 2 + (-2)] = 0 \text{ and} \\ \bar{z}\_2 &= \frac{1}{5} [1 + (-1) + 2 + (-2) + 5] = 1, \end{aligned}$$

and, in terms of deviations, the model becomes

$$\mathbf{x}\_{j} = \mathbf{y}\_{j} - \bar{\mathbf{y}} = \beta\_{1}(z\_{1j} - \bar{z}\_{1}) + \beta\_{2}(z\_{2j} - \bar{z}\_{2}) + \epsilon\_{j}, \ \ \epsilon\_{j} = e\_{j} - \bar{e}.$$

That is,

$$
\begin{bmatrix} -3 \\ -2 \\ 0 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & -2 \\ -1 & 1 \\ 2 & -3 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \beta\_1 \\ \beta\_2 \end{bmatrix} + \begin{bmatrix} \epsilon\_1 \\ \epsilon\_2 \\ \epsilon\_3 \\ \epsilon\_4 \\ \epsilon\_5 \end{bmatrix} \Rightarrow X = Z'\beta + \epsilon.
$$

When minimizing $\epsilon'\epsilon = (X - Z'\beta)'(X - Z'\beta)$, we determined that $\hat{\beta}$, the least squares estimate of $\beta$, the least squares minimum $s^2$ and $s\_o^2 - s^2$, corresponding to the sum of squares due to $\beta$, could be expressed as

$$\begin{aligned} \hat{\beta} &= (ZZ')^{-1}ZX, \ s\_o^2 - s^2 = X'Z'(ZZ')^{-1}ZX, \\ s^2 &= X'[I - Z'(ZZ')^{-1}Z]X \text{ and that} \\ Z'(ZZ')^{-1}Z &= [Z'(ZZ')^{-1}Z]^2, I - Z'(ZZ')^{-1}Z = [I - Z'(ZZ')^{-1}Z]^2, \\ & [I - Z'(ZZ')^{-1}Z][Z'(ZZ')^{-1}Z] = O. \end{aligned}$$

Let us evaluate those quantities:

$$\begin{aligned} Z &= \begin{bmatrix} 0 & 1 & -1 & 2 & -2 \\ 0 & -2 & 1 & -3 & 4 \end{bmatrix}, \; ZZ' = \begin{bmatrix} 10 & -17 \\ -17 & 30 \end{bmatrix}, \\\ |ZZ'| &= 11, \; \text{Cof}(ZZ') = \begin{bmatrix} 30 & 17 \\ 17 & 10 \end{bmatrix}, \; (ZZ')^{-1} = \frac{1}{11} \begin{bmatrix} 30 & 17 \\ 17 & 10 \end{bmatrix}, \\\ (ZZ')^{-1}Z &= \frac{1}{11} \begin{bmatrix} 30 & 17 \\ 17 & 10 \end{bmatrix} \begin{bmatrix} 0 & 1 & -1 & 2 & -2 \\ 0 & -2 & 1 & -3 & 4 \end{bmatrix} \\\ &= \frac{1}{11} \begin{bmatrix} 0 & -4 & -13 & 9 & 8 \\ 0 & -3 & -7 & 4 & 6 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \hat{\beta} &= (ZZ')^{-1}ZX = \frac{1}{11} \begin{bmatrix} 0 & -4 & -13 & 9 & 8\\ 0 & -3 & -7 & 4 & 6 \end{bmatrix} \begin{bmatrix} -3\\ -2\\ 0\\ 2\\ 3 \end{bmatrix} \\ &= \frac{1}{11} \begin{bmatrix} 50\\ 32 \end{bmatrix} = \begin{bmatrix} \hat{\beta}\_1\\ \hat{\beta}\_2 \end{bmatrix}; \end{aligned}$$

$$Z'(ZZ')^{-1}Z = \frac{1}{11} \begin{bmatrix} 0 & 0\\ 1 & -2\\ -1 & 1\\ 2 & -3\\ -2 & 4 \end{bmatrix} \begin{bmatrix} 0 & -4 & -13 & 9 & 8\\ 0 & -3 & -7 & 4 & 6 \end{bmatrix} = \frac{1}{11} \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 2 & 1 & 1 & -4\\ 0 & 1 & 6 & -5 & -2\\ 0 & 1 & -5 & 6 & -2\\ 0 & -4 & -2 & -2 & 8 \end{bmatrix}.$$

Then,

$$\begin{aligned} X'Z'(ZZ')^{-1}ZX &= \frac{1}{11} \begin{bmatrix} -3 & -2 & 0 & 2 & 3 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 2 & 1 & 1 & -4\\ 0 & 1 & 6 & -5 & -2\\ 0 & 1 & -5 & 6 & -2\\ 0 & -4 & -2 & -2 & 8 \end{bmatrix} \begin{bmatrix} -3\\ -2\\ 0\\ 2\\ 3 \end{bmatrix} \\ &= \frac{120}{11}, \quad X'X = 26, \end{aligned}$$

$$X'[I - Z'(ZZ')^{-1}Z]X = 26 - \frac{120}{11} = \frac{166}{11}, \quad \text{with } n = 5 \text{ and } q = 2.$$

The test statistics $u\_1$ and $u\_2$ and their observed values are the following:

$$\begin{split} u\_{1} &= \frac{X'Z'(ZZ')^{-1}ZX}{X'[I-Z'(ZZ')^{-1}Z]X} = \frac{s\_o^2 - s^2}{s^2} = \frac{120}{166} = 0.72; \\ u\_{2} &= \frac{(s\_o^2 - s^2)/q}{s^2/(n-1-q)} = \frac{120/2}{166/2} = 0.72, \ \ u\_2 \sim F\_{q, \, n-1-q}. \end{split}$$

Letting the significance level be $\alpha = 0.05$, the required tabulated critical value is $F\_{q,\,n-1-q,\,\alpha} = F\_{2,\,2,\,0.05} = 19$. Since $0.72 < 19$, the hypothesis $H\_o: \beta = O$ is not rejected. Thus, we will not proceed to test individual hypotheses on the regression coefficients $\beta\_1$ and $\beta\_2$. For tests on general linear models, refer for instance to Mathai (1971).
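The matrix computations of this example can be replicated directly; in the sketch below, numpy's linear solver replaces the explicit cofactor inversion of $ZZ'$.

```python
import numpy as np
from scipy.stats import f

# Example 6.10.2 in centered form: X = Z'beta + epsilon
Xv = np.array([-3., -2., 0., 2., 3.])
Z = np.array([[ 0.,  1., -1.,  2., -2.],
              [ 0., -2.,  1., -3.,  4.]])
n, q = Xv.size, Z.shape[0]
beta_hat = np.linalg.solve(Z @ Z.T, Z @ Xv)      # (ZZ')^{-1}ZX = (50/11, 32/11)
ss_hyp = (Z @ Xv) @ beta_hat                     # s_o^2 - s^2 = X'Z'(ZZ')^{-1}ZX
s2 = Xv @ Xv - ss_hyp                            # least squares minimum = 166/11
u2 = (ss_hyp / q) / (s2 / (n - 1 - q))
print(beta_hat, round(u2, 2))                    # u2 ≈ 0.72 < F_{2,2,0.05} = 19
```

Comparing `u2` with `f.ppf(0.95, q, n - 1 - q)` reproduces the conclusion that $H\_o$ is not rejected.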

#### **6.11. Problem Involving Two or More Independent Gaussian Populations**

Consider $k$ independent $p$-variate real normal populations $X\_j \sim N\_p(\mu\_{(j)}, \Sigma), \ \Sigma > O, \ j = 1, \ldots, k$, having the same nonsingular covariance matrix $\Sigma$ but possibly different mean values. We consider the problem of testing hypotheses on linear functions of the mean values. Let $b = a\_1\mu\_{(1)} + \cdots + a\_k\mu\_{(k)}$ where $a\_1, \ldots, a\_k$ are real scalar constants, and let the null hypothesis be $H\_o: b = b\_o$ (given), which means that the $a\_j$'s and $\mu\_{(j)}$'s, $j = 1, \ldots, k$, are all specified. It is also assumed that $\Sigma$ is known. Suppose that simple random samples of sizes $n\_1, \ldots, n\_k$ from these $k$ independent normal populations can be secured, and let the sample values be $X\_{jq}, \ q = 1, \ldots, n\_j$, where $X\_{j1}, \ldots, X\_{jn\_j}$ are iid as $N\_p(\mu\_{(j)}, \Sigma), \ \Sigma > O$. Let the sample averages be denoted by $\bar{X}\_j = \frac{1}{n\_j}\sum\_{q=1}^{n\_j} X\_{jq}, \ j = 1, \ldots, k$. Consider the test statistic $U\_k = a\_1\bar{X}\_1 + \cdots + a\_k\bar{X}\_k$. Since the populations are independent and $U\_k$ is a linear function of independent vector normal variables, $U\_k$ is normally distributed with mean value $b = a\_1\mu\_{(1)} + \cdots + a\_k\mu\_{(k)}$ and covariance matrix $\frac{1}{n}\Sigma$, where $\frac{1}{n} = \frac{a\_1^2}{n\_1} + \cdots + \frac{a\_k^2}{n\_k}$, and so $\sqrt{n}\,\Sigma^{-\frac{1}{2}}(U\_k - b) \sim N\_p(O, I)$. Then, under the hypothesis $H\_o: b = b\_o$ (given), which is being tested against the alternative $H\_1: b \ne b\_o$, the test criterion is obtained by proceeding as was done in the single population case. Thus, the test statistic is $z = n(U\_k - b\_o)'\Sigma^{-1}(U\_k - b\_o) \sim \chi^2\_p$, and the criterion is to reject the null hypothesis for large values of $z$. Accordingly, the criterion is

$$\begin{aligned} \text{Reject } H\_o: b &= b\_o \text{ if the observed value of } n (U\_k - b\_o)' \Sigma^{-1} (U\_k - b\_o) \ge \chi^2\_{p, \, \alpha}, \\ \text{with } \Pr\{\chi^2\_p \ge \chi^2\_{p, \, \alpha}\} &= \alpha. \end{aligned} \tag{6.11.1}$$

In particular, suppose that we wish to test the hypothesis $H\_o: \delta = \mu\_{(1)} - \mu\_{(2)} = \delta\_o$, such as $\delta\_o = 0$ as is often the case, against the natural alternative. In this case, when $\delta\_o = 0$, the null hypothesis is that the mean value vectors are equal, that is, $\mu\_{(1)} = \mu\_{(2)}$, and the test statistic is $z = n(\bar{X}\_1 - \bar{X}\_2)'\Sigma^{-1}(\bar{X}\_1 - \bar{X}\_2) \sim \chi^2\_p$ with $\frac{1}{n} = \frac{1}{n\_1} + \frac{1}{n\_2}$, the test criterion being

$$\begin{aligned} \text{Reject } H\_o: \mu\_{(1)} - \mu\_{(2)} &= 0 \text{ if the observed value of } z \ge \chi^2\_{p, \,\alpha}, \\ \text{with } \Pr\{\chi^2\_p \ge \chi^2\_{p, \,\alpha}\} &= \alpha. \end{aligned} \tag{6.11.2}$$
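A sketch of the test criterion (6.11.2) on simulated data; the covariance matrix below is hypothetical and treated as known, and both samples are generated with equal (zero) means so that $H\_o$ holds.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
p, n1, n2 = 2, 30, 40
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])                   # assumed known
L = np.linalg.cholesky(Sigma)
X1 = L @ rng.standard_normal((p, n1))            # sample 1, mu_(1) = 0
X2 = L @ rng.standard_normal((p, n2))            # sample 2, mu_(2) = 0
d = X1.mean(axis=1) - X2.mean(axis=1)            # Xbar_1 - Xbar_2
n = 1.0 / (1.0 / n1 + 1.0 / n2)                  # 1/n = 1/n1 + 1/n2
z = n * d @ np.linalg.solve(Sigma, d)            # ~ chi^2_p under H_o
crit = chi2.ppf(0.95, p)                         # chi^2_{p, 0.05} ≈ 5.99
print(z, crit, z >= crit)
```

Since the means are equal by construction, `z` exceeds the critical value in only about 5% of such simulations.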

For a numerical example, the reader is referred to Example 6.2.3. One can also determine the power of the test, that is, the probability of rejecting $H\_o: \delta = \mu\_{(1)} - \mu\_{(2)} = \delta\_o$ under an alternative hypothesis, in which case the distribution of $z$ is a noncentral chi-square with $p$ degrees of freedom and non-centrality parameter $\lambda = \frac{1}{2}\,\frac{n\_1 n\_2}{n\_1 + n\_2}\,(\delta - \delta\_o)'\Sigma^{-1}(\delta - \delta\_o), \ \delta = \mu\_{(1)} - \mu\_{(2)}$, where $n\_1$ and $n\_2$ are the sample sizes. Under the null hypothesis, the non-centrality parameter $\lambda$ is equal to zero. The power is given by

$$\text{Power} \, := \Pr\{\text{reject } H\_o | H\_1\} = \Pr\{\chi^2\_p(\lambda) \ge \chi^2\_{p,\,\alpha}\}. \tag{6.11.3}$$
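Formula (6.11.3) can be evaluated through the noncentral chi-square distribution. In the sketch below, $\Sigma$ and $\delta$ are hypothetical values; note that scipy parameterizes the noncentral chi-square by a noncentrality `nc` equal to twice the $\lambda$ defined above.

```python
import numpy as np
from scipy.stats import chi2, ncx2

p, n1, n2 = 3, 20, 25
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])              # assumed known
delta = np.array([0.5, 0.3, -0.2])               # mu_(1) - mu_(2) under H_1 (delta_o = 0)
lam = 0.5 * (n1 * n2 / (n1 + n2)) * delta @ np.linalg.solve(Sigma, delta)
crit = chi2.ppf(0.95, p)                         # chi^2_{p, 0.05}
power = ncx2.sf(crit, p, 2.0 * lam)              # Pr{chi^2_p(lambda) >= crit}
print(lam, power)
```

Because $\lambda > 0$ whenever $\delta \ne \delta\_o$, the computed power always exceeds the significance level $\alpha = 0.05$.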

When the population covariance matrices are identical and the common covariance matrix is unknown, one can also construct a statistic for testing hypotheses on linear functions of the mean value vectors by making use of steps parallel to those employed in the single population case, with the resulting criterion being based on Hotelling's $T^2$ statistic for testing $H\_o: \mu\_{(1)} = \mu\_{(2)}$.

#### **6.11.1. Equal but unknown covariance matrices**

Let us consider the same procedure as in Sect. 6.11 to test a hypothesis on a linear function $b = a\_1\mu\_{(1)} + \cdots + a\_k\mu\_{(k)}$ where $a\_1, \ldots, a\_k$ are known real scalar constants and $\mu\_{(j)}, \ j = 1, \ldots, k$, are the population mean values. We wish to test the hypothesis $H\_o: b = b\_o$ (given), in the sense that the mean values $\mu\_{(j)}, \ j = 1, \ldots, k$, and $a\_1, \ldots, a\_k$ are all specified. Let $U\_k = a\_1\bar{X}\_1 + \cdots + a\_k\bar{X}\_k$ as previously defined. Then, $E[U\_k] = b$ and $\text{Cov}(U\_k) = (\frac{a\_1^2}{n\_1} + \cdots + \frac{a\_k^2}{n\_k})\Sigma$, where $(\frac{a\_1^2}{n\_1} + \cdots + \frac{a\_k^2}{n\_k}) \equiv \frac{1}{n}$ for some symbol $n$. The common covariance matrix $\Sigma$ has the MLE $\frac{1}{n\_1 + \cdots + n\_k}(S\_1 + \cdots + S\_k)$ where $S\_j$ is the sample sum of products matrix for the $j$-th Gaussian population. It has been established that $S = S\_1 + \cdots + S\_k$ has a Wishart distribution with $(n\_1 - 1) + \cdots + (n\_k - 1) = N - k, \ N = n\_1 + \cdots + n\_k$, degrees of freedom, that is,

$$S \sim W\_p(N-k, \Sigma), \ \Sigma > O, \ N = n\_1 + \dots + n\_k. \tag{6.11.4}$$

Then, when *Σ* is unknown, it follows from a derivation parallel to that provided in Sect. 6.3 for the single population case that

$$w \equiv n(U\_k - b)'S^{-1}(U\_k - b) \sim \text{ type-2 beta}\left(\frac{p}{2}, \frac{N - k - p}{2}\right),\tag{6.11.5}$$

or, $w$ has a real scalar type-2 beta distribution with the parameters $(\frac{p}{2}, \frac{N-k-p}{2})$. Letting $w = \frac{p}{N-k-p}F$, this $F$ is an $F$-statistic with $p$ and $N - k - p$ degrees of freedom.

**Theorem 6.11.1.** *Let $U\_k$, $n$, $N$, $b$, $S$ be as defined above. Then $w = n(U\_k - b)'S^{-1}(U\_k - b)$ has a real scalar type-2 beta distribution with the parameters $(\frac{p}{2}, \frac{N-k-p}{2})$. Letting $w = \frac{p}{N-k-p}F$, this $F$ is an $F$-statistic with $p$ and $N - k - p$ degrees of freedom.*

Hence for testing the hypothesis *Ho* : *b* = *bo* (given), the criterion is the following:

$$\text{Reject } H\_o \text{ if the observed value of } F = \frac{N - k - p}{p}\, w \ge F\_{p,\, N-k-p, \,\alpha},\tag{6.11.6}$$

$$\text{with } \Pr\{F\_{p,N-k-p} \ge F\_{p,N-k-p,\ \alpha}\} = \alpha. \tag{6.11.7}$$

Note that by exploiting the connection between type-1 and type-2 real scalar beta random variables, one can obtain a number of properties on this *F*-statistic.
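The connection just mentioned can be made concrete: if $w$ is type-2 beta$(\frac{p}{2}, \frac{N-k-p}{2})$, then $w/(1+w)$ is type-1 beta with the same parameters, and $\frac{N-k-p}{p}w$ is an $F$ variable, so both routes give the same tail probability. A sketch with hypothetical values:

```python
from scipy.stats import beta, f

p, N, k = 3, 50, 2
a, b = p / 2, (N - k - p) / 2
w_obs = 0.21                                     # hypothetical observed value of w
F_obs = ((N - k - p) / p) * w_obs                # F with p and N-k-p df
p_from_F = f.sf(F_obs, p, N - k - p)
# P(W > w) for type-2 beta(a, b) equals P(B > w/(1+w)) for type-1 beta(a, b)
p_from_beta = beta.sf(w_obs / (1 + w_obs), a, b)
print(p_from_F, p_from_beta)                     # identical tail probabilities
```

Either computation can therefore be used to carry out the test (6.11.6)–(6.11.7).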

This situation has already been covered in Theorem 6.3.4 for the case *k* = 2*.*

#### **6.12. Equality of Covariance Matrices in Independent Gaussian Populations**

Let $X\_j \sim N\_p(\mu\_{(j)}, \Sigma\_j), \ \Sigma\_j > O, \ j = 1, \ldots, k$, be independently distributed real $p$-variate Gaussian populations. Consider simple random samples of sizes $n\_1, \ldots, n\_k$ from these $k$ populations, whose sample values, denoted by $X\_{jq}, \ q = 1, \ldots, n\_j$, are iid as $X\_{j1}, \ j = 1, \ldots, k$. The sample sums of products matrices, denoted by $S\_1, \ldots, S\_k$, respectively, are independently distributed as Wishart matrix random variables with $n\_j - 1, \ j = 1, \ldots, k$, degrees of freedom. The joint density of all the sample values is then given by

$$L = \prod\_{j=1}^{k} L\_j, \ L\_j = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\boldsymbol{\Sigma}\_j^{-1}\mathbf{S}\_j) - \frac{\boldsymbol{n}\_j}{2}(\bar{\mathbf{X}}\_j - \boldsymbol{\mu}\_{(j)})'\boldsymbol{\Sigma}\_j^{-1}(\bar{\mathbf{X}}\_j - \boldsymbol{\mu}\_{(j)})}{(2\pi)^{\frac{n\_j p}{2}}|\boldsymbol{\Sigma}\_j|^{\frac{n\_j}{2}}},\tag{6.12.1}$$

the MLEs of $\mu\_{(j)}$ and $\Sigma\_j$ being $\hat{\mu}\_{(j)} = \bar{X}\_j$ and $\hat{\Sigma}\_j = \frac{1}{n\_j}S\_j$. The maximum of $L$ in the entire parameter space is

$$\max\_{\Omega} L = \prod\_{j=1}^{k} \max\_{\Omega} L\_j = \frac{\left[\prod\_{j=1}^{k} n\_j^{\frac{n\_j p}{2}}\right] \mathbb{e}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}} \prod\_{j=1}^{k} |S\_j|^{\frac{n\_j}{2}}}, \ N = n\_1 + \dots + n\_k. \tag{i}$$

Let us test the hypothesis of equality of covariance matrices:

$$H\_0: \Sigma\_1 = \Sigma\_2 = \dots = \Sigma\_k = \Sigma$$

where $\Sigma$ is unknown. Under this null hypothesis, the MLE of $\mu\_{(j)}$ is $\bar{X}\_j$ and the MLE of the common $\Sigma$ is $\frac{1}{N}(S\_1 + \cdots + S\_k) = \frac{1}{N}S, \ N = n\_1 + \cdots + n\_k, \ S = S\_1 + \cdots + S\_k$. Thus, the maximum of $L$ under $H\_o$ is

$$\max\_{H\_o} L = \frac{N^{\frac{Np}{2}} \mathbf{e}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}} \prod\_{j=1}^{k} |S|^{\frac{n\_j}{2}}} = \frac{N^{\frac{Np}{2}} \mathbf{e}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}} |S|^{\frac{N}{2}}},\tag{ii}$$

and the *λ*-criterion is the following:

$$\lambda = \frac{N^{\frac{Np}{2}} \left\{ \prod\_{j=1}^{k} |S\_j|^{\frac{n\_j}{2}} \right\}}{|S|^{\frac{N}{2}} \left\{ \prod\_{j=1}^{k} n\_j^{\frac{n\_j p}{2}} \right\}}. \tag{6.12.2}$$
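Evaluating (6.12.2) is best done on the log scale. The sketch below uses simulated data under $H\_o$ (common $\Sigma = I$); since $\lambda$ is a likelihood ratio, the computed value must lie in $(0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, ns = 2, [15, 20, 25]                          # k = 3 groups
S_list = []
for nj in ns:
    Y = rng.standard_normal((p, nj))             # H_o holds: common Sigma = I
    D = Y - Y.mean(axis=1, keepdims=True)
    S_list.append(D @ D.T)                       # sample sum of products S_j
S, N = sum(S_list), sum(ns)
# log lambda = (Np/2)log N + sum_j [(n_j/2)log|S_j| - (n_j p/2)log n_j] - (N/2)log|S|
log_lam = (N * p / 2) * np.log(N) - (N / 2) * np.linalg.slogdet(S)[1]
for nj, Sj in zip(ns, S_list):
    log_lam += (nj / 2) * np.linalg.slogdet(Sj)[1] - (nj * p / 2) * np.log(nj)
lam = np.exp(log_lam)
print(lam)
```

Working with `slogdet` avoids the overflow or underflow that the determinant powers in (6.12.2) would otherwise cause for larger samples.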

Let us consider the $h$-th moment of $\lambda$ for an arbitrary $h$. Letting $c = N^{\frac{Np}{2}}\big/\big\{\prod\_{j=1}^{k} n\_j^{\frac{n\_j p}{2}}\big\}$,

$$\lambda^h = c^h \frac{\left\{ \prod\_{j=1}^k |S\_j|^{\frac{n\_j h}{2}} \right\}}{|S|^{\frac{Nh}{2}}} = c^h \left\{ \prod\_{j=1}^k |S\_j|^{\frac{n\_j h}{2}} \right\} |S|^{-\frac{Nh}{2}}.\tag{6.12.3}$$

The factor causing a difficulty, namely $|S|^{-\frac{Nh}{2}}$, will be replaced by an equivalent integral. Letting $Y > O$ be a real $p \times p$ positive definite matrix, we have the identity

$$|S|^{-\frac{Nh}{2}} = \frac{1}{\Gamma\_p(\frac{Nh}{2})} \int\_{Y>0} |Y|^{\frac{Nh}{2} - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(YS)} dY,\ \Re(\frac{Nh}{2}) > \frac{p-1}{2},\ S > O,\tag{6.12.4}$$

where

$$\text{tr}(YS) = \text{tr}(YS\_1) + \dots + \text{tr}(YS\_k). \tag{iii}$$

Thus, once (6.12.4) is substituted in (6.12.3), $\lambda^h$ splits into products involving $S\_j, \ j = 1, \ldots, k$; this enables one to integrate out over the densities of the $S\_j$'s, which are Wishart densities with $m\_j = n\_j - 1$ degrees of freedom. Noting that the exponent involving $S\_j$ is $-\frac{1}{2}\text{tr}(\Sigma\_j^{-1}S\_j) - \text{tr}(YS\_j) = -\frac{1}{2}\text{tr}[S\_j(\Sigma\_j^{-1} + 2Y)]$, the integral over the Wishart density of $S\_j$ gives the following:

$$\begin{split} &\prod\_{j=1}^{k} \frac{1}{2^{\frac{m\_j p}{2}} \Gamma\_p(\frac{m\_j}{2}) |\Sigma\_j|^{\frac{m\_j}{2}}} \int\_{S\_j > O} |S\_j|^{\frac{m\_j}{2} + \frac{n\_j h}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \mathrm{tr}[S\_j(\Sigma\_j^{-1} + 2Y)]} \mathrm{d}S\_j \\ &= \prod\_{j=1}^{k} \frac{2^{\frac{n\_j h p}{2}} \Gamma\_p(\frac{m\_j}{2} + \frac{n\_j h}{2}) |\Sigma\_j^{-1} + 2Y|^{-(\frac{m\_j}{2} + \frac{n\_j h}{2})}}{\Gamma\_p(\frac{m\_j}{2}) |\Sigma\_j|^{\frac{m\_j}{2}}} \\ &= 2^{\frac{Nph}{2}} \prod\_{j=1}^{k} \frac{|\Sigma\_j|^{\frac{n\_j h}{2}} |I + 2\Sigma\_j Y|^{-(\frac{m\_j}{2} + \frac{n\_j h}{2})} \Gamma\_p(\frac{m\_j}{2} + \frac{n\_j h}{2})}{\Gamma\_p(\frac{m\_j}{2})}. \tag{iv} \end{split}$$

Thus, on substituting *(iv)* in $E[\lambda^h]$, we have

$$\begin{split} E[\lambda^{h}] &= \frac{c^{h} 2^{\frac{Nph}{2}}}{\Gamma\_{p}(\frac{Nh}{2})} \int\_{Y>O} |Y|^{\frac{Nh}{2} - \frac{p+1}{2}} \\ &\quad\times \left\{ \prod\_{j=1}^{k} \frac{|\Sigma\_{j}|^{\frac{n\_{j}h}{2}} |I + 2\Sigma\_{j}Y|^{-(\frac{m\_{j}}{2} + \frac{n\_{j}h}{2})} \Gamma\_{p}(\frac{m\_{j}}{2} + \frac{n\_{j}h}{2})}{\Gamma\_{p}(\frac{m\_{j}}{2})} \right\} \mathrm{d}Y, \end{split} \tag{6.12.5}$$

which is the non-null $h$-th moment of $\lambda$. The $h$-th null moment is available when $\Sigma\_1=\cdots=\Sigma\_k=\Sigma$. In the null case,

$$\prod\_{j=1}^{k} |I + 2Y\Sigma\_j|^{-(\frac{m\_j}{2} + \frac{n\_j h}{2})} |\Sigma\_j|^{\frac{n\_j h}{2}} \frac{\Gamma\_p(\frac{m\_j}{2} + \frac{n\_j h}{2})}{\Gamma\_p(\frac{m\_j}{2})}$$

$$= |\Sigma|^{\frac{Nh}{2}} |I + 2Y\Sigma|^{-(\frac{N-k}{2} + \frac{Nh}{2})} \Big[ \prod\_{j=1}^{k} \frac{\Gamma\_p(\frac{m\_j}{2} + \frac{n\_j h}{2})}{\Gamma\_p(\frac{m\_j}{2})} \Big]. \tag{v}$$

Then, substituting *(v)* in (6.12.5) and integrating out over *Y* produces

$$E[\lambda^h|H\_o] = c^h \frac{\Gamma\_p(\frac{N-k}{2})}{\Gamma\_p(\frac{N-k}{2} + \frac{Nh}{2})} \left\{ \prod\_{j=1}^k \frac{\Gamma\_p(\frac{n\_j-1}{2} + \frac{n\_jh}{2})}{\Gamma\_p(\frac{n\_j-1}{2})} \right\}.\tag{6.12.6}$$

Observe that when $h=0$, $E[\lambda^h|H\_o]=1$. For $h=s-1$ where $s$ is a complex parameter, we have the Mellin transform of the density of $\lambda$, denoted by $f(\lambda)$, which can be expressed as follows in terms of an H-function:

$$f(\lambda) = \frac{1}{c}\, \frac{\Gamma\_p(\frac{N-k}{2})}{\prod\_{j=1}^k \Gamma\_p(\frac{n\_j-1}{2})}\, H\_{1,k}^{k,0} \left[ \frac{\lambda}{c} \,\middle|\, \begin{matrix} (-\frac{k}{2}, \frac{N}{2}) \\ (-\frac{1}{2}, \frac{n\_j}{2}),\ j=1,\ldots,k \end{matrix} \right], \ 0 < \lambda < 1,\tag{6.12.7}$$

and zero elsewhere, where the H-function is defined in Sect. 5.4.3, more details being available from Mathai and Saxena (1978) and Mathai et al. (2010). Since the coefficients of $\frac{h}{2}$ in the gammas, that is, $n\_1,\ldots,n\_k$ and $N$, are all positive integers, one can expand all the gammas by using the multiplication formula for gamma functions, and then $f(\lambda)$ can be expressed in terms of a G-function as well. It may be noted from (6.12.5) that in order to obtain the non-null moments, and thereby the non-null density, one has to integrate out $Y$ in (6.12.5). This has not yet been worked out for a general $k$. For $k=2$, one can obtain a series form in terms of zonal polynomials for the integral in (6.12.5). The rather intricate derivations are omitted.
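Numerically, the null moments (6.12.6) reduce to sums of log-gamma values once $\Gamma\_p$ is expanded through its product form, $\Gamma\_p(\alpha)=\pi^{p(p-1)/4}\prod\_{i=1}^{p}\Gamma(\alpha-\frac{i-1}{2})$. A minimal sketch, with illustrative function names, that also exercises the constant $c$ of (6.12.3):

```python
import math

def log_gamma_p(p, a):
    # real matrix-variate gamma on the log scale:
    #   Γ_p(a) = π^{p(p−1)/4} ∏_{i=1}^{p} Γ(a − (i−1)/2),  a > (p−1)/2
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - (i - 1) / 2) for i in range(1, p + 1))

def log_null_moment(n_list, p, h):
    # ln E[λ^h | H_o] from (6.12.6), with c = N^{Np/2} / ∏_j n_j^{n_j p/2}
    N, k = sum(n_list), len(n_list)
    log_c = (N * p / 2) * math.log(N) - sum((n * p / 2) * math.log(n) for n in n_list)
    val = h * log_c
    val += log_gamma_p(p, (N - k) / 2) - log_gamma_p(p, (N - k) / 2 + N * h / 2)
    val += sum(log_gamma_p(p, (n - 1) / 2 + n * h / 2) - log_gamma_p(p, (n - 1) / 2)
               for n in n_list)
    return val
```

At $h=0$ the expression collapses to 1, and for $h>0$ it stays strictly below 1 and decreases in $h$, as it must for a statistic supported on $(0,1)$.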

#### **6.12.1. Asymptotic behavior**

We now investigate the asymptotic behavior of $-2\ln\lambda$ as $n\_j\to\infty$, $j=1,\ldots,k$, $N=n\_1+\cdots+n\_k$. On expanding the real matrix-variate gamma functions in the gamma ratio involving $h$ in (6.12.6), we have

$$\frac{\prod\_{j=1}^{k} \Gamma\_p(\frac{n\_j}{2}(1+h) - \frac{1}{2})}{\Gamma\_p(\frac{N}{2}(1+h) - \frac{k}{2})} \to \frac{\prod\_{j=1}^{k} \left\{ \prod\_{i=1}^{p} \Gamma(\frac{n\_j}{2}(1+h) - \frac{1}{2} - \frac{i-1}{2}) \right\}}{\prod\_{i=1}^{p} \Gamma(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{i-1}{2})}, \qquad (i)$$

excluding the factor containing $\pi$. Letting $\frac{n\_j}{2}(1+h)\to\infty$, $j=1,\ldots,k$, and $\frac{N}{2}(1+h)\to\infty$ as $n\_j\to\infty$, $j=1,\ldots,k$, with $N\to\infty$, we now express all the gamma functions in *(i)* in terms of Stirling's asymptotic formula. For the numerator, we have

$$\begin{split} &\prod\_{j=1}^{k} \left\{ \prod\_{i=1}^{p} \Gamma\Big(\frac{n\_j}{2}(1+h) - \frac{1}{2} - \frac{i-1}{2}\Big) \right\} \\ &\to \prod\_{j=1}^{k} \left\{ \prod\_{i=1}^{p} \sqrt{2\pi}\, \Big[ \frac{n\_j}{2}(1+h) \Big]^{\frac{n\_j}{2}(1+h) - \frac{1}{2} - \frac{i}{2}} \mathbf{e}^{-\frac{n\_j}{2}(1+h)} \right\} \\ &= \prod\_{j=1}^{k} (\sqrt{2\pi})^{p} \Big[ \frac{n\_j}{2}(1+h) \Big]^{\frac{n\_j}{2}(1+h)p - \frac{p}{2} - \frac{p(p+1)}{4}} \mathbf{e}^{-\frac{n\_j}{2}(1+h)p} \\ &= (\sqrt{2\pi})^{kp} \Big[ \prod\_{j=1}^{k} \Big(\frac{n\_j}{2}\Big)^{\frac{n\_j}{2}(1+h)p - \frac{p}{2} - \frac{p(p+1)}{4}} \Big] \mathbf{e}^{-\frac{N}{2}(1+h)p} \\ &\qquad \times (1+h)^{\frac{N}{2}(1+h)p - \frac{kp}{2} - k\frac{p(p+1)}{4}}, \qquad \text{(ii)} \end{split}$$

and the denominator in *(i)* has the following asymptotic representation:

$$\begin{split} \prod\_{i=1}^{p} \Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{i-1}{2}\right) &\to \prod\_{i=1}^{p} (\sqrt{2\pi}) \left[\frac{N}{2}(1+h)\right]^{\frac{N}{2}(1+h) - \frac{k}{2} - \frac{i}{2}} e^{-\frac{N}{2}(1+h)} \\ &= (\sqrt{2\pi})^{p} \left[\frac{N}{2}\right]^{\frac{N}{2}(1+h)p - \frac{kp}{2} - \frac{p(p+1)}{4}} e^{-\frac{N}{2}(1+h)p} \\ &\qquad \times (1+h)^{\frac{N}{2}(1+h)p - \frac{kp}{2} - \frac{p(p+1)}{4}}.\end{split}$$

Now, expanding the gammas in the constant part $\Gamma\_p(\frac{N-k}{2})\big/\prod\_{j=1}^{k}\Gamma\_p(\frac{n\_j-1}{2})$ and then taking care of $c^h$, we see that the factors containing $\pi$, the $n\_j$'s and $N$ disappear, leaving

$$(1+h)^{-(k-1)\frac{p(p+1)}{4}}.$$

Hence $-2\ln\lambda\to\chi^2\_{(k-1)\frac{p(p+1)}{2}}$, and we have the following result:

**Theorem 6.12.1.** *Consider the λ-criterion in (6.12.2) or the null density in (6.12.7). When $n\_j\to\infty$, $j=1,\ldots,k$, the asymptotic null density of $-2\ln\lambda$ is a real scalar chisquare with $(k-1)\frac{p(p+1)}{2}$ degrees of freedom.*

Observe that the number of parameters restricted by the null hypothesis $H\_o:\Sigma\_1=\cdots=\Sigma\_k=\Sigma$, where $\Sigma$ is unknown, is $(k-1)$ times the number of distinct parameters in $\Sigma$, namely $\frac{p(p+1)}{2}$; this count, $(k-1)\frac{p(p+1)}{2}$, coincides with the number of degrees of freedom of the asymptotic chisquare distribution under $H\_o$.

### **6.13. Testing the Hypothesis that** *k* **Independent** *p***-variate Real Gaussian Populations are Identical and Multivariate Analysis of Variance**

Consider $k$ independent $p$-variate real Gaussian populations $X\_{ij}\sim N\_p(\mu^{(i)},\Sigma\_i)$, $\Sigma\_i>O$, $i=1,\ldots,k$, $j=1,\ldots,n\_i$, where the $p\times1$ vector $X\_{ij}$ is the $j$-th sample value belonging to the $i$-th population, these samples (iid variables) being of sizes $n\_1,\ldots,n\_k$ from these $k$ populations. The joint density of all the sample values, denoted by $L$, can be expressed as follows:

$$\begin{split} L &= \prod\_{i=1}^{k} L\_i, \quad L\_i = \frac{\mathbf{e}^{-\frac{1}{2}\sum\_{j=1}^{n\_i} (X\_{ij} - \mu^{(i)})' \Sigma\_i^{-1} (X\_{ij} - \mu^{(i)})}}{(2\pi)^{\frac{n\_i p}{2}} |\Sigma\_i|^{\frac{n\_i}{2}}} \\ &= \frac{\mathbf{e}^{-\frac{1}{2}\mathrm{tr}(\Sigma\_i^{-1} S\_i) - \frac{n\_i}{2} (\bar{X}\_i - \mu^{(i)})' \Sigma\_i^{-1} (\bar{X}\_i - \mu^{(i)})}}{(2\pi)^{\frac{n\_i p}{2}} |\Sigma\_i|^{\frac{n\_i}{2}}} \end{split}$$


where $\bar{X}\_i=\frac{1}{n\_i}(X\_{i1}+\cdots+X\_{in\_i})$, $i=1,\ldots,k$, and $E[X\_{ij}]=\mu^{(i)}$, $j=1,\ldots,n\_i$. Then, letting $N=n\_1+\cdots+n\_k,$

$$\begin{split} \max\_{\Omega} L &= \prod\_{i=1}^{k} \max\_{\Omega} L\_{i} = \prod\_{i=1}^{k} \frac{n\_{i}^{\frac{n\_{i}p}{2}} \mathbf{e}^{-\frac{n\_{i}p}{2}}}{(2\pi)^{\frac{n\_{i}p}{2}} |S\_{i}|^{\frac{n\_{i}}{2}}} \\ &= \frac{\{\prod\_{i=1}^{k} n\_{i}^{\frac{n\_{i}p}{2}}\} \mathbf{e}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}} \{\prod\_{i=1}^{k} |S\_{i}|^{\frac{n\_{i}}{2}}\}}. \end{split}$$

Consider the hypothesis $H\_o:\ \mu^{(1)}=\cdots=\mu^{(k)}=\mu,\ \Sigma\_1=\cdots=\Sigma\_k=\Sigma,$ where $\mu$ and $\Sigma$ are unknown. This corresponds to the hypothesis of equality of these $k$ populations. Under $H\_o$, the maximum likelihood estimator (MLE) of $\mu$, denoted by $\hat{\mu}$, is given by $\hat{\mu}=\frac{1}{N}[n\_1\bar{X}\_1+\cdots+n\_k\bar{X}\_k]$ where $N$ and $\bar{X}\_i$ are as defined above. As for the common $\Sigma$, its MLE is

$$\begin{aligned} \hat{\Sigma} &= \frac{1}{N} \sum\_{i=1}^{k} \sum\_{j=1}^{n\_i} (X\_{ij} - \hat{\mu})(X\_{ij} - \hat{\mu})' \\ &= \frac{1}{N} [S\_1 + \dots + S\_k + \sum\_{i=1}^{k} n\_i (\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})'] \end{aligned}$$

where *Si* is the sample sum of products matrix for the *i*-th sample, observing that

$$\begin{split} \sum\_{j=1}^{n\_i} (X\_{ij} - \hat{\mu})(X\_{ij} - \hat{\mu})' &= \sum\_{j=1}^{n\_i} (X\_{ij} - \bar{X}\_i + \bar{X}\_i - \hat{\mu})(X\_{ij} - \bar{X}\_i + \bar{X}\_i - \hat{\mu})' \\ &= \sum\_{j=1}^{n\_i} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)' + \sum\_{j=1}^{n\_i} (\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})' \\ &= S\_i + n\_i(\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})'. \end{split}$$

Hence the maximum of the likelihood function under *Ho* is the following:

$$\max\_{H\_o} L = \frac{\mathbf{e}^{-\frac{Np}{2}} N^{\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}} |S + \sum\_{i=1}^k n\_i (\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})'|^{\frac{N}{2}}}$$

where $S=S\_1+\cdots+S\_k$. Therefore the $\lambda$-criterion is given by

$$\lambda = \frac{\max\_{H\_o} L}{\max\_{\Omega} L} = \frac{\{\prod\_{i=1}^k |S\_i|^{\frac{n\_i}{2}}\}\, N^{\frac{Np}{2}}}{\{\prod\_{i=1}^k n\_i^{\frac{n\_i p}{2}}\}\, |S + \sum\_{i=1}^k n\_i (\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})'|^{\frac{N}{2}}}. \tag{6.13.1}$$

#### **6.13.1. Conditional and marginal hypotheses**

For convenience, we may split $\lambda$ into the product $\lambda\_1\lambda\_2$ where $\lambda\_1$ is the $\lambda$-criterion for the conditional hypothesis $H\_{o1}:\mu^{(1)}=\cdots=\mu^{(k)}=\mu$, given that $\Sigma\_1=\cdots=\Sigma\_k=\Sigma$, and $\lambda\_2$ is the $\lambda$-criterion for the marginal hypothesis $H\_{o2}:\Sigma\_1=\cdots=\Sigma\_k=\Sigma$, where $\mu$ and $\Sigma$ are unknown. The conditional hypothesis $H\_{o1}$ is actually the null hypothesis usually made when establishing the multivariate analysis of variance (MANOVA) procedure. We will only consider $H\_{o1}$ since the marginal hypothesis $H\_{o2}$ has already been discussed in Sect. 6.12. When the $\Sigma\_i$'s are assumed to be equal, the common $\Sigma$ is estimated by the MLE $\frac{1}{N}(S\_1+\cdots+S\_k)$ where $S\_i$ is the sample sum of products matrix for the $i$-th population. The common $\mu$ is estimated by $\frac{1}{N}(n\_1\bar{X}\_1+\cdots+n\_k\bar{X}\_k)$. Accordingly, the $\lambda$-criterion for this conditional hypothesis is the following:

$$\lambda\_1 = \frac{|S|^{\frac{N}{2}}}{|S + \sum\_{i=1}^{k} n\_i(\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})'|^{\frac{N}{2}}} \tag{6.13.2}$$

where $S=S\_1+\cdots+S\_k$, $\hat{\mu}=\frac{1}{N}(n\_1\bar{X}\_1+\cdots+n\_k\bar{X}\_k)$, $N=n\_1+\cdots+n\_k$. Note that the $S\_i$'s are independently Wishart distributed with $n\_i-1$ degrees of freedom, that is, $S\_i\overset{ind}{\sim}W\_p(n\_i-1,\Sigma)$, $i=1,\ldots,k$, and hence $S\sim W\_p(N-k,\Sigma)$. Let
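In practice, $\lambda\_1$ is computed directly from the group means and the pooled sum of products matrix, again on the log scale. A minimal sketch with illustrative names; the log-determinant is obtained by plain Gaussian elimination, which is adequate for matrices $>O$:

```python
import math

def logdet(M):
    # log-determinant of a symmetric positive definite matrix (Gaussian elimination)
    A = [row[:] for row in M]
    acc = 0.0
    for i in range(len(A)):
        acc += math.log(A[i][i])
        for r in range(i + 1, len(A)):
            f = A[r][i] / A[i][i]
            for c in range(i, len(A)):
                A[r][c] -= f * A[i][c]
    return acc

def log_lambda1(S, xbars, n_list):
    # ln λ1 = (N/2)(ln|S| − ln|S + Q|),  Q = Σ_i n_i (x̄_i − μ̂)(x̄_i − μ̂)'
    p, N = len(S), sum(n_list)
    mu = [sum(n * xb[r] for n, xb in zip(n_list, xbars)) / N for r in range(p)]
    SQ = [row[:] for row in S]  # accumulates S + Q
    for n, xb in zip(n_list, xbars):
        d = [xb[r] - mu[r] for r in range(p)]
        for r in range(p):
            for c in range(p):
                SQ[r][c] += n * d[r] * d[c]
    return (N / 2) * (logdet(S) - logdet(SQ))
```

When all the $\bar{X}\_i$'s coincide, $Q=O$ and $\lambda\_1=1$; separation among the group means inflates $|S+Q|$ and drives $\lambda\_1$ toward zero.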

$$\mathcal{Q} = \sum\_{i=1}^{k} n\_i (\bar{X}\_i - \hat{\mu}) (\bar{X}\_i - \hat{\mu})'.$$

Since $Q$ only contains sample averages, and the sample averages and the sample sum of products matrices are independently distributed, $Q$ and $S$ are independently distributed. Moreover, since we can write $\bar{X}\_i-\hat{\mu}$ as $(\bar{X}\_i-\mu)-(\hat{\mu}-\mu)$, where $\mu$ is the common true mean value vector, without any loss of generality we can deem the $\bar{X}\_i$'s to be independently $N\_p(O,\frac{1}{n\_i}\Sigma)$ distributed, $i=1,\ldots,k$; letting $Y\_i=\sqrt{n\_i}\,\bar{X}\_i$, one has $Y\_i\overset{iid}{\sim}N\_p(O,\Sigma)$ under the hypothesis $H\_{o1}$. Now, observe that

$$\begin{split} \bar{X}\_{l} - \hat{\mu} &= \bar{X}\_{l} - \frac{1}{N} (n\_{1}\bar{X}\_{1} + \cdots + n\_{k}\bar{X}\_{k}) \\ &= -\frac{n\_{1}}{N} \bar{X}\_{1} - \cdots - \frac{n\_{l-1}}{N} \bar{X}\_{l-1} + \left(1 - \frac{n\_{l}}{N}\right) \bar{X}\_{l} - \frac{n\_{l+1}}{N} \bar{X}\_{l+1} - \cdots - \frac{n\_{k}}{N} \bar{X}\_{k}, \quad (i) \end{split}$$

$$\begin{split} \mathcal{Q} &= \sum\_{i=1}^{k} n\_i (\bar{X}\_i - \hat{\mu})(\bar{X}\_i - \hat{\mu})' \\ &= \sum\_{i=1}^{k} n\_i \bar{X}\_i \bar{X}\_i' - N\hat{\mu}\hat{\mu}' \\ &= \sum\_{i=1}^{k} Y\_i Y\_i' - \frac{1}{N} (\sqrt{n\_1} Y\_1 + \dots + \sqrt{n\_k} Y\_k)(\sqrt{n\_1} Y\_1 + \dots + \sqrt{n\_k} Y\_k)', \end{split} \tag{ii}$$

where $\sqrt{n\_1}\,Y\_1+\cdots+\sqrt{n\_k}\,Y\_k=(Y\_1,\ldots,Y\_k)DJ$ with $J$ being the $k\times1$ vector of unities, $J'=(1,\ldots,1)$, and $D=\mathrm{diag}(\sqrt{n\_1},\ldots,\sqrt{n\_k})$. Thus, we can express $Q$ as follows:

$$\mathcal{Q} = (Y\_1, \dots, Y\_k)[I - \frac{1}{N}DJJ'D](Y\_1, \dots, Y\_k)'. \tag{iii}$$

Let $B=\frac{1}{N}DJJ'D$ and $A=I-B$. Then, observing that $J'D^2J=N$, both $B$ and $A$ are idempotent matrices, where $B$ is of rank 1 since the trace of $B$, or equivalently the trace of $\frac{1}{N}J'D^2J$, is equal to one; hence the trace of $A$, which is also its rank, is $k-1$. Then, there exists an orthonormal matrix $P$, $PP'=I\_k$, $P'P=I\_k$, such that $A=P'\big[\begin{smallmatrix} I\_{k-1} & O \\ O' & 0 \end{smallmatrix}\big]P$ where $O$ is a $(k-1)\times1$ null vector, $O'$ being its transpose. Letting $(U\_1,\ldots,U\_k)=(Y\_1,\ldots,Y\_k)P'$, the $U\_i$'s are still independently $N\_p(O,\Sigma)$ distributed under $H\_{o1}$, so that

$$\mathcal{Q} = (U\_1, \dots, U\_k) \begin{bmatrix} I\_{k-1} & O \\ O' & 0 \end{bmatrix} (U\_1, \dots, U\_k)' = (U\_1, \dots, U\_{k-1}, O)(U\_1, \dots, U\_{k-1}, O)'$$

$$= \sum\_{i=1}^{k-1} U\_i U\_i' \sim W\_p(k-1, \,\, \Sigma). \tag{iv}$$

Thus, $S+\sum\_{i=1}^{k}n\_i(\bar{X}\_i-\hat{\mu})(\bar{X}\_i-\hat{\mu})'\sim W\_p(N-1,\Sigma)$, which clearly is not distributed independently of $S$, referring to the ratio in (6.13.2).

#### **6.13.2. Arbitrary moments of $\lambda\_1$**

Given (6.13.2), we have

$$\lambda\_1 = \frac{|S|^{\frac{N}{2}}}{|S+\mathcal{Q}|^{\frac{N}{2}}} \Rightarrow \lambda\_1^h = |S|^{\frac{Nh}{2}} |S+\mathcal{Q}|^{-\frac{Nh}{2}} \tag{v}$$

#### Hypothesis Testing and Null Distributions

where $|S+Q|^{-\frac{Nh}{2}}$ will be replaced by the equivalent integral

$$\left|S + \mathcal{Q}\right|^{-\frac{Nh}{2}} = \frac{1}{\Gamma\_p(\frac{Nh}{2})} \int\_{T > O} \left|T\right|^{\frac{Nh}{2} - \frac{p+1}{2}} \mathbf{e}^{-\mathrm{tr}((S+\mathcal{Q})T)} \mathrm{d}T \tag{vi}$$

with the $p\times p$ matrix $T>O$. Hence, the $h$-th moment of $\lambda\_1$, for arbitrary $h$, is the following expected value:

$$E[\lambda\_1^h | H\_{o1}] = E\{ |\mathbf{S}|^{\frac{Nh}{2}} | \mathbf{S} + \mathbf{Q}|^{-\frac{Nh}{2}} \}. \tag{\text{vii}}$$

We now evaluate *(vii)* by integrating out over the Wishart density of *S* and over the joint multinormal density for *U*1*,...,Uk*−1:

$$\begin{split} E[\lambda\_1^h | H\_{o1}] &= \frac{1}{\Gamma\_p(\frac{Nh}{2})} \int\_{T>O} |T|^{\frac{Nh}{2} - \frac{p+1}{2}} \\ &\quad\times \int\_{S>O}\frac{|S|^{\frac{Nh}{2} + \frac{N-k}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \mathrm{tr}(\Sigma^{-1} S) - \mathrm{tr}(ST)}}{2^{\frac{(N-k)p}{2}} \Gamma\_p(\frac{N-k}{2}) |\Sigma|^{\frac{N-k}{2}}} \mathrm{d}S \\ &\quad\times \int\_{U\_1, \ldots, U\_{k-1}} \frac{\mathbf{e}^{-\frac{1}{2} \sum\_{i=1}^{k-1} U\_i' \Sigma^{-1} U\_i - \sum\_{i=1}^{k-1} \mathrm{tr}(TU\_i U\_i')}}{(2\pi)^{\frac{(k-1)p}{2}} |\Sigma|^{\frac{k-1}{2}}} \mathrm{d}U\_1 \wedge \cdots \wedge \mathrm{d}U\_{k-1} \wedge \mathrm{d}T. \end{split}$$

The integral over *S* is evaluated as follows:

$$\int\_{S>O} \frac{|S|^{\frac{Nh}{2} + \frac{N-k}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \mathrm{tr}[(\Sigma^{-1}+2T)S]}}{2^{\frac{(N-k)p}{2}} \Gamma\_p(\frac{N-k}{2}) |\Sigma|^{\frac{N-k}{2}}} \mathrm{d}S$$

$$= 2^{\frac{Nhp}{2}} |\Sigma|^{\frac{Nh}{2}} \frac{\Gamma\_p(\frac{N-k}{2} + \frac{Nh}{2})}{\Gamma\_p(\frac{N-k}{2})} |I + 2\Sigma T|^{-(\frac{Nh}{2} + \frac{N-k}{2})} \tag{viii}$$

for $I+2\Sigma T>O$, $\Re(\frac{N-k}{2}+\frac{Nh}{2})>\frac{p-1}{2}$. The integral over $U\_1,\ldots,U\_{k-1}$ is the following, denoted by $\delta$:

$$\delta = \int\_{U\_1, \dots, U\_{k-1}} \frac{\mathbf{e}^{-\frac{1}{2} \sum\_{i=1}^{k-1} U\_i' \Sigma^{-1} U\_i - \text{tr}(\sum\_{i=1}^{k-1} T U\_i U\_i')}}{(2\pi)^{\frac{(k-1)p}{2}} |\Sigma|^{\frac{k-1}{2}}} \mathbf{d}U\_1 \wedge \dots \wedge \mathbf{d}U\_{k-1}$$

where

$$\text{tr}\left(\sum\_{i=1}^{k-1} TU\_i U\_i'\right) = \text{tr}\left(\sum\_{i=1}^{k-1} U\_i'TU\_i\right) = \sum\_{i=1}^{k-1} U\_i'TU\_i$$

since $U\_i'TU\_i$ is a scalar; thus, the exponent becomes $-\frac{1}{2}\sum\_{i=1}^{k-1}U\_i'[\Sigma^{-1}+2T]U\_i$ and the integral simplifies to

$$\delta = |I + 2\Sigma T|^{-\frac{k-1}{2}}, \; I + 2\Sigma T > O. \tag{ix}$$
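The trace manipulation used above, $\mathrm{tr}(TU\_iU\_i')=U\_i'TU\_i$ for each vector $U\_i$, is easy to check numerically; the helper below (an illustrative name, not from the text) computes both sides for a $p\times p$ matrix $T$ and a $p$-vector $u$:

```python
def quadform_vs_trace(T, u):
    # returns (u'Tu, tr(T u u')) for a p×p matrix T and a p-vector u;
    # the two values agree, which is the identity used in the derivation
    p = len(u)
    quad = sum(u[r] * T[r][c] * u[c] for r in range(p) for c in range(p))
    uuT = [[u[r] * u[c] for c in range(p)] for r in range(p)]
    tr = sum(sum(T[r][c] * uuT[c][r] for c in range(p)) for r in range(p))
    return quad, tr
```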

Now the integral over *T* is the following:

$$\begin{split} &\frac{1}{\Gamma\_{p}(\frac{Nh}{2})} \int\_{T>O} |T|^{\frac{Nh}{2}-\frac{p+1}{2}} |I+2\Sigma T|^{-(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})} \mathrm{d}T \\ &= \frac{\Gamma\_{p}(\frac{N-k}{2}+\frac{k-1}{2})}{\Gamma\_{p}(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})}\, 2^{-\frac{Nhp}{2}} |\Sigma|^{-\frac{Nh}{2}}, \ \Re\Big(\frac{N-k}{2}+\frac{Nh}{2}\Big) > \frac{p-1}{2}. \end{split}$$

Therefore,

$$E[\lambda\_1^h|H\_{o1}] = \frac{\Gamma\_p(\frac{N-k}{2} + \frac{Nh}{2})}{\Gamma\_p(\frac{N-k}{2})} \frac{\Gamma\_p(\frac{N-k}{2} + \frac{k-1}{2})}{\Gamma\_p(\frac{Nh}{2} + \frac{N-k}{2} + \frac{k-1}{2})} \tag{6.13.3}$$

for $\Re(\frac{N-k}{2}+\frac{Nh}{2})>\frac{p-1}{2}$.
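The exact null moments (6.13.3) can likewise be evaluated through log-gammas, which gives a direct numerical check of the limiting behavior established in the next subsection: as the $n\_j$ grow, $E[\lambda\_1^h|H\_{o1}]$ approaches $(1+h)^{-\frac{(k-1)p}{2}}$. A sketch with illustrative names:

```python
import math

def log_gamma_p(p, a):
    # Γ_p(a) = π^{p(p−1)/4} ∏_{j=1}^{p} Γ(a − (j−1)/2), taken on the log scale
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - (j - 1) / 2) for j in range(1, p + 1))

def log_moment_lambda1(n_list, p, h):
    # ln E[λ1^h | H_o1] from (6.13.3); note (N−k)/2 + (k−1)/2 = (N−1)/2
    N, k = sum(n_list), len(n_list)
    return (log_gamma_p(p, (N - k) / 2 + N * h / 2) - log_gamma_p(p, (N - k) / 2)
            + log_gamma_p(p, (N - 1) / 2) - log_gamma_p(p, (N - 1) / 2 + N * h / 2))
```

At $h=0$ the moment is exactly 1, and for large samples the value at $h=1$ already sits close to $2^{-\frac{(k-1)p}{2}}$.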

#### **6.13.3. The asymptotic distribution of $-2\ln\lambda\_1$**

An asymptotic distribution of −2 ln *λ*<sup>1</sup> as *N* → ∞ can be derived from (6.13.3). First, on expanding the real matrix-variate gamma functions in (6.13.3), we obtain the following representation of the *h*-th null moment of *λ*1:

$$\begin{split} E[\lambda\_1^h | H\_{o1}] &= \left\{ \prod\_{j=1}^p \frac{\Gamma(\frac{N-k}{2} + \frac{k-1}{2} - \frac{j-1}{2})}{\Gamma(\frac{N-k}{2} - \frac{j-1}{2})} \right\} \\ &\quad\times \left\{ \prod\_{j=1}^p \frac{\Gamma(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{j-1}{2})}{\Gamma(\frac{N}{2}(1+h) - \frac{k}{2} + \frac{k-1}{2} - \frac{j-1}{2})} \right\}. \end{split} \tag{x}$$

Let us now express all the gamma functions in terms of Stirling's asymptotic formula by taking $\frac{N}{2}\to\infty$ in the constant part and $\frac{N}{2}(1+h)\to\infty$ in the part containing $h$. Then,

$$\frac{\Gamma\left(\frac{N}{2}(1+h)-\frac{k}{2}-\frac{j-1}{2}\right)}{\Gamma\left(\frac{N}{2}(1+h)-\frac{k}{2}+\frac{k-1}{2}-\frac{j-1}{2}\right)} \to \frac{(2\pi)^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}}}\, \frac{[\frac{N}{2}(1+h)]^{\frac{N}{2}(1+h)-\frac{k}{2}-\frac{j}{2}}}{[\frac{N}{2}(1+h)]^{\frac{N}{2}(1+h)-\frac{k}{2}+\frac{k-1}{2}-\frac{j}{2}}}\, \frac{\mathbf{e}^{-\frac{N}{2}(1+h)}}{\mathbf{e}^{-\frac{N}{2}(1+h)}} = \frac{1}{[\frac{N}{2}(1+h)]^{\frac{k-1}{2}}},\tag{xi}$$

$$\frac{\Gamma(\frac{N}{2} - \frac{k}{2} + \frac{k-1}{2} - \frac{j-1}{2})}{\Gamma(\frac{N}{2} - \frac{k}{2} - \frac{j-1}{2})} \to \left(\frac{N}{2}\right)^{\frac{k-1}{2}}.\tag{xii}$$


Hence,

$$E[\lambda\_1^h | H\_{o1}] \to (1+h)^{-\frac{(k-1)p}{2}} \text{ as } N \to \infty. \tag{6.13.4}$$

Thus, the following result:

**Theorem 6.13.1.** *For the test statistic $\lambda\_1$ given in (6.13.2), $-2\ln\lambda\_1\to\chi^2\_{(k-1)p}$, that is, $-2\ln\lambda\_1$ tends to a real scalar chisquare variable having $(k-1)p$ degrees of freedom as $N\to\infty$ with $N=n\_1+\cdots+n\_k$, $n\_j$ being the sample size of the $j$-th $p$-variate real Gaussian population.*

Under the marginal hypothesis $H\_{o2}:\Sigma\_1=\cdots=\Sigma\_k=\Sigma$ where $\Sigma$ is unknown, the $\lambda$-criterion is denoted by $\lambda\_2$, and its $h$-th moment, which is available from (6.12.6) of Sect. 6.12, is given by

$$E[\lambda\_2^h|H\_{o2}] = c^h \frac{\Gamma\_p(\frac{N-k}{2})}{\Gamma\_p(\frac{N-k}{2} + \frac{Nh}{2})} \left\{ \prod\_{j=1}^k \frac{\Gamma\_p(\frac{n\_j-1}{2} + \frac{n\_jh}{2})}{\Gamma\_p(\frac{n\_j-1}{2})} \right\} \tag{6.13.5}$$

for $\Re(\frac{n\_j-1}{2}+\frac{n\_jh}{2})>\frac{p-1}{2}$, $j=1,\ldots,k$. Hence the $h$-th null moment of the $\lambda$-criterion for testing the hypothesis $H\_o$ of equality of the $k$ independent $p$-variate real Gaussian populations is the following:

$$\begin{split} E[\lambda^{h}|H\_{o}] &= E[\lambda^{h}\_{1}|H\_{o1}]\, E[\lambda^{h}\_{2}|H\_{o2}] \\ &= \frac{\Gamma\_{p}(\frac{N-k}{2} + \frac{k-1}{2})}{\Gamma\_{p}(\frac{Nh}{2} + \frac{N-k}{2} + \frac{k-1}{2})}\, c^{h} \Big\{ \prod\_{j=1}^{k} \frac{\Gamma\_{p}(\frac{n\_{j}-1}{2} + \frac{n\_{j}h}{2})}{\Gamma\_{p}(\frac{n\_{j}-1}{2})} \Big\} \end{split} \tag{6.13.6}$$

for $\Re(\frac{n\_j-1}{2}+\frac{n\_jh}{2})>\frac{p-1}{2}$, $j=1,\ldots,k$, $N=n\_1+\cdots+n\_k$, where $c$ is the constant associated with the $h$-th moment of $\lambda\_2$. Combining Theorems 6.13.1 and 6.12.1, the asymptotic distribution of $-2\ln\lambda$ of (6.13.6) is a real scalar chisquare with $(k-1)p+(k-1)\frac{p(p+1)}{2}$ degrees of freedom. Thus, the following result:

**Theorem 6.13.2.** *For the λ-criterion for testing the hypothesis of equality of $k$ independent $p$-variate real Gaussian populations, $-2\ln\lambda\to\chi^2\_\nu$, $\nu=(k-1)p+(k-1)\frac{p(p+1)}{2}$, as $n\_j\to\infty$, $j=1,\ldots,k$.*

**Note 6.13.1.** Observe that for the conditional hypothesis $H\_{o1}$ in (6.13.2), the number of degrees of freedom of the asymptotic chisquare distribution of $-2\ln\lambda\_1$ is $(k-1)p$, which is also the number of parameters restricted by the hypothesis $H\_{o1}$. For the hypothesis $H\_{o2}$, the corresponding number of degrees of freedom of the asymptotic chisquare distribution of $-2\ln\lambda\_2$ is $(k-1)\frac{p(p+1)}{2}$, which as well is the number of parameters restricted by the hypothesis $H\_{o2}$. For the asymptotic chisquare distribution of $-2\ln\lambda$ associated with the hypothesis $H\_o$ of equality of $k$ independent $p$-variate Gaussian populations, the number of degrees of freedom is the sum of these two quantities, that is, $(k-1)p+(k-1)\frac{p(p+1)}{2}=\frac{p(k-1)(p+3)}{2}$, which also coincides with the number of parameters restricted under the hypothesis $H\_o$.
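The parameter counting in Note 6.13.1 is easy to mechanize; the small helper below (an illustrative name, not from the text) returns the three chisquare degrees of freedom and can be checked against the closed form $\frac{p(k-1)(p+3)}{2}$:

```python
def manova_dfs(p, k):
    # degrees of freedom of the asymptotic chisquares in Sect. 6.13:
    # the mean part (H_o1), the covariance part (H_o2), and their sum (H_o)
    df_mean = (k - 1) * p
    df_cov = (k - 1) * p * (p + 1) // 2
    return df_mean, df_cov, df_mean + df_cov
```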

#### **Exercises**

**6.1.** Derive the $\lambda$-criteria for the following tests in a real univariate Gaussian population $N\_1(\mu,\sigma^2)$, assuming that a simple random sample of size $n$, namely $x\_1,\ldots,x\_n$, iid as $N\_1(\mu,\sigma^2)$, is available: (1): $\mu=\mu\_o$ (given), $\sigma^2$ known; (2): $\mu=\mu\_o$, $\sigma^2$ unknown; (3): $\sigma^2=\sigma^2\_o$ (given). You may also refer to Mathai and Haubold (2017).

In all the following problems, it is assumed that a simple random sample of size *n* is available. The alternative hypotheses are the natural alternatives.

**6.2.** Repeat Exercise 6.1 for the corresponding complex Gaussian.

**6.3.** Construct the *λ*-criteria in the complex case for the tests discussed in Sects. 6.2–6.4.

**6.4.** In the real *p*-variate Gaussian case, consider the hypotheses (1): *Σ* is diagonal or the individual components are independently distributed; (2): The diagonal elements are equal, given that *Σ* is diagonal (which is a conditional test). Construct the *λ*-criterion in each case.

**6.5.** Repeat Exercise 6.4 for the complex Gaussian case.

**6.6.** Let the population be real $p$-variate Gaussian $N\_p(\mu,\Sigma)$, $\Sigma=(\sigma\_{ij})>O$, $\mu'=(\mu\_1,\ldots,\mu\_p)$. Consider the following tests and compute the $\lambda$-criteria: (1): $\sigma\_{11}=\cdots=\sigma\_{pp}=\sigma^2$, $\sigma\_{ij}=\nu$ for all $i\neq j$; that is, all the variances are equal and all the covariances are equal; (2): in addition to (1), $\mu\_1=\mu\_2=\cdots=\mu$, or all the mean values are equal. Construct the $\lambda$-criterion in each case. The first is known as the $L\_{vc}$ criterion and the second as the $L\_{mvc}$ criterion. Repeat the same exercise for the complex case. Some distributional aspects are examined in Mathai (1970b) and numerical tables are available in Mathai and Katiyar (1979b).

**6.7.** Let the population be real *p*-variate Gaussian *Np(μ, Σ), Σ > O*. Consider the hypothesis (1):

$$
\Sigma = \begin{bmatrix} a & b & b & \cdots & b \\ b & a & b & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix}, \ a \neq 0, \ b \neq 0, \ a \neq b.
$$

(2):

$$
\Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \quad \Sigma\_{11} = \begin{bmatrix} a\_1 & b\_1 & \cdots & b\_1 \\ b\_1 & a\_1 & \cdots & b\_1 \\ \vdots & \vdots & \ddots & \vdots \\ b\_1 & b\_1 & \cdots & a\_1 \end{bmatrix}, \quad \Sigma\_{22} = \begin{bmatrix} a\_2 & b\_2 & \cdots & b\_2 \\ b\_2 & a\_2 & \cdots & b\_2 \\ \vdots & \vdots & \ddots & \vdots \\ b\_2 & b\_2 & \cdots & a\_2 \end{bmatrix}
$$

where $a\_1\neq0$, $a\_2\neq0$, $b\_1\neq0$, $b\_2\neq0$, $a\_1\neq b\_1$, $a\_2\neq b\_2$, $\Sigma\_{12}=\Sigma\_{21}'$, and all the elements in $\Sigma\_{12}$ and $\Sigma\_{21}$ are each equal to $c\neq0$. Construct the $\lambda$-criterion in each case. (These are hypotheses on patterned matrices.)

**6.8.** Repeat Exercise 6.7 for the complex case.

**6.9.** Consider $k$ independent real $p$-variate Gaussian populations with different parameters, distributed as $N\_p(M\_j,\Sigma\_j)$, $\Sigma\_j>O$, $M\_j'=(\mu\_{1j},\ldots,\mu\_{pj})$, $j=1,\ldots,k$. Construct the $\lambda$-criterion for testing the hypothesis $\Sigma\_1=\cdots=\Sigma\_k$, that is, the covariance matrices are equal. Assume that simple random samples of sizes $n\_1,\ldots,n\_k$ are available from these $k$ populations.

**6.10.** Repeat Exercise 6.9 for the complex case.

**6.11.** For the second part of Exercise 6.7, which is also known as Wilks' $L\_{mvc}$ criterion, show that if $u=\lambda^{\frac{2}{n}}$ where $\lambda$ is the likelihood ratio criterion and $n$ is the sample size, then

$$u = \frac{|S|}{[s + (p-1)s\_1]\,[s - s\_1 + \frac{n}{p-1}\sum\_{j=1}^{p}(\bar{x}\_j - \bar{x})^2]^{p-1}} \tag{i}$$

where $S=(s\_{ij})$ is the sample sum of products matrix, $s=\frac{1}{p}\sum\_{i=1}^{p}s\_{ii}$, $s\_1=\frac{1}{p(p-1)}\sum\_{i\neq j}s\_{ij}$, $\bar{x}=\frac{1}{p}\sum\_{i=1}^{p}\bar{x}\_i$, $\bar{x}\_i=\frac{1}{n}\sum\_{k=1}^{n}x\_{ik}$. For the statistic $u$ in *(i)*, show that the $h$-th null moment, that is, the $h$-th moment when the null hypothesis is true, is given by the following:

$$E[u^h|H\_o] = \prod\_{j=0}^{p-2} \frac{\Gamma(\frac{n-1}{2} + h - \frac{j}{2})}{\Gamma(\frac{n-1}{2} - \frac{j}{2})} \frac{\Gamma(\frac{n+1}{2} + \frac{j}{p-1})}{\Gamma(\frac{n+1}{2} + h + \frac{j}{p-1})}.\tag{ii}$$

Write down the conditions for the existence of the moment in *(ii)*. [For the null and nonnull distributions of Wilks' *Lmvc* criterion, see Mathai (1978).]

**6.12.** Let the $(p+q)\times1$ vector $X$ have a $(p+q)$-variate nonsingular real Gaussian distribution, $X\sim N\_{p+q}(\mu,\Sigma)$, $\Sigma>O$. Let

$$
\Sigma = \begin{bmatrix}
\Sigma\_1 & \Sigma\_2 \\
\Sigma\_2' & \Sigma\_3
\end{bmatrix}
$$

where $\Sigma\_1$ is $p\times p$ with all its diagonal elements equal to $\sigma\_{aa}$ and all other elements equal to $\sigma\_{aa'}$, $\Sigma\_2$ has all elements equal to $\sigma\_{ab}$, $\Sigma\_3$ has all diagonal elements equal to $\sigma\_{bb}$ and all other elements equal to $\sigma\_{bb'}$, where $\sigma\_{aa},\sigma\_{aa'},\sigma\_{ab},\sigma\_{bb},\sigma\_{bb'}$ are unknown. Then, $\Sigma$ is known as bipolar. Let $\lambda$ be the likelihood ratio criterion for testing the hypothesis that $\Sigma$ is bipolar. Then show that the $h$-th null moment is the following:

$$\begin{aligned} E[\lambda^h | H\_0] &= [(p-1)^{p-1}(q-1)^{q-1}] \frac{\Gamma[\frac{(q-1)(n-1)}{2}] \Gamma[\frac{(p-1)(n-1)}{2}]}{\Gamma[(p-1)(h+\frac{n-1}{2})] \Gamma[(q-1)(h+\frac{n-1}{2})]} \\ &\times \prod\_{j=0}^{p+q-3} \frac{\Gamma[h+\frac{n-3}{2}-\frac{j}{2}]}{\Gamma[\frac{n-3}{2}-\frac{j}{2}]} \end{aligned}$$

where *n* is the sample size. Write down the conditions for the existence of this *h*-th null moment.

**6.13.** Let $X$ be an $m\times n$ real matrix having the matrix-variate Gaussian density

$$f(X) = \frac{1}{|2\pi\Sigma|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\mathrm{tr}[\Sigma^{-1}(X-M)(X-M)']}, \ \ \Sigma>O.$$

Letting *S* = *XX*- , *S* is a non-central Wishart matrix. Derive the density of *S* and show that this density, denoted by *fs(S)*, is the following:

$$\begin{split} f\_{S}(S) &= \frac{1}{\Gamma\_{m}(\frac{n}{2}) |2\Sigma|^{\frac{n}{2}}} |S|^{\frac{n}{2} - \frac{m+1}{2}} \mathbf{e}^{-\mathrm{tr}(\Omega) - \frac{1}{2} \mathrm{tr}(\Sigma^{-1}S)} \\ &\quad\times \,{}\_0F\_1\Big(\,; \frac{n}{2}; \frac{1}{2}\Sigma^{-1}\Omega S\Big) \end{split}$$

where $\Omega=\frac{1}{2}MM'\Sigma^{-1}$ is the non-centrality parameter and ${}\_0F\_1$ is a Bessel function of matrix argument.

**6.14.** Show that the *h*-th moment of the determinant of *S*, the non-central Wishart matrix specified in Exercise 6.13, is given by

$$E[|S|^h] = \frac{\Gamma\_m(h + \frac{n}{2})}{\Gamma\_m(\frac{n}{2})}\, |2\Sigma|^h\, \mathbf{e}^{-\mathrm{tr}(\Omega)} \,{}\_1F\_1\Big(h + \frac{n}{2}; \frac{n}{2}; \Omega\Big)$$

where ${}\_1F\_1$ is a hypergeometric function of matrix argument and $\Omega$ is the non-centrality parameter defined in Exercise 6.13.

**6.15.** Letting *v* = *λ*<sup>2/*n*</sup> in Eq. (6.3.10), show that, under the null hypothesis *H<sub>o</sub>*, *v* is distributed as a real scalar type-1 beta random variable with the parameters *((n*−1*)/*2*,* 1*/*2*)*, and that *nA*′*X̄X̄*′*A/(A*′*SA)* is real scalar type-2 beta distributed with the parameters *(*1*/*2*, (n*−1*)/*2*)*.

**6.16.** Show that for an arbitrary *h*, the *h*-th null moment of the test statistic *λ* specified in Eq. (6.3.10) is

$$E[\lambda^h|H\_o] = \frac{\Gamma(\frac{n-1}{2} + \frac{nh}{2})}{\Gamma(\frac{n-1}{2})} \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n}{2} + \frac{nh}{2})}, \ \Re(h) > -\frac{n-1}{n}.$$

#### **References**

T.W. Anderson (2003): *An Introduction to Multivariate Statistical Analysis*, Third Edition, Wiley, New York.

A.W. Davis (1971): Percentile approximations for a class of likelihood ratio criteria, *Biometrika*, **58**, 349–356.

A.W. Davis and J.B. Field (1971): Tables of some multivariate test criteria, *Technical Report* No. **32**, Division of Mathematical Statistics, CSIRO, Canberra, Australia.

B.P. Korin (1968): On the distribution of a statistic used for testing a covariance matrix, *Biometrika*, **55**, 171–178.

A.M. Mathai (1970a): An expansion of Meijer's G-function in the logarithmic case with applications, *Mathematische Nachrichten*, **48**, 129–139.

A.M. Mathai (1970b): The exact distribution of Bartlett's criterion, *Publ. L'ISUP, Paris*, **19**, 1–15.

A.M. Mathai (1971): On the distribution of the likelihood ratio criterion for testing linear hypotheses on regression coefficients, *Annals of the Institute of Statistical Mathematics*, **23**, 181–197.

A.M. Mathai (1972a): The exact distribution of three tests associated with Wilks' concept of generalized variance, *Sankhya Series A*, **34**, 161–170.

A.M. Mathai (1972b): The exact non-central distribution of the generalized variance, *Annals of the Institute of Statistical Mathematics*, **24**, 53–65.

A.M. Mathai (1973): A review of the various methods of obtaining the exact distributions of multivariate test criteria, *Sankhya Series A*, **35**, 39–60.

A.M. Mathai (1977): A short note on the non-null distribution for the problem of sphericity, *Sankhya Series B*, **39**, 102.

A.M. Mathai (1978): On the non-null distribution of Wilks' Lmvc, *Sankhya Series B*, **40**, 43–48.

A.M. Mathai (1979a): The exact distribution and the exact percentage points for testing equality of variances in independent normal population, *Journal of Statistical Computations and Simulation*, **9**, 169–182.

A.M. Mathai (1979b): On the non-null distributions of test statistics connected with exponential populations, *Communications in Statistics (Theory & Methods)*, **8(1)**, 47–55.

A.M. Mathai (1982): On a conjecture in geometric probability regarding asymptotic normality of a random simplex, *Annals of Probability*, **10**, 247–251.

A.M. Mathai (1984): On multi-sample sphericity (in Russian), *Investigations in Statistics (Russian)*, Tom **136**, 153–161.

A.M. Mathai (1985): Exact and asymptotic procedures in the distribution of multivariate test statistics, In *Statistical Theory and Data Analysis*, K. Matusita Editor, North-Holland, pp. 381–418.

A.M. Mathai (1986): Hypothesis of multisample sphericity, *Journal of Mathematical Sciences*, **33(1)**, 792–796.

A.M. Mathai (1993): *A Handbook of Generalized Special Functions for Statistical and Physical Sciences*, Oxford University Press, Oxford.

A.M. Mathai and H.J. Haubold (2017): *Probability and Statistics: A Course for Physicists and Engineers*, De Gruyter, Germany.

A.M. Mathai and R.S. Katiyar (1979a): Exact percentage points for testing independence, *Biometrika*, **66**, 353–356.

A.M. Mathai and R. S. Katiyar (1979b): The distribution and the exact percentage points for Wilks' Lmvc criterion, *Annals of the Institute of Statistical Mathematics*, **31**, 215–224.

A.M. Mathai and R.S. Katiyar (1980): Exact percentage points for three tests associated with exponential populations, *Sankhya Series B*, **42**, 333–341.

A.M. Mathai and P.N. Rathie (1970): The exact distribution of Votaw's criterion, *Annals of the Institute of Statistical Mathematics*, **22**, 89–116.

A.M. Mathai and P.N. Rathie (1971): Exact distribution of Wilks' criterion, *The Annals of Mathematical Statistics*, **42**, 1010–1019.

A.M. Mathai and R.K. Saxena (1973): *Generalized Hypergeometric Functions with Applications in Statistics and Physical Sciences*, Springer-Verlag, Lecture Notes No. 348, Heidelberg and New York.

A.M. Mathai and R.K. Saxena (1978): *The H-function with Applications in Statistics and Other Disciplines*, Wiley Halsted, New York and Wiley Eastern, New Delhi.

A.M. Mathai, R.K. Saxena and H.J. Haubold (2010): *The H-function: Theory and Applications*, Springer, New York.

B.N. Nagarsenker and K.C.S. Pillai (1973): The distribution of the sphericity test criterion, *Journal of Multivariate Analysis*, **3**, 226–235.

N. Sugiura and H. Nagao (1968): Unbiasedness of some test criteria for the equality of one or two covariance matrices, *Annals of Mathematical Statistics*, **39**, 1686–1692.


#### **Chapter 7 Rectangular Matrix-Variate Distributions**

#### **7.1. Introduction**

Thus far, we have primarily been dealing with distributions involving real positive definite or Hermitian positive definite matrices. We have already considered rectangular matrices in the matrix-variate Gaussian case. In this chapter, we will examine rectangular matrix-variate gamma and beta distributions and also consider to some extent other types of distributions. We will begin with the rectangular matrix-variate real gamma distribution, a version of which was discussed in connection with the pathway model introduced in Mathai (2005). The notations will remain as previously specified. Lower-case letters such as *x, y, z* will denote real scalar variables, whether mathematical or random. Capital letters such as *X, Y* will be used for matrix-variate variables, whether square or rectangular. In the complex domain, a tilde will be placed above the corresponding scalar and matrix variables; for instance, we will write *x̃, ỹ, X̃, Ỹ*. Constant matrices will be denoted by upper-case letters such as *A, B, C*. A tilde will not be utilized for constant matrices except for stressing the point that the constant matrix is in the complex domain. When *X* is a *p* × *p* real positive definite matrix, *A* < *X* < *B* will imply that the constant matrices *A* and *B* are positive definite, that is, *A* > *O*, *B* > *O*, and further that *X* > *O*, *X* − *A* > *O*, *B* − *X* > *O*. Real positive definite matrices will be assumed to be symmetric. The corresponding notation for a *p* × *p* Hermitian positive definite matrix is *A* < *X̃* < *B*. The determinant of a square matrix *A* will be denoted by |*A*| or det(*A*) whereas, in the complex case, the absolute value or modulus of the determinant of *A* will be denoted as |det(*A*)|. When matrices are square, their order will be taken as *p* × *p* unless specified otherwise. Whenever *A* is a real *p* × *q*, *q* ≥ *p*, rectangular matrix of full rank *p*, *AA*′ is positive definite, a prime denoting the transpose.
When *A* is in the complex domain, *AA*<sup>∗</sup> is Hermitian positive definite, *A*<sup>∗</sup> denoting the complex conjugate transpose of *A*. Note that all positive definite complex matrices are necessarily Hermitian. As well, d*X* will denote the wedge product of all differentials in the matrix *X*. If *X* = (*x<sub>ij</sub>*) is a

real *p* × *q* matrix, then d*X* = ∧<sub>*i*=1</sub><sup>*p*</sup> ∧<sub>*j*=1</sub><sup>*q*</sup> d*x<sub>ij</sub>*. Whenever *X* = (*x<sub>ij</sub>*) is a *p* × *p* real symmetric matrix, d*X* = ∧<sub>*i*≥*j*</sub> d*x<sub>ij</sub>* = ∧<sub>*i*≤*j*</sub> d*x<sub>ij</sub>*, that is, the wedge product of the *p(p*+1*)/*2 distinct differentials. As for the complex matrix *X̃* = *X*<sub>1</sub> + *iX*<sub>2</sub>, *i* = √(−1), where *X*<sub>1</sub> and *X*<sub>2</sub> are real, d*X̃* = d*X*<sub>1</sub> ∧ d*X*<sub>2</sub>.

#### **7.2. Rectangular Matrix-Variate Gamma Density, Real Case**

The most commonly utilized real gamma-type distributions are the gamma, generalized gamma and Wishart in Statistics, and the Maxwell-Boltzmann and Rayleigh in Physics. The first author has previously introduced real and complex matrix-variate analogues of the gamma, Maxwell-Boltzmann, Rayleigh and Wishart densities where the matrices are *p* × *p* real positive definite or Hermitian positive definite. For the generalized gamma density in the real scalar case, a matrix-variate analogue can be written down, but the associated properties cannot be studied owing to the problem of making a transformation of the type *Y* = *X<sup>δ</sup>* for *δ* ≠ ±1; additionally, when *X* is real positive definite or Hermitian positive definite, the Jacobians will produce awkward forms that cannot be easily handled; see Mathai (1997) for an illustration wherein *δ* = 2 and the matrix *X* is real and symmetric. Thus, we will provide extensions of the gamma, Wishart, Maxwell-Boltzmann and Rayleigh densities to the rectangular matrix-variate cases for *δ* = 1, in both the real and complex domains.

The Maxwell-Boltzmann and Rayleigh densities are associated with numerous problems occurring in Physics. A multivariate analogue as well as a rectangular matrix-variate analogue of these densities may become useful in extending the usual theories giving rise to these univariate densities to multivariate and matrix-variate settings. It will be shown that, as was explained in Mathai (1999), this problem is also connected to the volumes of parallelotopes determined by *p* linearly independent random points in the Euclidean *n*-space, *n* ≥ *p*. Structural decompositions of the resulting random determinants and pathway extensions to the gamma, Wishart, Maxwell-Boltzmann and Rayleigh densities will also be considered.

In the current nuclear reaction-rate theory, the basic distribution being assumed for the relative velocity of reacting particles is the Maxwell-Boltzmann. One of the forms of this density for the real scalar positive variable case is

$$f\_1(\mathbf{x}) = \frac{4}{\sqrt{\pi}} \beta^{\frac{3}{2}} \mathbf{x}^2 \mathbf{e}^{-\beta \mathbf{x}^2}, \ 0 \le \mathbf{x} < \infty, \ \beta > 0,\tag{7.2.1}$$

and *f*<sub>1</sub>*(x)* = 0 elsewhere. The Rayleigh density is given by

$$f\_2(\mathbf{x}) = \frac{\mathbf{x}}{\alpha^2} \mathbf{e}^{-\frac{\mathbf{x}^2}{2\alpha^2}}, \ 0 \le \mathbf{x} < \infty, \ \alpha > 0,\tag{7.2.2}$$

and *f*<sup>2</sup> = 0 elsewhere, and the three-parameter generalized gamma density has the form

$$f_3(x) = \frac{\delta\, b^{\frac{\alpha}{\delta}}}{\Gamma(\frac{\alpha}{\delta})}\, x^{\alpha - 1}\, \mathrm{e}^{-b x^{\delta}},\ x \ge 0,\ b > 0,\ \alpha > 0,\ \delta > 0,\tag{7.2.3}$$

and *f*<sub>3</sub> = 0 elsewhere. Observe that (7.2.1) and (7.2.2) are special cases of (7.2.3). For derivations of a reaction-rate probability integral based on the Maxwell-Boltzmann velocity density, the reader is referred to Mathai and Haubold (1988). Various basic results associated with the Maxwell-Boltzmann distribution are provided in Barnes et al. (1982), Critchfield (1972), Fowler (1984), and Pais (1986), among others. The Maxwell-Boltzmann and Rayleigh densities have been extended to the real positive definite matrix-variate and the real rectangular matrix-variate cases in Mathai and Princy (2017). These results will be included in this section, along with extensions of the gamma and Wishart densities to the real and complex rectangular matrix-variate cases. Extensions of the gamma and Wishart densities to the real positive definite and complex Hermitian positive definite matrix-variate cases have already been discussed in Chap. 5. The Jacobians that are needed and will be frequently utilized in our discussion are already provided in Chaps. 1 and 4, further details being available from Mathai (1997). The previously defined real matrix-variate gamma *Γ<sub>p</sub>(α)* and complex matrix-variate gamma *Γ̃<sub>p</sub>(α)* functions will also be utilized in this chapter.
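The special-case relationships just mentioned can be verified numerically: (7.2.1) is (7.2.3) with *α* = 3, *δ* = 2, *b* = *β*, and (7.2.2) is (7.2.3) with *α* = 2, *δ* = 2, *b* = 1/(2*α*²). The following sketch (plain Python with arbitrary illustrative parameter values) checks these identities pointwise and confirms that (7.2.1) integrates to 1.

```python
import math

def f_gen_gamma(x, a, delta, b):
    # three-parameter generalized gamma density (7.2.3)
    return delta * b**(a / delta) / math.gamma(a / delta) * x**(a - 1) * math.exp(-b * x**delta)

def f_maxwell(x, beta):
    # Maxwell-Boltzmann density (7.2.1)
    return 4 / math.sqrt(math.pi) * beta**1.5 * x * x * math.exp(-beta * x * x)

def f_rayleigh(x, alpha):
    # Rayleigh density (7.2.2)
    return x / alpha**2 * math.exp(-x * x / (2 * alpha**2))

# pointwise special-case checks (parameter values 1.7 and 0.8 are arbitrary)
for x in (0.3, 1.0, 2.5):
    assert abs(f_maxwell(x, 1.7) - f_gen_gamma(x, 3, 2, 1.7)) < 1e-12
    assert abs(f_rayleigh(x, 0.8) - f_gen_gamma(x, 2, 2, 1 / (2 * 0.8**2))) < 1e-12

# (7.2.1) integrates to 1 (midpoint rule on [0, 20])
n = 200000
h = 20.0 / n
total = sum(f_maxwell((k + 0.5) * h, 1.7) for k in range(n)) * h
assert abs(total - 1.0) < 1e-6
```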

#### **7.2.1. Extension of the gamma density to the real rectangular matrix-variate case**

Consider a *p* × *q*, *q* ≥ *p*, real matrix *X* of full rank *p*, whose rows are thus linearly independent, and a real-valued scalar function *f(XX*′*)* whose integral over *X* is convergent, that is, ∫<sub>*X*</sub> *f(XX*′*)* d*X* < ∞. Letting *S* = *XX*′, *S* will be symmetric as well as real positive definite, meaning that *Y*′*SY* > 0 for every *p* × 1 non-null vector *Y* ≠ *O*. Then, *S* = (*s<sub>ij</sub>*) will involve only *p(p*+1*)/*2 differential elements, that is, d*S* = ∧<sub>*i*≥*j*</sub> d*s<sub>ij</sub>*, whereas d*X* will contain *pq* differential elements d*x<sub>ij</sub>*'s. As has previously been explained in Chap. 4, the connection between d*X* and d*S* can be established via a sequence of two or three matrix transformations.

Let *X* = (*x<sub>ij</sub>*) be a *p* × *q*, *q* ≥ *p*, real matrix of rank *p* where the *x<sub>ij</sub>*'s are distinct real scalar variables. Let *A* be a *p* × *p* real positive definite constant matrix and *B* be a *q* × *q* real positive definite constant matrix, *A*<sup>1/2</sup> and *B*<sup>1/2</sup> denoting the respective positive definite square roots of *A* and *B*. We will now determine the value of *c* that satisfies the following integral equation:

$$\frac{1}{c} = \int_{X} |AXBX'|^{\gamma}\, \mathrm{e}^{-\mathrm{tr}(AXBX')}\, \mathrm{d}X. \tag{i}$$

Note that tr*(AXBX*′*)* = tr*(A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>*)*. Letting *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup>, it follows from Theorem 1.7.4 that d*Y* = |*A*|<sup>*q*/2</sup>|*B*|<sup>*p*/2</sup> d*X*. Thus,

$$\frac{1}{c} = |A|^{-\frac{q}{2}} |B|^{-\frac{p}{2}} \int_Y |YY'|^{\gamma}\, \mathrm{e}^{-\mathrm{tr}(YY')}\, \mathrm{d}Y. \tag{ii}$$

Letting *S* = *YY*′, we note that *S* is a *p* × *p* real positive definite matrix, and on applying Theorem 4.2.3, we have d*Y* = [*π*<sup>*qp*/2</sup>/*Γ<sub>p</sub>(q/*2*)*] |*S*|<sup>*q*/2−(*p*+1)/2</sup> d*S* where *Γ<sub>p</sub>(*·*)* is the real matrix-variate gamma function. Thus,

$$\frac{1}{c} = |A|^{-\frac{q}{2}} |B|^{-\frac{p}{2}} \frac{\pi^{\frac{qp}{2}}}{\Gamma_p(\frac{q}{2})} \int_{S > O} |S|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}\, \mathrm{e}^{-\mathrm{tr}(S)}\, \mathrm{d}S,\ A > O,\ B > O,\tag{iii}$$

the integral being a real matrix-variate gamma integral equal to *Γ<sub>p</sub>(γ* + *q/*2*)* for ℜ*(γ* + *q/*2*)* > *(p*−1*)/*2, where ℜ*(*·*)* denotes the real part of *(*·*)*, so that

$$c = \frac{|A|^{\frac{q}{2}} |B|^{\frac{p}{2}}\, \Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\, \Gamma_p(\gamma + \frac{q}{2})} \text{ for } \Re(\gamma + \frac{q}{2}) > \frac{p-1}{2},\ A > O,\ B > O. \tag{7.2.4}$$

Let

$$f_4(X) = c\, |AXBX'|^{\gamma}\, \mathrm{e}^{-\mathrm{tr}(AXBX')} \tag{7.2.5}$$

for *A* > *O*, *B* > *O*, ℜ*(γ* + *q/*2*)* > *(p*−1*)/*2, *X* = (*x<sub>ij</sub>*), −∞ < *x<sub>ij</sub>* < ∞, *i* = 1*,...,p*, *j* = 1*,...,q*, where *c* is as specified in (7.2.4). Then, *f*<sub>4</sub>*(X)* is a statistical density that will be referred to as the rectangular real matrix-variate gamma density with shape parameter *γ* and scale parameter matrices *A* > *O* and *B* > *O*. Although the parameters are usually real in a statistical density, the above conditions apply to the general complex case.
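For *p* = *q* = 1, the constant (7.2.4) reduces to *c* = √(*AB*)/*Γ(γ* + 1*/*2*)*, and the defining integral equation *(i)* can be checked by one-dimensional quadrature. A minimal sketch, with arbitrary illustrative values of the scalar parameters:

```python
import math

def c_scalar(a, b, g):
    # (7.2.4) with p = q = 1: Gamma_1 = Gamma and Gamma(1/2) = sqrt(pi),
    # so c = sqrt(a*b)/Gamma(g + 1/2)
    return math.sqrt(a * b) / math.gamma(g + 0.5)

def integrand(x, a, b, g):
    # |AXBX'|^gamma e^{-tr(AXBX')} with scalar A = a, B = b, X = x
    return (a * b * x * x)**g * math.exp(-a * b * x * x)

a, b, g = 2.0, 0.5, 1.5          # arbitrary illustrative values
n, lo, hi = 400000, -12.0, 12.0  # midpoint rule over an effectively full range
h = (hi - lo) / n
total = sum(integrand(lo + (k + 0.5) * h, a, b, g) for k in range(n)) * h
assert abs(total * c_scalar(a, b, g) - 1.0) < 1e-6  # the integral equals 1/c
```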

For *p* = 1, *q* = 1, *γ* = 1, *A* = 1 and *B* = *β* > 0, we have |*AXBX*′| = *βx*<sup>2</sup> and

$$\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\Gamma_p(\gamma+\frac{q}{2})} = \frac{\beta^{\frac{1}{2}}\Gamma(\frac{1}{2})}{\pi^{\frac{1}{2}}\Gamma(\frac{3}{2})} = \frac{2\sqrt{\beta}}{\sqrt{\pi}},$$

so that *f*<sub>4</sub>*(x)* = (2/√*π*) *β*<sup>3/2</sup> *x*<sup>2</sup> e<sup>−*βx*²</sup> for −∞ < *x* < ∞. Note that when the support of *f(x)* is restricted to the interval 0 ≤ *x* < ∞, the normalizing constant is multiplied by 2, *f(x)* being a symmetric function. Then, for this particular case, *f*<sub>4</sub>*(X)* in (7.2.5) agrees with the Maxwell-Boltzmann density for the real scalar positive variable *x* whose density is given in (7.2.1). Accordingly, when *γ* = 1, (7.2.5) with *c* as specified in (7.2.4) will be referred to as the real rectangular matrix-variate Maxwell-Boltzmann density. Observe that for *γ* = 0, (7.2.5) is the real rectangular matrix-variate Gaussian density that was considered in Chap. 4. In the Rayleigh case, letting *p* = 1, *q* = 1, *A* = 1, *B* = 1/(2*α*<sup>2</sup>) and *γ* = 1/2,

$$|AXBX'|^{\gamma} = \left(\frac{x^2}{2\alpha^2}\right)^{\frac{1}{2}} = \frac{|x|}{\sqrt{2}\,|\alpha|} \ \text{ and } \ c = \frac{1}{\sqrt{2}\,\alpha},$$

which gives

$$f_5(x) = \frac{|x|}{2\alpha^2}\, \mathrm{e}^{-\frac{x^2}{2\alpha^2}},\ -\infty < x < \infty, \ \text{ or } \ f_5(x) = \frac{x}{\alpha^2}\, \mathrm{e}^{-\frac{x^2}{2\alpha^2}},\ 0 \le x < \infty,$$

for *α* > 0, and *f*<sub>5</sub> = 0 elsewhere, where |*x*| denotes the absolute value of *x*; the latter is the real positive scalar variable case of the Rayleigh density given in (7.2.2). Accordingly, (7.2.5) with *c* as specified in (7.2.4) wherein *γ* = 1/2 will be called the real rectangular matrix-variate Rayleigh density.

From (7.2.5), which is the density of *X* = (*x<sub>ij</sub>*), *p* × *q*, *q* ≥ *p*, of rank *p*, with −∞ < *x<sub>ij</sub>* < ∞, *i* = 1*,...,p*, *j* = 1*,...,q*, we obtain the following density for *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup>:

$$f_6(Y)\,\mathrm{d}Y = \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\, \Gamma_p(\gamma + \frac{q}{2})}\, |YY'|^{\gamma}\, \mathrm{e}^{-\mathrm{tr}(YY')}\, \mathrm{d}Y \tag{7.2.6}$$

for *γ* + *q/*2 > *(p*−1*)/*2, and *f*<sub>6</sub> = 0 elsewhere. We will refer to (7.2.6) as the standard form of the real rectangular matrix-variate gamma density. The density of *S* = *YY*′ is then

$$f_7(S)\,\mathrm{d}S = \frac{1}{\Gamma_p(\gamma + \frac{q}{2})}\, |S|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}\, \mathrm{e}^{-\mathrm{tr}(S)}\, \mathrm{d}S \tag{7.2.7}$$

for *S* > *O*, *γ* + *q/*2 > *(p*−1*)/*2, and *f*<sub>7</sub> = 0 elsewhere.

**Example 7.2.1.** Specify the distribution of *u* = tr*(A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>*)*, which appears in the exponent of the density given in (7.2.5).

**Solution 7.2.1.** Let us determine the moment generating function (mgf) of *u* with parameter *t*. That is,

$$\begin{aligned} M_u(t) &= E[\mathrm{e}^{tu}] = E[\mathrm{e}^{t\,\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})}] \\ &= c \int_X |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\gamma}\, \mathrm{e}^{-(1-t)\,\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})}\, \mathrm{d}X \end{aligned}$$

where *c* is given in (7.2.4). Let us make the following transformations: *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup>, *S* = *YY*′. Then, all factors except *Γ<sub>p</sub>(γ* + *q/*2*)* cancel out and the mgf becomes

$$M_u(t) = \frac{1}{\Gamma_p(\gamma + \frac{q}{2})} \int_{S > O} |S|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}\, \mathrm{e}^{-(1-t)\,\mathrm{tr}(S)}\, \mathrm{d}S$$

for 1 − *t >* 0. On making the transformation *(*1 − *t)S* = *S*<sup>1</sup> and then integrating out *S*1, we obtain the following representation of the moment generating function:

$$M_u(t) = (1 - t)^{-p(\gamma + \frac{q}{2})},\ 1 - t > 0,$$

which happens to be the mgf of a real scalar gamma random variable with the parameters *(α* = *p(γ* + *q/*2*), β* = 1*)*; owing to the uniqueness of the mgf, this is the distribution of *u*.
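This conclusion can be illustrated by simulation in the special case *γ* = 0 with *A* = *I*, *B* = *I*: the standard density (7.2.6) then makes the entries of *Y* independent N(0, 1/2), and *u* = tr*(YY*′*)* should be gamma distributed with shape *pq/*2 and unit scale. A Monte Carlo sketch (the seed, dimensions and sample size are arbitrary choices):

```python
import math, random

# For gamma = 0, A = I, B = I, the entries of Y are iid N(0, 1/2), so
# u = tr(YY') = sum of the pq squared entries should be gamma(pq/2, scale 1),
# i.e. mean pq/2 and variance pq/2, matching the mgf (1 - t)^{-p(q/2)}.
random.seed(7)
p, q, n = 2, 3, 200000
sd = math.sqrt(0.5)
us = [sum(random.gauss(0.0, sd)**2 for _ in range(p * q)) for _ in range(n)]
mean = sum(us) / n
var = sum((u - mean)**2 for u in us) / n
assert abs(mean - p * q / 2) < 0.05   # gamma mean = shape * scale = pq/2
assert abs(var - p * q / 2) < 0.10    # gamma variance = shape * scale^2 = pq/2
```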

**Example 7.2.2.** Let *U*<sub>1</sub> = *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>, *U*<sub>2</sub> = *XBX*′, *U*<sub>3</sub> = *B*<sup>1/2</sup>*X*′*AXB*<sup>1/2</sup>, *U*<sub>4</sub> = *X*′*AX*. Determine the corresponding densities when they exist.

**Solution 7.2.2.** Let us examine the exponent in the density (7.2.5). By making use of the commutative property of trace, one can write

$$\text{tr}(A^{\frac{1}{2}}XBX^{\prime}A^{\frac{1}{2}}) = \text{tr}[A(XBX^{\prime})] = \text{tr}(B^{\frac{1}{2}}X^{\prime}AXB^{\frac{1}{2}}) = \text{tr}[B(X^{\prime}AX)].$$

Observe that the exponent depends on the matrix *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>, which is symmetric and positive definite, and that the functional part of the density also involves its determinant. Thus, the structure is that of a real matrix-variate gamma density; however, (7.2.5) gives the density of *X*. Hence, one has to reach *U*<sub>1</sub> from *X* and derive the density of *U*<sub>1</sub>. Consider the transformation *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup>, which takes *X* to *Y*. Now, let *S* = *YY*′ = *U*<sub>1</sub>, so that the matrix *U*<sub>1</sub> has the real matrix-variate gamma distribution specified in (7.2.7), that is, *U*<sub>1</sub> is a real matrix-variate gamma variable with shape parameter *γ* + *q/*2 and scale parameter matrix *I*. Next, consider *U*<sub>2</sub>. Let us obtain the density of *U*<sub>2</sub> from the density (7.2.5) of *X*. Proceeding as above while only making the transformation involving *B*, the counterpart of (7.2.7) becomes the following density, denoted by *f<sub>u₂</sub>(U*<sub>2</sub>*)*:

$$f_{u_2}(U_2)\,\mathrm{d}U_2 = \frac{|A|^{\gamma + \frac{q}{2}}}{\Gamma_p(\gamma + \frac{q}{2})}\, |U_2|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}\, \mathrm{e}^{-\mathrm{tr}(AU_2)}\, \mathrm{d}U_2,$$

which shows that *U*<sub>2</sub> is a real matrix-variate gamma variable with shape parameter *γ* + *q/*2 and scale parameter matrix *A*. With respect to *U*<sub>3</sub> and *U*<sub>4</sub>, when *q* > *p*, one has the *q* × *q* positive semi-definite factor *X*′*AX* whose determinant is zero; hence, in this singular case, the densities do not exist for *U*<sub>3</sub> and *U*<sub>4</sub>. However, when *q* = *p*, *U*<sub>3</sub> has a real matrix-variate gamma distribution with shape parameter *γ* + *p/*2 and scale parameter matrix *I* and *U*<sub>4</sub> has a real matrix-variate gamma distribution with shape parameter *γ* + *p/*2 and scale parameter matrix *B*, observing that when *q* = *p* both *U*<sub>3</sub> and *U*<sub>4</sub> are *q* × *q* and positive definite. This completes the solution.

The above findings are stated as a theorem:

**Theorem 7.2.1.** *Let X* = (*x<sub>ij</sub>*) *be a real full rank p* × *q matrix, q* ≥ *p, having the density specified in (7.2.5). Let U*<sub>1</sub>*, U*<sub>2</sub>*, U*<sub>3</sub> *and U*<sub>4</sub> *be as defined in Example 7.2.2. Then, U*<sub>1</sub> *is a real matrix-variate gamma variable with scale parameter matrix I and shape parameter γ* + *q/*2*; U*<sub>2</sub> *is a real matrix-variate gamma variable with shape parameter γ* + *q/*2 *and scale parameter matrix A; U*<sub>3</sub> *and U*<sub>4</sub> *are singular and do not have densities when q* > *p; however, when q* = *p, U*<sub>3</sub> *is real matrix-variate gamma distributed with shape parameter γ* + *p/*2 *and scale parameter matrix I, and U*<sub>4</sub> *is real matrix-variate gamma distributed with shape parameter γ* + *p/*2 *and scale parameter matrix B. Further,* |*I<sub>p</sub>* − *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>| = |*I<sub>q</sub>* − *B*<sup>1/2</sup>*X*′*AXB*<sup>1/2</sup>|*.*

*Proof***:** All the results, except the last one, were obtained in Solution 7.2.2. Hence, we shall only consider the last part of the theorem. Observe that when *q* > *p*, |*A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>| > 0, the matrix being positive definite, whereas |*B*<sup>1/2</sup>*X*′*AXB*<sup>1/2</sup>| = 0, the matrix being positive semi-definite. The equality is established by noting that, in accordance with results previously stated in Sect. 1.3, the determinant of the following partitioned matrix has two representations:

$$\begin{vmatrix} I_p & A^{\frac{1}{2}} X B^{\frac{1}{2}} \\ B^{\frac{1}{2}} X' A^{\frac{1}{2}} & I_q \end{vmatrix} = \begin{cases} |I_p|\, |I_q - (B^{\frac{1}{2}} X' A^{\frac{1}{2}}) I_p^{-1} (A^{\frac{1}{2}} X B^{\frac{1}{2}})| = |I_q - B^{\frac{1}{2}} X' A X B^{\frac{1}{2}}| \\ |I_q|\, |I_p - (A^{\frac{1}{2}} X B^{\frac{1}{2}}) I_q^{-1} (B^{\frac{1}{2}} X' A^{\frac{1}{2}})| = |I_p - A^{\frac{1}{2}} X B X' A^{\frac{1}{2}}| \end{cases}$$
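The final determinant identity of the theorem, a Sylvester-type identity, can also be spot-checked numerically. The sketch below uses arbitrary small dimensions and diagonal positive definite *A* and *B* (so their square roots are immediate), together with a small pure-Python determinant routine:

```python
import math, random

def det(M):
    # determinant by Laplace expansion along the first row (fine for tiny matrices)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1)**j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

random.seed(1)
p, q = 2, 3
X = [[0.1 * random.random() for _ in range(q)] for _ in range(p)]
a = [1.0 + random.random() for _ in range(p)]   # A = diag(a) > O
b = [1.0 + random.random() for _ in range(q)]   # B = diag(b) > O
# A^{1/2} X B X' A^{1/2} and B^{1/2} X' A X B^{1/2} for diagonal A, B
M1 = [[math.sqrt(a[i] * a[j]) * sum(b[k] * X[i][k] * X[j][k] for k in range(q))
       for j in range(p)] for i in range(p)]
M2 = [[math.sqrt(b[i] * b[j]) * sum(a[k] * X[k][i] * X[k][j] for k in range(p))
       for j in range(q)] for i in range(q)]
lhs = det([[(1.0 if i == j else 0.0) - M1[i][j] for j in range(p)] for i in range(p)])
rhs = det([[(1.0 if i == j else 0.0) - M2[i][j] for j in range(q)] for i in range(q)])
assert abs(lhs - rhs) < 1e-12   # |I_p - A^(1/2)XBX'A^(1/2)| = |I_q - B^(1/2)X'AXB^(1/2)|
```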

#### **7.2.2. Multivariate gamma and Maxwell-Boltzmann densities, real case**

Multivariate usually refers to a collection of scalar variables, real or complex. Real scalar variable cases corresponding to (7.2.1), as well as multivariate analogues thereof, can be obtained from (7.2.5) by taking *p* = 1 and *A* = *b* > 0. Note that in this case, *X* is 1 × *q*, that is, *X* = (*x*<sub>1</sub>*,...,x<sub>q</sub>*), and *XBX*′ is a positive definite quadratic form of the type

$$XBX' = (\mathbf{x}\_1, \dots, \mathbf{x}\_q)B\begin{pmatrix} x\_1 \\ \vdots \\ x\_q \end{pmatrix}.$$

Thus, the density appearing in (7.2.5) becomes

$$f_8(X)\,\mathrm{d}X = \frac{b^{\gamma + \frac{q}{2}}\, |B|^{\frac{1}{2}}\, \Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\, \Gamma(\gamma + \frac{q}{2})}\, [XBX']^{\gamma}\, \mathrm{e}^{-b(XBX')}\, \mathrm{d}X \tag{7.2.8}$$

for *X* = (*x*<sub>1</sub>*,...,x<sub>q</sub>*), −∞ < *x<sub>j</sub>* < ∞, *j* = 1*,...,q*, *B* = *B*′ > *O*, *b* > 0, and *f*<sub>8</sub> = 0 elsewhere. Then, the density of *Y* = *B*<sup>1/2</sup>*X*′ is given by

$$f_9(Y)\,\mathrm{d}Y = b^{\gamma + \frac{q}{2}}\, \frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\, \Gamma(\gamma + \frac{q}{2})}\, (y_1^2 + \cdots + y_q^2)^{\gamma}\, \mathrm{e}^{-b(y_1^2 + \cdots + y_q^2)}\, \mathrm{d}Y \tag{7.2.9}$$

where *Y*′ = (*y*<sub>1</sub>*,...,y<sub>q</sub>*), −∞ < *y<sub>j</sub>* < ∞, *j* = 1*,...,q*, *b* > 0, *γ* + *q/*2 > 0, and *f*<sub>9</sub> = 0 elsewhere. We will take (7.2.8) as the multivariate gamma as well as multivariate Maxwell-Boltzmann density, and (7.2.9) as the standard multivariate gamma as well as standard multivariate Maxwell-Boltzmann density.
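One practical way to simulate from (7.2.9): since the density depends on *Y* only through *r*² = *y*₁² + ··· + *y<sub>q</sub>*², the polar-coordinate integration carried out below shows that *r*² is a real scalar gamma variable with shape *γ* + *q/*2 and scale 1*/b*, while the direction *Y/r* is uniform on the unit sphere (obtained by normalizing a Gaussian vector). A Monte Carlo sketch with arbitrary parameter values:

```python
import math, random

def sample_standard_gamma_vector(g, q, b, rng):
    # radial part: r^2 ~ gamma(shape g + q/2, scale 1/b), as the change of
    # variable u = r^2 in the radial density r^{2g+q-1} e^{-b r^2} shows
    r2 = rng.gammavariate(g + q / 2, 1.0 / b)
    # direction: normalize a standard Gaussian vector (uniform on the sphere)
    z = [rng.gauss(0.0, 1.0) for _ in range(q)]
    norm = math.sqrt(sum(v * v for v in z))
    r = math.sqrt(r2)
    return [r * v / norm for v in z]

rng = random.Random(11)
g, q, b, n = 1.0, 3, 2.0, 100000   # arbitrary illustrative values
mean_r2 = sum(sum(v * v for v in sample_standard_gamma_vector(g, q, b, rng))
              for _ in range(n)) / n
assert abs(mean_r2 - (g + q / 2) / b) < 0.05   # E[r^2] = (g + q/2)/b
```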

How can we show that (7.2.9) is a statistical density? One way consists of writing *f*<sub>9</sub>*(Y)*d*Y* as *f*<sub>9</sub>*(S)*d*S*, applying Theorem 4.2.3 of Chap. 4 and writing d*Y* in terms of d*S* for *p* = 1; this will yield the result. Another way is to integrate out the variables *y*<sub>1</sub>*,...,y<sub>q</sub>* from *f*<sub>9</sub>*(Y)*d*Y*, which can be achieved via a general polar coordinate transformation such as the following: consider the variables *y*<sub>1</sub>*,...,y<sub>q</sub>*, −∞ < *y<sub>j</sub>* < ∞, *j* = 1*,...,q*, and the transformation

$$\begin{aligned} \mathbf{y}\_1 &= r \sin \theta\_1 \\ \mathbf{y}\_j &= r \cos \theta\_1 \cos \theta\_2 \cdots \cos \theta\_{j-1} \sin \theta\_j, \quad j = 2, 3, \dots, q - 1, \\ \mathbf{y}\_q &= r \cos \theta\_1 \cos \theta\_2 \cdots \cos \theta\_{q-1}, \end{aligned}$$

for −*π/*2 < *θ<sub>j</sub>* ≤ *π/*2, *j* = 1*,...,q* − 2; −*π* < *θ<sub>q−1</sub>* ≤ *π*, which was discussed in Mathai (1997). Its Jacobian is then given by

$$\mathrm{d}y_1 \wedge \cdots \wedge \mathrm{d}y_q = r^{q-1} \left\{ \prod_{j=1}^{q-1} |\cos\theta_j|^{q-j-1} \right\} \mathrm{d}r \wedge \mathrm{d}\theta_1 \wedge \cdots \wedge \mathrm{d}\theta_{q-1}.\tag{7.2.10}$$

Under this transformation, *y*<sub>1</sub><sup>2</sup> + ··· + *y<sub>q</sub>*<sup>2</sup> = *r*<sup>2</sup>. Hence, integrating over *r*, we have

$$\int\_{r=0}^{\infty} (r^2)^{\gamma} r^{q-1} \mathbf{e}^{-br^2} \mathbf{d}r = \frac{1}{2} b^{-\left(\gamma + \frac{q}{2}\right)} \Gamma(\gamma + \frac{q}{2}), \ \gamma + \frac{q}{2} > 0. \tag{7.2.11}$$

Note that the *θ<sub>j</sub>*'s are present only in the Jacobian, and standard formulae are available for the integral over each of them. We will integrate the *θ<sub>j</sub>*'s one by one. Integrating over *θ*<sub>1</sub> gives


$$\begin{aligned} \int\_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (\cos \theta\_1)^{q-2} \mathrm{d}\theta\_1 &= 2 \int\_0^{\frac{\pi}{2}} (\cos \theta\_1)^{q-2} \mathrm{d}\theta\_1 = 2 \int\_0^1 z^{q-2} (1 - z^2)^{-\frac{1}{2}} \mathrm{d}z \\ &= \frac{\Gamma(\frac{1}{2}) \Gamma(\frac{q-1}{2})}{\Gamma(\frac{q}{2})}, \; q > 1. \end{aligned}$$

The integrals over *θ*2*, θ*3*,...,θq*−<sup>2</sup> can be similarly evaluated as

$$\frac{\Gamma(\frac{1}{2})\Gamma(\frac{q-2}{2})}{\Gamma(\frac{q-1}{2})}, \frac{\Gamma(\frac{1}{2})\Gamma(\frac{q-3}{2})}{\Gamma(\frac{q-2}{2})}, \dots, \frac{\Gamma(\frac{1}{2})}{\Gamma(\frac{3}{2})}$$

the last integral, ∫<sub>−*π*</sub><sup>*π*</sup> d*θ<sub>q−1</sub>*, being equal to 2*π*. On taking the product, several gamma functions cancel out, leaving

$$\frac{2\pi [\varGamma(\frac{1}{2})]^{q-2}}{\varGamma(\frac{q}{2})} = \frac{2\pi^{\frac{q}{2}}}{\varGamma(\frac{q}{2})}.\tag{7.2.12}$$

It follows from (7.2.11) and (7.2.12) that (7.2.9) is indeed a density which will be referred to as the standard real multivariate gamma or standard real Maxwell-Boltzmann density.
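The radial integral (7.2.11) and the angular constant (7.2.12) can both be verified numerically. The sketch below checks (7.2.11) by midpoint quadrature for a few arbitrary parameter triples, and confirms that (7.2.12) with *q* = 3 reproduces the familiar surface area 4*π* of the unit sphere in ℝ³:

```python
import math

def radial_integral(g, q, b, hi=12.0, n=300000):
    # midpoint-rule approximation of int_0^inf (r^2)^g r^{q-1} e^{-b r^2} dr;
    # hi = 12 is far enough into the Gaussian tail for these parameter values
    h = hi / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += r**(2 * g + q - 1) * math.exp(-b * r * r)
    return total * h

# (7.2.11): the integral equals (1/2) b^{-(g + q/2)} Gamma(g + q/2)
for g, q, b in ((1.0, 3, 2.0), (0.5, 4, 1.0), (2.0, 2, 0.7)):
    exact = 0.5 * b**(-(g + q / 2)) * math.gamma(g + q / 2)
    assert abs(radial_integral(g, q, b) - exact) < 1e-6

# (7.2.12) with q = 3: 2 pi^{3/2} / Gamma(3/2) = 4 pi
assert abs(2 * math.pi**1.5 / math.gamma(1.5) - 4 * math.pi) < 1e-12
```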

**Example 7.2.3.** Write down the densities specified in (7.2.8) and (7.2.9) explicitly if

$$B = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix},\ b = 2 \ \text{ and } \ \gamma = 2.$$

**Solution 7.2.3.** Let us evaluate the normalizing constant in (7.2.8). Since in this case, |*B*| = 2,

$$c\_8 = \frac{b^{\gamma + \frac{q}{2}} |B|^{\frac{1}{2}} \Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}} \Gamma(\gamma + \frac{q}{2})} = \frac{2^{2 + \frac{3}{2}} 2^{\frac{1}{2}} \Gamma(\frac{3}{2})}{\pi^{\frac{3}{2}} \Gamma(2 + \frac{3}{2})} = \frac{2^6}{15\pi^{\frac{3}{2}}}.\tag{i}$$

The normalizing constant in (7.2.9), which will be denoted by *c*<sub>9</sub>, is the same as *c*<sub>8</sub> except for the factor |*B*|<sup>1/2</sup> = 2<sup>1/2</sup>. Thus,

$$c_9 = \frac{2^{\frac{11}{2}}}{15\,\pi^{\frac{3}{2}}}.\tag{ii}$$

Note that for *X* = [*x*<sub>1</sub>*, x*<sub>2</sub>*, x*<sub>3</sub>], *XBX*′ = 3*x*<sub>1</sub><sup>2</sup> + 2*x*<sub>2</sub><sup>2</sup> + *x*<sub>3</sub><sup>2</sup> − 2*x*<sub>1</sub>*x*<sub>2</sub> + 2*x*<sub>2</sub>*x*<sub>3</sub> and *YY*′ = *y*<sub>1</sub><sup>2</sup> + *y*<sub>2</sub><sup>2</sup> + *y*<sub>3</sub><sup>2</sup>. Hence, the densities *f*<sub>8</sub>*(X)* and *f*<sub>9</sub>*(Y)* are the following, where *c*<sub>8</sub> and *c*<sub>9</sub> are given in *(i)* and *(ii)*:

$$f\_8(X) = c\_8 \left[ 3x\_1^2 + 2x\_2^2 + x\_3^2 - 2x\_1x\_2 + 2x\_2x\_3 \right]^2 \mathbf{e}^{-2[3x\_1^2 + 2x\_2^2 + x\_3^2 - 2x\_1x\_2 + 2x\_2x\_3]}$$

for −∞ *< xj <* ∞*, j* = 1*,* 2*,* 3, and

$$f_9(Y) = c_9\left[y_1^2 + y_2^2 + y_3^2\right]^2 \mathbf{e}^{-2[y_1^2 + y_2^2 + y_3^2]}, \text{ for } -\infty < y_j < \infty, \ j = 1, 2, 3.$$

This completes the computations.
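As a quick numerical check of *(i)* and *(ii)*, one may recompute $c_8$ and $c_9$ from the formula for the normalizing constant using `math.gamma`; the values must agree with $2^6/(15\pi^{3/2})$ and $2^{11/2}/(15\pi^{3/2})$. A minimal sketch, using the parameter values of the example:

```python
from math import gamma, pi

b, gam, q = 2, 2, 3              # parameters of Example 7.2.3
detB = 2.0                       # |B| computed in the solution

c8 = b ** (gam + q / 2) * detB ** 0.5 * gamma(q / 2) / (pi ** (q / 2) * gamma(gam + q / 2))
c9 = c8 / detB ** 0.5            # c9 excludes the factor |B|^(1/2)

assert abs(c8 - 2 ** 6 / (15 * pi ** 1.5)) < 1e-12
assert abs(c9 - 2 ** 5.5 / (15 * pi ** 1.5)) < 1e-12
print("c8 and c9 confirmed")
```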

#### **7.2.3. Some properties of the rectangular matrix-variate gamma density**

For the real rectangular matrix-variate gamma and Maxwell-Boltzmann distribution whose density is specified in (7.2.5), what might be the *h*-th moment of the determinant $|AXBX'|$ for an arbitrary *h*? This quantity can be evaluated by inspecting the normalizing constant *c* given in (7.2.4), since the integrand used to evaluate $E[|AXBX'|^h]$, where *E* denotes the expected value, is nothing but the density of *X* wherein *γ* is replaced by *γ* + *h*. Hence we have

$$E[|AXBX'|^h] = \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}, \ \Re(h) > -\gamma-\frac{q}{2}+\frac{p-1}{2}.\tag{7.2.13}$$

In many calculations involving the Maxwell-Boltzmann density for the real scalar variable case *x*, one has to integrate a function of *x*, say *ν(x)*, over the Maxwell-Boltzmann density, as can be seen for example in equations (4.1) and (4.2) of Mathai and Haubold (1988) in connection with a certain reaction-rate probability integral. Thus, the expression appearing in (7.2.13) corresponds to the integral of a power function over the Maxwell-Boltzmann density.

This arbitrary *h*-th moment expression also reveals an interesting point. By expanding the matrix-variate gamma functions, we have the following:

$$\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})} = \prod_{j=1}^p \frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})} = \prod_{j=1}^p E(t_j^h)$$

where $t_j$ is a real scalar gamma random variable with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\,1)$, $j = 1,\dots,p$, whose density is

$$g_{(j)}(t_j) = \frac{1}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\, t_j^{\gamma+\frac{q}{2}-\frac{j-1}{2}-1}\,\mathbf{e}^{-t_j}, \ t_j \ge 0, \ \gamma+\frac{q}{2}-\frac{j-1}{2} > 0,\tag{7.2.14}$$

and zero elsewhere. Thus structurally,

$$|AXBX'| = t\_1t\_2 \cdots t\_p \tag{7.2.15}$$

where *t*1*,...,tp* are independently distributed real scalar gamma random variables with *tj* having the gamma density given in (7.2.14) for *j* = 1*,...,p*.
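The structural representation (7.2.15) lends itself to a simple Monte Carlo check: simulate the independent gamma variables $t_j$ and compare the empirical *h*-th moment of their product with the closed form (7.2.13). The following sketch uses the illustrative values $p = 2$, $q = 4$, $\gamma = 1$, $h = \frac{1}{2}$ (these parameter choices are ours, not from the text):

```python
import random
from math import gamma

random.seed(7)
p, q, gam, h = 2, 4, 1.0, 0.5          # illustrative values
shapes = [gam + q / 2 - (j - 1) / 2 for j in range(1, p + 1)]

# closed form (7.2.13): product of Gamma(k_j + h)/Gamma(k_j)
exact = 1.0
for k in shapes:
    exact *= gamma(k + h) / gamma(k)

# Monte Carlo for |AXBX'| = t_1 * t_2 with independent gamma t_j (p = 2 here)
n = 200_000
mc = sum(
    (random.gammavariate(shapes[0], 1.0) * random.gammavariate(shapes[1], 1.0)) ** h
    for _ in range(n)
) / n
assert abs(mc - exact) / exact < 0.02
print("moment formula matches simulation")
```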

#### **7.2.4. Connection to the volume of a random parallelotope**

First, observe that $|AXBX'| = |(A^{\frac{1}{2}}XB^{\frac{1}{2}})(A^{\frac{1}{2}}XB^{\frac{1}{2}})'| \equiv |UU'|$ where $U = A^{\frac{1}{2}}XB^{\frac{1}{2}}$. Then, note that *U* is $p\times q$, $q\ge p$, and of full rank *p*, and that the *p* linearly independent rows of *U*, taken in order, create a convex hull and a parallelotope in the *q*-dimensional Euclidean space. The *p* rows of *U* represent *p* linearly independent vectors in the Euclidean *q*-space as well as *p* points in that space. In light of (7.2.14), these random points are gamma distributed, that is, the joint density of the *p* vectors or the *p* random points is the real rectangular matrix-variate density given in (7.2.5), and the volume content of the parallelotope created by these *p* random points is $|AXBX'|^{\frac{1}{2}}$. Accordingly, (7.2.13) represents the $(2h)$-th moment of the random volume of the *p*-parallelotope generated by the *p* linearly independent rows of $A^{\frac{1}{2}}XB^{\frac{1}{2}}$. The geometrical probability problems considered in the literature usually pertain to random volumes generated by independently distributed isotropic random points, isotropic meaning that their associated density is invariant with respect to orthonormal transformations or rotations of the coordinate axes. For instance, the density given in (7.2.9) is of isotropic form. The distributions of random geometrical configurations are further discussed in Chap. 4 of Mathai (1999).

#### **7.2.5. Pathway to real matrix-variate gamma and Maxwell-Boltzmann densities**

Consider a model of the following form for a *p* × *q, q* ≥ *p*, matrix *X* of full rank *p*:

$$f_{10}(X) = c_{10}\,|AXBX'|^{\gamma}\,|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}}, \ \alpha < 1,\tag{7.2.16}$$

for $A > O$, $B > O$, $a > 0$, $\eta > 0$, $I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$ (positive definite), and $f_{10}(X) = 0$ elsewhere. It will be determined later that the parameter *γ* is subject to the condition $\gamma + \frac{q}{2} > \frac{p-1}{2}$. When $\alpha > 1$, we write $1-\alpha = -(\alpha-1)$, $\alpha > 1$, so that the model specified in (7.2.16) shifts to the model

$$f_{11}(X) = c_{11}\,|AXBX'|^{\gamma}\,|I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}}, \ \alpha > 1\tag{7.2.17}$$

for $\eta > 0$, $a > 0$, $A > O$, $B > O$, and $f_{11}(X) = 0$ elsewhere. Observe that $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$ is symmetric as well as positive definite when *X* is of full rank *p* and $A > O$, $B > O$. For this model, the condition $\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} > \frac{p-1}{2}$ is required in addition to that applying to the parameter *γ* in (7.2.16). Note that when $f_{10}(X)$ and $f_{11}(X)$ are taken as statistical densities, $c_{10}$ and $c_{11}$ are the associated normalizing constants. Proceeding as in the evaluation of *c* in (7.2.4), we obtain the following representations of $c_{10}$ and $c_{11}$:


$$c_{10} = |A|^{\frac{q}{2}}|B|^{\frac{p}{2}}[a(1-\alpha)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\frac{\eta}{1-\alpha}+\frac{p+1}{2})}\tag{7.2.18}$$

for $\eta > 0$, $\alpha < 1$, $a > 0$, $A > O$, $B > O$, $\gamma+\frac{q}{2} > \frac{p-1}{2}$,

$$c_{11} = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}[a(\alpha-1)]^{p(\gamma+\frac{q}{2})}\,\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1})}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\tag{7.2.19}$$

for $\alpha > 1$, $\eta > 0$, $a > 0$, $A > O$, $B > O$, $\gamma+\frac{q}{2} > \frac{p-1}{2}$, $\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2} > \frac{p-1}{2}$. When $\alpha \to 1_-$ in (7.2.18) and $\alpha \to 1_+$ in (7.2.19), the models (7.2.16) and (7.2.17) converge to the real rectangular matrix-variate gamma or Maxwell-Boltzmann density specified in (7.2.5). This can be established by applying the following lemmas.

#### **Lemma 7.2.1.**

$$\lim\_{\alpha \to 1\_{-}} |I - a(1 - \alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}} = \mathbf{e}^{-a\eta \text{tr}(AXBX')}$$

*and*

$$\lim_{\alpha\to 1_+}|I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}} = \mathbf{e}^{-a\eta\,\mathrm{tr}(AXBX')}.\tag{7.2.20}$$

*Proof***:** Letting $\lambda_1,\dots,\lambda_p$ be the eigenvalues of the symmetric matrix $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, we have

$$|I - a(1 - \alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1 - \alpha}} = \prod\_{j=1}^{p} [1 - a(1 - \alpha)\lambda\_j]^{\frac{\eta}{1 - \alpha}}.$$

However, since

$$\lim\_{\alpha \to 1\_{-}} [1 - a(1 - \alpha)\lambda\_j]^{\frac{\eta}{1 - \alpha}} = \mathbf{e}^{-a\eta\lambda\_j},$$

the product gives the sum of the eigenvalues, that is, $\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})$, in the exponent, hence the result. The same result can be similarly obtained for the case $\alpha > 1$. We can also show that the normalizing constants $c_{10}$ and $c_{11}$ reduce to the normalizing constant in (7.2.4). This can be achieved by making use of an asymptotic expansion of gamma functions, namely,

$$
\Gamma(z+\delta) \approx \sqrt{2\pi} \, z^{z+\delta-\frac{1}{2}} \, \text{e}^{-z} \text{ for } |z| \to \infty, \text{ } \delta \text{ bounded.} \tag{7.2.21}
$$

This first term approximation is also known as Stirling's formula.
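Stirling's first-term approximation (7.2.21) is easy to probe numerically; working with $\log\Gamma$ avoids overflow for large arguments. A small illustrative sketch:

```python
from math import lgamma, log, pi

def stirling(z: float, delta: float) -> float:
    # log of the first-term approximation (7.2.21):
    # Gamma(z + delta) ~ sqrt(2 pi) z^(z + delta - 1/2) e^(-z)
    return 0.5 * log(2 * pi) + (z + delta - 0.5) * log(z) - z

for z in (50.0, 500.0):
    exact = lgamma(z + 0.3)
    # the relative error of the log decreases as z grows
    assert abs(stirling(z, 0.3) - exact) / exact < 1e-3
print("Stirling check passed")
```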

#### **Lemma 7.2.2.**

$$\lim_{\alpha\to 1_-}[a(1-\alpha)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\frac{\eta}{1-\alpha}+\frac{p+1}{2})} = (a\eta)^{p(\gamma+\frac{q}{2})}$$

*and*

$$\lim_{\alpha\to 1_+}[a(\alpha-1)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1})}{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})} = (a\eta)^{p(\gamma+\frac{q}{2})}.\tag{7.2.22}$$

*Proof***:** On expanding *Γp(*·*)* using its definition, for *α >* 1, we have

$$\frac{[a(\alpha -1)]^{p(\boldsymbol{\gamma} + \frac{q}{2})} \Gamma\_p(\frac{\boldsymbol{\eta}}{\alpha - 1})}{\Gamma\_p(\frac{\boldsymbol{\eta}}{\alpha - 1} - \boldsymbol{\gamma} - \frac{q}{2})} = [a(\alpha - 1)]^{p(\boldsymbol{\gamma} + \frac{q}{2})} \prod\_{j=1}^p \frac{\Gamma(\frac{\boldsymbol{\eta}}{\alpha - 1} - \frac{j - 1}{2})}{\Gamma(\frac{\boldsymbol{\eta}}{\alpha - 1} - \boldsymbol{\gamma} - \frac{q}{2} - \frac{j - 1}{2})}.$$

Now, on applying Stirling's formula as given in (7.2.21) to each of the gamma functions, taking $z = \frac{\eta}{\alpha-1} \to \infty$ as $\alpha \to 1_+$, it is seen that the right-hand side of the above equality reduces to $(a\eta)^{p(\gamma+\frac{q}{2})}$. The result can be similarly established for the case $\alpha < 1$.
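For the scalar case $p = 1$, the limit in Lemma 7.2.2 can be checked numerically through log-gamma differences (the parameter values below are illustrative choices of ours):

```python
from math import lgamma, log

# illustrative values (p = 1): gamma + q/2 = 3.5, a = 1.5, eta = 2
a, eta, gam, q = 1.5, 2.0, 2.0, 3.0
target = (gam + q / 2) * log(a * eta)      # log of (a*eta)^(gamma + q/2)

def lhs_log(alpha: float) -> float:
    eps = alpha - 1.0                       # alpha -> 1+ means eps -> 0+
    return ((gam + q / 2) * log(a * eps)
            + lgamma(eta / eps)
            - lgamma(eta / eps - gam - q / 2))

assert abs(lhs_log(1.0 + 1e-4) - target) < 1e-2
print("Lemma 7.2.2 limit check passed")
```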

This shows that $c_{10}$ and $c_{11}$ of (7.2.18) and (7.2.19) converge to the normalizing constant in (7.2.4). This means that the models specified in (7.2.16), (7.2.17), and (7.2.5) are all available from either (7.2.16) or (7.2.17) via the pathway parameter *α*. Accordingly, the combined model, either (7.2.16) or (7.2.17), is referred to as the pathway generalized real rectangular matrix-variate gamma density. The Maxwell-Boltzmann case corresponds to $\gamma = 1$ and the Rayleigh case, to $\gamma = \frac{1}{2}$. If either of the Maxwell-Boltzmann or Rayleigh densities is the ideal or stable density in a physical system, then these stable densities, as well as the unstable neighborhoods described through the pathway parameter $\alpha < 1$ and $\alpha > 1$ and the transitional stages, are given by (7.2.16) or (7.2.17). The original pathway model was introduced in Mathai (2005).

In order to address other problems arising in physical situations, one may have to integrate functions of *X* over the densities (7.2.16), (7.2.17) or (7.2.5). Consequently, we will evaluate an arbitrary *h*-th moment of $|AXBX'|$ in the models (7.2.16) and (7.2.17). For example, let us determine the *h*-th moment of $|AXBX'|$ with respect to the model specified in (7.2.16):

$$E[|AXBX'|^h] = c_{10}\int_X |AXBX'|^{\gamma+h}\,\big|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}\big|^{\frac{\eta}{1-\alpha}}\,\mathrm{d}X.$$

Note that the only change in the integrand, as compared to (7.2.16), is that *γ* is replaced by *γ* + *h*. Hence the result is available from the normalizing constant *c*10, and the answer is the following:

$$E[|AXBX'|^h] = [a(1-\alpha)]^{-ph}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}+h)}\tag{7.2.23}$$

for $\gamma+\frac{q}{2}+h > \frac{p-1}{2}$, $a > 0$, $\alpha < 1$. Therefore

$$\begin{split} E[|a(1-\alpha)AXBX'|^{h}] &= \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}+h)}\\ &= \prod_{j=1}^p\left\{\frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}-\frac{j-1}{2})}{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}-\frac{j-1}{2}+h)}\right\}\\ &= \prod_{j=1}^p E\big(y_j^h\big)\end{split}\tag{7.2.24}$$

where $y_j$ is a real scalar type-1 beta random variable with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ \frac{\eta}{1-\alpha}+\frac{p+1}{2})$, $j = 1,\dots,p$, the $y_j$'s being mutually independently distributed. Hence, we have the structural relationship

$$|a(1 - \alpha)AXBX'| = \mathbf{y}\_1 \cdots \mathbf{y}\_p.\tag{7.2.25}$$

Proceeding the same way for the model (7.2.17), we have

$$E[|AXBX'|^h] = [a(\alpha-1)]^{-ph}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-h)}{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\tag{7.2.26}$$

for $\gamma+\frac{q}{2}+h > \frac{p-1}{2}$ and $\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-h > \frac{p-1}{2}$, that is, $-(\gamma+\frac{q}{2})+\frac{p-1}{2} < \Re(h) < \frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{p-1}{2}$. Thus,

$$E[|a(\alpha-1)AXBX'|^h] = \prod_{j=1}^p\frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2}-h)}{\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2})} = \prod_{j=1}^p E(z_j^h)\tag{7.2.27}$$

where $z_j$ is a real scalar type-2 beta random variable with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ \frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2})$ for $j = 1,\dots,p$, the $z_j$'s being mutually independently distributed. Thus, for $\alpha > 1$, we have the structural representation

$$|a(\alpha - 1)AXBX'| = z\_1 \cdots z\_p \,. \tag{7.2.28}$$

As previously explained, one can consider the *p* linearly independent rows of $A^{\frac{1}{2}}XB^{\frac{1}{2}}$ as *p* vectors in the Euclidean *q*-space. Then, these *p* vectors are jointly distributed as rectangular matrix-variate type-2 beta, and $E[|AXBX'|^{h}] = E\big[(|AXBX'|^{\frac{1}{2}})^{2h}\big]$ is the $(2h)$-th moment of the volume of the random parallelotope generated by these *p* *q*-vectors for $q > p$. In this case, the random points will be called type-2 beta distributed random points.
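The representation (7.2.27)–(7.2.28) can be probed by Monte Carlo in the case $p = 1$: a type-2 beta variable can be simulated as a ratio of two independent gamma variables, and its empirical *h*-th moment compared with the gamma-ratio formula. The parameter values below are illustrative choices of ours:

```python
import random
from math import gamma

random.seed(3)
# p = 1 case of (7.2.27): one type-2 beta variable with parameters
# (gamma + q/2, eta/(alpha-1) - gamma - q/2); illustrative numeric values
a1, b1, h = 2.5, 9.5, 1.0

exact = (gamma(a1 + h) / gamma(a1)) * (gamma(b1 - h) / gamma(b1))
# a type-2 beta variable is distributed as a ratio of independent gammas
n = 200_000
mc = sum(
    (random.gammavariate(a1, 1.0) / random.gammavariate(b1, 1.0)) ** h
    for _ in range(n)
) / n
assert abs(mc - exact) / exact < 0.05
print("type-2 beta moment check passed")
```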

The real Maxwell-Boltzmann case corresponds to $\gamma = 1$ and the Rayleigh case, to $\gamma = \frac{1}{2}$, and all of the above extensions and properties apply to both of these distributions.

#### **7.2.6. Multivariate gamma and Maxwell-Boltzmann densities, pathway model**

Consider the density given in (7.2.16) for the case $p = 1$. In this instance, the $p\times p$ constant matrix *A* is $1\times 1$ and we shall let $A = b > 0$, a positive real scalar. Then for $\alpha < 1$, (7.2.16) reduces to the following, where *X* is $1\times q$ of the form $X = (x_1,\dots,x_q)$, $-\infty < x_j < \infty$, $j = 1,\dots,q$:

$$f_{12}(X) = b^{\frac{q}{2}}|B|^{\frac{1}{2}}[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}$$

$$\times\ [bXBX']^{\gamma}\,[1-a(1-\alpha)bXBX']^{\frac{\eta}{1-\alpha}}\tag{7.2.29}$$

for $b > 0$, $B = B' > O$, $a > 0$, $\eta > 0$, $\gamma+\frac{q}{2} > 0$, $-\infty < x_j < \infty$, $j = 1,\dots,q$, $1 - a(1-\alpha)bXBX' > 0$, $\alpha < 1$, and $f_{12} = 0$ elsewhere. Note that

$$XBX' = (\mathbf{x}\_1, \dots, \mathbf{x}\_q)B \begin{pmatrix} \mathbf{x}\_1 \\ \vdots \\ \mathbf{x}\_q \end{pmatrix}$$

is a real quadratic form whose associated matrix *B* is positive definite. Letting the $1\times q$ vector $Y = XB^{\frac{1}{2}}$, the density of *Y* when $\alpha < 1$ is given by

$$\begin{split} f_{13}(Y)\,\mathrm{d}Y &= b^{\gamma+\frac{q}{2}}[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}\\ &\quad\times [(y_1^2+\cdots+y_q^2)]^{\gamma}\,[1-a(1-\alpha)b(y_1^2+\cdots+y_q^2)]^{\frac{\eta}{1-\alpha}}\,\mathrm{d}Y,\end{split}\tag{7.2.30}$$

for $b > 0$, $\gamma+\frac{q}{2} > 0$, $\eta > 0$, $-\infty < y_j < \infty$, $j = 1,\dots,q$, $1 - a(1-\alpha)b(y_1^2+\cdots+y_q^2) > 0$, and $f_{13} = 0$ elsewhere. This will be taken as the standard form of the real multivariate gamma density in its pathway generalized form; for $\gamma = 1$, it is the real pathway generalized form of the Maxwell-Boltzmann density in the standard multivariate case. For $\alpha > 1$, the corresponding standard form of the real multivariate gamma and Maxwell-Boltzmann densities is given by

$$\begin{split} f_{14}(Y)\,\mathrm{d}Y &= b^{\gamma+\frac{q}{2}}[a(\alpha-1)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\frac{\eta}{\alpha-1})}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\\ &\quad\times [(y_1^2+\cdots+y_q^2)]^{\gamma}\,[1+a(\alpha-1)b(y_1^2+\cdots+y_q^2)]^{-\frac{\eta}{\alpha-1}}\,\mathrm{d}Y\end{split}\tag{7.2.31}$$

for $b > 0$, $\gamma+\frac{q}{2} > 0$, $a > 0$, $\eta > 0$, $\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2} > 0$, $-\infty < y_j < \infty$, $j = 1,\dots,q$, and $f_{14} = 0$ elsewhere. This will be taken as the pathway generalized real multivariate gamma density for $\alpha > 1$; for $\gamma = 1$, it is the standard form of the real pathway extended Maxwell-Boltzmann density for $\alpha > 1$. Note that when $\alpha \to 1_-$ in (7.2.30) and $\alpha \to 1_+$ in (7.2.31), we have

$$\begin{split} f_{15}(Y)\,\mathrm{d}Y &= b^{\gamma+\frac{q}{2}}(a\eta)^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})}\\ &\quad\times [(y_1^2+\cdots+y_q^2)]^{\gamma}\,\mathbf{e}^{-a\eta b(y_1^2+\cdots+y_q^2)}\,\mathrm{d}Y,\end{split}\tag{7.2.32}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\gamma+\frac{q}{2} > 0$, and $f_{15} = 0$ elsewhere, which, for $\gamma = 1$, is the real multivariate Maxwell-Boltzmann density in the standard form. From (7.2.30), (7.2.31), and thereby from (7.2.32), one can obtain the density of $u = y_1^2+\cdots+y_q^2$, either by using the general polar coordinate transformation or the transformation of variables technique, that is, going from $\mathrm{d}Y$ to $\mathrm{d}S$ with $S = YY'$, *Y* being $1\times q$. Then, the density of *u* for the case $\alpha < 1$ is

$$f_{16}(u) = b^{\gamma+\frac{q}{2}}[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}\,u^{\gamma+\frac{q}{2}-1}\,[1-a(1-\alpha)bu]^{\frac{\eta}{1-\alpha}}, \ \alpha < 1,\tag{7.2.33}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\alpha < 1$, $\gamma+\frac{q}{2} > 0$, $1-a(1-\alpha)bu > 0$, and $f_{16} = 0$ elsewhere, the density of *u* for $\alpha > 1$ being

$$f_{17}(u) = b^{\gamma+\frac{q}{2}}[a(\alpha-1)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{\eta}{\alpha-1})}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\,u^{\gamma+\frac{q}{2}-1}\,[1+a(\alpha-1)bu]^{-\frac{\eta}{\alpha-1}}, \ \alpha > 1,\tag{7.2.34}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\gamma+\frac{q}{2} > 0$, $\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2} > 0$, $u \ge 0$, and $f_{17} = 0$ elsewhere. Observe that as $\alpha \to 1$, both (7.2.33) and (7.2.34) converge to the form

$$f_{18}(u) = \frac{(a\eta b)^{\gamma+\frac{q}{2}}}{\Gamma(\gamma+\frac{q}{2})}\,u^{\gamma+\frac{q}{2}-1}\,\mathbf{e}^{-a\eta bu}\tag{7.2.35}$$

for $a > 0$, $b > 0$, $\eta > 0$, $u \ge 0$, and $f_{18} = 0$ elsewhere. For $\gamma = \frac{1}{2}$, we have the corresponding Rayleigh cases.
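The convergence of the pathway density (7.2.33) to the gamma form (7.2.35) as $\alpha \to 1_-$ can be observed numerically; since the gamma-function arguments blow up as $\alpha \to 1_-$, the computation is carried out in logarithms. An illustrative sketch with $a = b = \eta = \gamma = 1$, $q = 2$ (our choice of values):

```python
from math import lgamma, log, exp

b, a, eta, gam, q = 1.0, 1.0, 1.0, 1.0, 2.0
k = gam + q / 2

def f16(u: float, alpha: float) -> float:
    # pathway density (7.2.33) for alpha < 1, computed in logs
    n = eta / (1 - alpha)
    base = 1 - a * (1 - alpha) * b * u
    if base <= 0:
        return 0.0
    log_f = (k * log(b) + k * log(a * (1 - alpha))
             + lgamma(k + n + 1) - lgamma(k) - lgamma(n + 1)
             + (k - 1) * log(u) + n * log(base))
    return exp(log_f)

def f18(u: float) -> float:
    # limiting gamma form (7.2.35)
    return exp(k * log(a * eta * b) - lgamma(k) + (k - 1) * log(u) - a * b * eta * u)

for u in (0.3, 1.0, 2.5):
    assert abs(f16(u, 1 - 1e-5) - f18(u)) < 1e-3 * f18(u)
print("pathway density converges to the gamma form")
```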

Letting $\gamma = 1$, $q = 1$ and $a\eta = 1$ in (7.2.32), we have

$$\begin{split} f_{19}(y_1) &= \frac{b^{\frac{3}{2}}}{\Gamma(\frac{3}{2})}\,y_1^2\,\mathbf{e}^{-by_1^2} = \frac{2b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_1^2\,\mathbf{e}^{-by_1^2}, \ -\infty < y_1 < \infty, \ b > 0\\ &= \frac{4b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_1^2\,\mathbf{e}^{-by_1^2}, \ 0 \le y_1 < \infty, \ b > 0,\end{split}\tag{7.2.36}$$

and $f_{19} = 0$ elsewhere. This is the real Maxwell-Boltzmann case. For the Rayleigh case, we let $\gamma = \frac{1}{2}$ and $p = 1$, $q = 1$ in (7.2.32), which results in the following density:

$$\begin{split} f_{20}(y_1) &= b\,(y_1^2)^{\frac{1}{2}}\,\mathbf{e}^{-by_1^2}, \ -\infty < y_1 < \infty, \ b > 0\\ &= 2b\,|y_1|\,\mathbf{e}^{-by_1^2}, \ 0 \le y_1 < \infty, \ b > 0,\end{split}\tag{7.2.37}$$

and *f*<sup>20</sup> = 0 elsewhere.
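The normalizations of (7.2.36) and (7.2.37) are easy to confirm by one-dimensional quadrature; a simple trapezoidal rule suffices since both integrands decay rapidly. A sketch with an arbitrary value of *b* (our choice):

```python
from math import exp, pi, sqrt

b = 1.7   # illustrative value

def trapz(f, lo, hi, n=200_000):
    # composite trapezoidal rule on [lo, hi]
    h = (hi - lo) / n
    return h * (sum(f(lo + i * h) for i in range(1, n)) + 0.5 * (f(lo) + f(hi)))

f19 = lambda y: 4 * b ** 1.5 / sqrt(pi) * y * y * exp(-b * y * y)   # (7.2.36), y >= 0
f20 = lambda y: 2 * b * y * exp(-b * y * y)                         # (7.2.37), y >= 0

assert abs(trapz(f19, 0.0, 12.0) - 1.0) < 1e-6
assert abs(trapz(f20, 0.0, 12.0) - 1.0) < 1e-6
print("both densities integrate to one")
```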

#### **7.2.7. Concluding remarks**

There exist natural phenomena that are suspected to involve an underlying distribution which is not Maxwell-Boltzmann but may be some deviation therefrom. In such instances, it is preferable to model the collected data by means of the pathway extended model previously specified for $p = 1$, $q = 1$ (real scalar case), $p = 1$ (real multivariate case) and the general matrix-variate case. The pathway parameter *α* will capture the Maxwell-Boltzmann case, the neighboring models described by the pathway model for $\alpha < 1$ and for $\alpha > 1$, and the transitional stages when moving from one family of functions to another, thus covering three different families of functions. Incidentally, for $\gamma = 0$, one has the rectangular matrix-variate Gaussian density given in (7.2.5) and its pathway extension in (7.2.16) and (7.2.17), or the general extensions in the standard forms in (7.2.30), (7.2.31), and (7.2.32) wherein $\gamma = 0$. The structures in (7.2.24), (7.2.27), and (7.2.28) suggest that the corresponding densities can also be written in terms of G- and H-functions. For the theory and applications of the G- and H-functions, the reader is referred to Mathai (1993) and Mathai et al. (2010), respectively. The complex analogues of some matrix-variate distributions, including the matrix-variate Gaussian, were introduced in Mathai and Provost (2006). Certain bivariate distributions are discussed in Balakrishnan and Lai (2009), and some general methods of generating real multivariate distributions are presented in Marshall and Olkin (1967).

**Example 7.2.4.** Let $X = (x_{ij})$ be a real $p\times q$, $q \ge p$, matrix of rank *p*, where the $x_{ij}$'s are distinct real scalar variables. Let the constant matrices be $A = b > 0$, which is $1\times 1$ (so that $p = 1$), and $B > O$ of order $q\times q$. Consider the following generalized multivariate Maxwell-Boltzmann density

$$f(X) = c\,|AXBX'|^{\gamma}\,\mathbf{e}^{-[\mathrm{tr}(AXBX')]^{\delta}}$$

for *δ >* 0*, A* = *b >* 0*, X* = [*x*1*,...,xq* ]. Evaluate *c* if *f (X)* is a density.

**Solution 7.2.4.** Since *X* is $1\times q$, $|AXBX'| = b[XBX']$ where $XBX'$ is a real quadratic form. For $f(X)$ to be a density, we must have

$$1 = \int_X f(X)\,\mathrm{d}X = c\,b^{\gamma}\int_X [XBX']^{\gamma}\,\mathbf{e}^{-[bXBX']^{\delta}}\,\mathrm{d}X.\tag{i}$$

Let us make the transformations $Y = XB^{\frac{1}{2}}$ and $s = YY'$. Then *(i)* reduces to the following:

$$1 = c\,b^{\gamma}\,\frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})}\,|B|^{-\frac{1}{2}}\int_0^\infty s^{\gamma+\frac{q}{2}-1}\,\mathbf{e}^{-(bs)^{\delta}}\,\mathrm{d}s.\tag{ii}$$

Letting $t = b^{\delta}s^{\delta}$, $b > 0$, $s > 0$, so that $\mathrm{d}s = \frac{1}{\delta b}\,t^{\frac{1}{\delta}-1}\,\mathrm{d}t$, *(ii)* becomes

$$\begin{split} 1 &= c\,\frac{\pi^{\frac{q}{2}}}{\delta\,b^{\frac{q}{2}}\,\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}\int_0^\infty t^{\frac{\gamma}{\delta}+\frac{q}{2\delta}-1}\,\mathbf{e}^{-t}\,\mathrm{d}t\\ &= c\,\frac{\pi^{\frac{q}{2}}\,\Gamma(\frac{\gamma}{\delta}+\frac{q}{2\delta})}{\delta\,b^{\frac{q}{2}}\,\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}.\end{split}$$

Hence,

$$c = \frac{\delta\,b^{\frac{q}{2}}\,\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}{\pi^{\frac{q}{2}}\,\Gamma(\frac{\gamma}{\delta}+\frac{q}{2\delta})}.$$

No additional conditions are required other than $\gamma > 0$, $\delta > 0$, $q > 0$, $B > O$. This completes the solution.
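The value of *c* obtained in Solution 7.2.4 can be verified numerically: substituting *c* back into step *(ii)* must give 1. The following sketch does so for illustrative parameter values ($\gamma = 1$, $\delta = 2$, $q = 3$, $b = 1.5$, $|B| = 1$ are our choices):

```python
from math import gamma, pi, exp

# illustrative values (our choice)
gam, delta, q, b, detB = 1.0, 2.0, 3.0, 1.5, 1.0

c = delta * b ** (q / 2) * gamma(q / 2) * detB ** 0.5 / (
    pi ** (q / 2) * gamma(gam / delta + q / (2 * delta)))

def trapz(f, lo, hi, n=200_000):
    h = (hi - lo) / n
    return h * (sum(f(lo + i * h) for i in range(1, n)) + 0.5 * (f(lo) + f(hi)))

# substitute c back into step (ii); the result must be 1
integral = trapz(lambda s: s ** (gam + q / 2 - 1) * exp(-((b * s) ** delta)), 0.0, 10.0)
total = c * b ** gam * (pi ** (q / 2) / gamma(q / 2)) * integral / detB ** 0.5
assert abs(total - 1.0) < 1e-4
print("Example 7.2.4 constant verified")
```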

#### **7.2a. Complex Matrix-Variate Gamma and Maxwell-Boltzmann Densities**

The matrix-variate gamma density in the real positive definite matrix case was defined in equation (5.2.4) of Sect. 5.2. The corresponding matrix-variate gamma density in the complex domain was given in Eq. (5.2a.4). Those distributions will be extended to the rectangular matrix-variate cases in this section. A particular case of the rectangular matrix-variate gamma in the complex domain will be called the Maxwell-Boltzmann density in the complex matrix-variate case. Let $\tilde{X} = (\tilde{x}_{ij})$ be a $p\times q$, $q \ge p$, rectangular matrix of rank *p* in the complex domain whose elements $\tilde{x}_{ij}$ are distinct scalar complex variables. Let $|\det(\cdot)|$ denote the absolute value of the determinant of $(\cdot)$. Let *A* of order $p\times p$ and *B* of order $q\times q$ be real positive definite or Hermitian positive definite constant matrices. The conjugate transpose of $\tilde{X}$ will be denoted by $\tilde{X}^{*}$. Consider the function

$$\tilde{f}(\tilde{X})\,\mathrm{d}\tilde{X} = \tilde{c}\,|\det(A\tilde{X}B\tilde{X}^{*})|^{\gamma}\,\mathbf{e}^{-\mathrm{tr}(A\tilde{X}B\tilde{X}^{*})}\,\mathrm{d}\tilde{X}\tag{7.2a.1}$$

for $A > O$, $B > O$, $\gamma + q > p - 1$, where $\tilde{c}$ is the normalizing constant so that $\tilde{f}(\tilde{X})$ is a statistical density. One can evaluate $\tilde{c}$ by proceeding as was done in the real case. Let

$$
\tilde{Y} = A^{\frac{1}{2}} \tilde{X} B^{\frac{1}{2}} \Rightarrow \mathrm{d}\tilde{Y} = |\det(A)|^q |\det(B)|^p \mathrm{d}\tilde{X},
$$

the Jacobian of this matrix transformation being provided in Chap. 1 or Mathai (1997). Then, $\tilde{f}(\tilde{X})$ becomes

$$
\tilde{f}_1(\tilde{Y})\,\mathrm{d}\tilde{Y} = \tilde{c}\,|\det(A)|^{-q}\,|\det(B)|^{-p}\,|\det(\tilde{Y}\tilde{Y}^{*})|^{\gamma}\,\mathbf{e}^{-\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})}\,\mathrm{d}\tilde{Y}.\tag{7.2a.2}
$$

Now, letting

$$\tilde{S} = \tilde{Y}\tilde{Y}^{*} \Rightarrow \mathrm{d}\tilde{Y} = \frac{\pi^{qp}}{\tilde{\Gamma}_p(q)}\,|\det(\tilde{S})|^{q-p}\,\mathrm{d}\tilde{S},$$

by applying Result 4.2a.3, where $\tilde{\Gamma}_p(q)$ is the complex matrix-variate gamma function, $\tilde{f}_1$ changes to

$$
\begin{split}\tilde{f}_2(\tilde{S})\,\mathrm{d}\tilde{S} &= \tilde{c}\,|\det(A)|^{-q}\,|\det(B)|^{-p}\,\frac{\pi^{qp}}{\tilde{\Gamma}_p(q)}\\ &\quad\times |\det(\tilde{S})|^{\gamma+q-p}\,\mathbf{e}^{-\mathrm{tr}(\tilde{S})}\,\mathrm{d}\tilde{S}.\end{split}\tag{7.2a.3}
$$

Finally, integrating out $\tilde{S}$ by making use of a complex matrix-variate gamma integral, we obtain $\tilde{\Gamma}_p(\gamma+q)$ for $\gamma+q > p-1$. Hence, the normalizing constant $\tilde{c}$ is the following:

$$\tilde{c} = \frac{|\det(A)|^{q}\,|\det(B)|^{p}}{\pi^{qp}}\,\frac{\tilde{\Gamma}_p(q)}{\tilde{\Gamma}_p(\gamma+q)}, \ \Re(\gamma+q) > p-1, \ A > O, \ B > O.\tag{7.2a.4}$$

**Example 7.2a.1.** Evaluate the normalizing constant in the density in (7.2*a*.1) if *γ* = 2*, q* = 3*, p* = 2,

$$A = \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix}, \ B = \begin{bmatrix} 3 & 0 & i \\ 0 & 2 & 1+i \\ -i & 1-i & 2 \end{bmatrix}.$$

**Solution 7.2a.1.** Note that *A* and *B* are both Hermitian matrices since $A = A^{*}$ and $B = B^{*}$. The leading minors of *A* are $|(3)| = 3 > 0$ and $\begin{vmatrix} 3 & 1+i \\ 1-i & 2\end{vmatrix} = (3)(2)-(1+i)(1-i) = 4 > 0$, and hence $A > O$ (positive definite). The leading minors of *B* are $|(3)| = 3 > 0$, $\begin{vmatrix} 3 & 0 \\ 0 & 2\end{vmatrix} = 6 > 0$, and $|B| = 3\begin{vmatrix} 2 & 1+i \\ 1-i & 2\end{vmatrix} - 0 + i\begin{vmatrix} 0 & 2 \\ -i & 1-i\end{vmatrix} = 3(4-2)+i(2i) = 4 > 0$. Hence $B > O$ and $|B| = 4$. The normalizing constant

$$\begin{split} \tilde{c} &= \frac{|\det(A)|^{q}|\det(B)|^{p}}{\pi^{pq}} \frac{\tilde{\Gamma}\_{p}(q)}{\tilde{\Gamma}\_{p}(\mathcal{Y}+q)} \\ &= \frac{(4)^{3}(4)^{2}}{\pi^{6}} \frac{\tilde{\Gamma}\_{2}(3)}{\tilde{\Gamma}\_{2}(5)} = \frac{4^{5}}{\pi^{6}} \frac{\pi \,\Gamma(3)\Gamma(2)}{\pi \,\Gamma(5)\Gamma(4)} \\ &= \frac{2^{7}}{3^{2}\pi^{6}} . \end{split}$$

This completes the computations.
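The arithmetic of Example 7.2a.1 can be replayed with a small helper implementing the complex matrix-variate gamma function, $\tilde{\Gamma}_p(\alpha) = \pi^{p(p-1)/2}\prod_{j=1}^p\Gamma(\alpha-j+1)$, which is the product expansion used in the solution:

```python
from math import gamma, pi

def cgamma_p(p: int, alpha: float) -> float:
    # complex matrix-variate gamma: pi^(p(p-1)/2) * prod_{j=1}^p Gamma(alpha - j + 1)
    val = pi ** (p * (p - 1) / 2)
    for j in range(1, p + 1):
        val *= gamma(alpha - j + 1)
    return val

detA, detB, p, q, gam = 4.0, 4.0, 2, 3, 2   # values from Example 7.2a.1
c_tilde = detA ** q * detB ** p / pi ** (p * q) * cgamma_p(p, q) / cgamma_p(p, gam + q)
assert abs(c_tilde - 2 ** 7 / (9 * pi ** 6)) < 1e-12
print("normalizing constant confirmed")
```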

#### **7.2a.1. Extension of the Matrix-Variate Gamma Density in the Complex Domain**

Consider the density of $\tilde{X}$ given in (7.2a.1) with $\tilde{c}$ given in (7.2a.4). The density of $\tilde{Y} = A^{\frac{1}{2}}\tilde{X}B^{\frac{1}{2}}$ is given by

$$\tilde{f}_1(\tilde{Y}) = \frac{\tilde{\Gamma}_p(q)}{\pi^{qp}\,\tilde{\Gamma}_p(\gamma+q)}\,|\det(\tilde{Y}\tilde{Y}^{*})|^{\gamma}\,\mathbf{e}^{-\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})}\tag{7.2a.5}$$

for $\gamma+q > p-1$, and $\tilde{f}_1 = 0$ elsewhere, and the density of $\tilde{S} = \tilde{Y}\tilde{Y}^{*}$ is

$$\tilde{f}_2(\tilde{S}) = \frac{1}{\tilde{\Gamma}_p(\gamma+q)}\,|\det(\tilde{S})|^{\gamma+q-p}\,\mathbf{e}^{-\mathrm{tr}(\tilde{S})}, \ \Re(\gamma+q) > p-1,\tag{7.2a.6}$$

and $\tilde{f}_2 = 0$ elsewhere. The density given in (7.2a.1), namely $\tilde{f}(\tilde{X})$ for $\gamma = 1$, will be called the Maxwell-Boltzmann density for the complex rectangular matrix-variate case, since for $p = 1$ and $q = 1$ in the real scalar case, the density corresponds to the case $\gamma = 1$; and (7.2a.1) for $\gamma = \frac{1}{2}$ will be called the complex rectangular matrix-variate Rayleigh density.

#### **7.2a.2. The multivariate gamma density in the complex matrix-variate case**

Consider the case *p* = 1 and *A* = *b >* 0, where *b* is a real positive scalar, since the *p* × *p* matrix *A* is assumed to be Hermitian positive definite. Then, *X̃* is 1 × *q* and

$$A\tilde{X}B\tilde{X}^\* = b\tilde{X}B\tilde{X}^\* = b(\tilde{x}\_1, \dots, \tilde{x}\_q)B\begin{pmatrix} \tilde{x}\_1^\* \\ \vdots \\ \tilde{x}\_q^\* \end{pmatrix}$$

is a positive definite Hermitian form, an asterisk denoting only the conjugate when the elements are scalar quantities. Thus, when *p* = 1 and *A* = *b >* 0, the density *f̃ (X̃)* reduces to

$$\tilde{f}\_{3}(\tilde{X}) = b^{\gamma+q} |\det(B)| \frac{\tilde{\Gamma}(q)}{\pi^{q} \tilde{\Gamma}(\gamma+q)} |\det(\tilde{X}B\tilde{X}^\*)|^{\gamma} \mathbf{e}^{-b(\tilde{X}B\tilde{X}^\*)} \tag{7.2a.7}$$

for *X̃* = *(x̃*<sub>1</sub>*,..., x̃<sub>q</sub>)*, *B* = *B*<sup>∗</sup> *> O*, *b >* 0, ℜ(*γ* + *q*) *>* 0, and *f̃*<sub>3</sub> = 0 elsewhere. Letting *Ỹ*<sup>∗</sup> = *B*<sup>1/2</sup>*X̃*<sup>∗</sup>, the density of *Ỹ* is the following:

$$\tilde{f}\_4(\tilde{Y}) = \frac{b^{\gamma+q}\tilde{\Gamma}(q)}{\pi^q\tilde{\Gamma}(\gamma+q)}(|\tilde{y}\_1|^2 + \dots + |\tilde{y}\_q|^2)^{\gamma}\mathbf{e}^{-b(|\tilde{y}\_1|^2 + \dots + |\tilde{y}\_q|^2)}\tag{7.2a.8}$$

for *b >* 0, ℜ(*γ* + *q*) *>* 0, and *f̃*<sub>4</sub> = 0 elsewhere, where |*ỹ<sub>j</sub>*| is the absolute value or modulus of the complex quantity *ỹ<sub>j</sub>*. We will take (7.2*a*.8) as the complex multivariate gamma density; when *γ* = 1, it will be referred to as the complex multivariate Maxwell-Boltzmann density, and when *γ* = 1/2, it will be called the complex multivariate Rayleigh density. These densities are believed to be new.

Let us verify by integration that (7.2*a*.8) is indeed a density. First, consider the transformation *s* = *Ỹ Ỹ*<sup>∗</sup>. In view of Theorem 4.2a.3, the integral over the Stiefel manifold gives d*Ỹ* = [*π<sup>q</sup>*/*Γ̃(q)*] *s*<sup>q−1</sup>d*s*, so that the factor *π<sup>q</sup>*/*Γ̃(q)* is canceled. Then, the integral over *s* yields *b*<sup>−(γ+q)</sup>*Γ̃(γ + q)* for ℜ(*γ* + *q*) *>* 0, and hence it is verified that (7.2*a*.8) is a statistical density.
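This verification is easy to confirm numerically: after the Stiefel-manifold reduction, *s* = |*ỹ*<sub>1</sub>|² + ⋯ + |*ỹ<sub>q</sub>*|² follows a real gamma density that must integrate to one. A minimal sketch, with the values of *b*, *γ* and *q* chosen here purely for illustration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

# Illustrative parameter choices (not from the text)
b, g, q = 2.0, 1.0, 3   # g plays the role of gamma

# After the Stiefel-manifold reduction, s = |y_1|^2 + ... + |y_q|^2 has the
# real gamma density (b^{g+q}/Gamma(g+q)) s^{g+q-1} e^{-b s}, which must
# integrate to one if (7.2a.8) is a statistical density.
const = np.exp((g + q) * np.log(b) - gammaln(g + q))
total, _ = quad(lambda s: const * s**(g + q - 1) * np.exp(-b * s), 0, np.inf)
print(abs(total - 1) < 1e-8)
```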

#### **7.2a.3. Arbitrary moments, complex case**

Let us determine the *h*-th moment of *u* = |det*(AX̃BX̃*<sup>∗</sup>*)*| for an arbitrary *h*, that is, the *h*-th moment of the absolute value of the determinant of the matrix *AX̃BX̃*<sup>∗</sup> or of its symmetric form *A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup>, which is

$$E[|\det(A\tilde{X}B\tilde{X}^\*)|^h] = \tilde{c} \int\_{\tilde{X}} |\det(A\tilde{X}B\tilde{X}^\*)|^{h+\gamma} \mathbf{e}^{-\text{tr}(A^\frac{1}{2}\tilde{X}B\tilde{X}^\*A^\frac{1}{2})} \text{d}\tilde{X}.\tag{7.2a.9}$$

Observe that the only change, as compared to the total integral, is that *γ* is replaced by *γ* + *h*, so that the *h*-th moment is available from the normalizing constant *c*˜. Accordingly,

$$E[u^h] = \frac{\tilde{\Gamma}\_p(\gamma + q + h)}{\tilde{\Gamma}\_p(\gamma + q)}, \ \Re(\gamma + q + h) > p - 1,\tag{7.2a.10}$$

$$= \prod\_{j=1}^p \frac{\Gamma(\gamma + q + h - (j - 1))}{\Gamma(\gamma + q - (j - 1))}$$

$$= E(u\_1^h)E(u\_2^h)\cdots E(u\_p^h)\tag{7.2a.11}$$

where the *u<sub>j</sub>*'s are independently distributed real scalar gamma random variables with parameters *(γ* + *q* − *(j* − 1*),* 1*), j* = 1*,...,p*. Thus, structurally, *u* = |det*(AX̃BX̃*<sup>∗</sup>*)*| is a product of independently distributed real scalar gamma random variables with parameters *(γ* + *q* − *(j* − 1*),* 1*), j* = 1*,...,p*. The corresponding result in the real case is that |*AXBX*′| is structurally a product of independently distributed real gamma random variables with parameters *(γ* + *q*/2 − *(j* − 1*)*/2*,* 1*), j* = 1*,...,p*, which can be seen from (7.2.15).
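The structural representation can be checked by simulation. A minimal sketch, with illustrative values of *γ*, *q*, *p* and *h* (all chosen here, not taken from the text), comparing the gamma-ratio formula (7.2*a*.10) against a Monte Carlo estimate built from the product of independent gamma variables:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
g, q, p, h = 2.0, 3, 2, 1.5   # illustrative values; g plays the role of gamma

# Exact h-th moment from (7.2a.10): a product of gamma-function ratios
exact = np.exp(sum(gammaln(g + q + h - j) - gammaln(g + q - j) for j in range(p)))

# Structural form (7.2a.11): u = u_1 ... u_p with u_j ~ Gamma(g + q - (j-1), 1)
n = 200_000
u = np.prod([rng.gamma(g + q - j, 1.0, n) for j in range(p)], axis=0)
print(abs(np.mean(u**h) / exact - 1) < 0.05)
```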

#### **7.2a.4. A pathway extension in the complex case**

A pathway extension is also possible in the complex case. The results and properties are parallel to those obtained in the real case. Hence, we will only mention the pathway extended density. Consider the following density:

$$\tilde{f}\_{5}(\tilde{X}) = \tilde{c}\_{1} |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\gamma} |\det(I - a(1 - \alpha) A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\frac{\eta}{1 - \alpha}}\tag{7.2a.12}$$

for *a >* 0, *α <* 1, *I* − *a(*1 − *α)A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup> *> O* (positive definite), *A > O, B > O, η >* 0, ℜ(*γ* + *q*) *> p* − 1, and *f̃*<sub>5</sub> = 0 elsewhere. Observe that (7.2*a*.12) remains in the generalized type-1 beta family of functions for *α <* 1 (type-1 and type-2 beta densities in the complex rectangular matrix-variate cases will be considered in the next sections). If *α >* 1, then on writing 1 − *α* = −*(α* − 1*), α >* 1, the model in (7.2*a*.12) shifts to the model

$$\tilde{f}\_6(\tilde{X}) = \tilde{c}\_2 |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\gamma} |\det(I + a(\alpha - 1) A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{-\frac{\eta}{\alpha - 1}} \tag{7.2a.13}$$

for *a >* 0, *α >* 1, *A > O, B > O*, *A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup> *> O*, *η >* 0, ℜ(*η/(α* − 1*)* − *γ* − *q) > p* − 1, ℜ(*γ* + *q) > p* − 1, and *f̃*<sub>6</sub> = 0 elsewhere, where *c̃*<sub>2</sub> is the normalizing constant, different from *c̃*<sub>1</sub>. When *α* → 1, both models (7.2*a*.12) and (7.2*a*.13) converge to the model *f̃*<sub>7</sub> where

$$\tilde{f}\_{7}(\tilde{X}) = \tilde{c}\_{3} |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\gamma} \mathbf{e}^{-a \, \eta \, \text{tr}(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})} \tag{7.2a.14}$$

for *a >* 0, *η >* 0, *A > O, B > O*, ℜ(*γ* + *q) > p* − 1, *A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup> *> O*, and *f̃*<sub>7</sub> = 0 elsewhere. The normalizing constants can be evaluated by following steps parallel to those used in the real case. They respectively are:

$$\tilde{c}\_1 = |\det(A)|^q |\det(B)|^p [a(1-\alpha)]^{p(\gamma+q)} \frac{\tilde{\Gamma}\_p(q)}{\pi^{qp}} \frac{\tilde{\Gamma}\_p(\gamma+q+\frac{\eta}{1-\alpha}+p)}{\tilde{\Gamma}\_p(\gamma+q)\tilde{\Gamma}\_p(\frac{\eta}{1-\alpha}+p)} \quad (7.2a.15)$$

for *η >* 0, *a >* 0, *α <* 1, *A > O, B > O*, ℜ(*γ* + *q) > p* − 1;

$$\tilde{c}\_2 = |\det(A)|^q |\det(B)|^p [a(\alpha-1)]^{p(\gamma+q)} \frac{\tilde{\Gamma}\_p(q)}{\pi^{qp}} \frac{\tilde{\Gamma}\_p(\frac{\eta}{\alpha-1})}{\tilde{\Gamma}\_p(\gamma+q)\tilde{\Gamma}\_p(\frac{\eta}{\alpha-1}-\gamma-q)} \tag{7.2a.16}$$

for *a >* 0, *α >* 1, *η >* 0, *A > O, B > O*, ℜ(*γ* + *q) > p* − 1, ℜ(*η/(α* − 1*)* − *γ* − *q) > p* − 1;

$$\tilde{c}\_3 = (a\eta)^{p(\gamma+q)} |\det(A)|^q |\det(B)|^p \frac{\tilde{\Gamma}\_p(q)}{\pi^{qp} \tilde{\Gamma}\_p(\gamma+q)}\tag{7.2a.17}$$

for *a >* 0, *η >* 0, *A > O, B > O*, ℜ(*γ* + *q) > p* − 1.
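The convergence of (7.2*a*.12) and (7.2*a*.13) to (7.2*a*.14) as *α* → 1 rests on the scalar limit (1 − *a*(1 − *α*)*x*)<sup>η/(1−α)</sup> → e<sup>−aηx</sup>. A minimal numerical sketch, with illustrative values of *a*, *η* and *x* chosen here for the check:

```python
import numpy as np

a, eta, x = 0.5, 2.0, 1.3          # illustrative scalar values (p = q = 1 case)
target = np.exp(-a * eta * x)      # limiting exponential factor in (7.2a.14)

# The pathway factor approaches the exponential as alpha -> 1 from below
vals = [(1 - a * (1 - alpha) * x) ** (eta / (1 - alpha))
        for alpha in (0.9, 0.99, 0.999)]
print(abs(vals[-1] - target) < 1e-3)
```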

#### **7.2a.5. The Maxwell-Boltzmann and Rayleigh cases in the complex domain**

The complex counterparts of the Maxwell-Boltzmann and Rayleigh cases may not be available in the literature. Their densities can be derived from (7.2*a*.8). Letting *p* = 1 and *q* = 1 in (7.2*a*.8), we have

$$\begin{split} \tilde{f}\_8(\tilde{y}\_1) &= \frac{b^{\gamma+1}}{\pi \tilde{\Gamma}(\gamma+1)} [|\tilde{y}\_1|^2]^{\gamma} \mathbf{e}^{-b|\tilde{y}\_1|^2}, \ \tilde{y}\_1 = y\_{11} + i y\_{12}, \ i = \sqrt{(-1)}, \\ &= \frac{b^{\gamma+1}}{\pi \tilde{\Gamma}(\gamma+1)} [y\_{11}^2 + y\_{12}^2]^{\gamma} \mathbf{e}^{-b(y\_{11}^2 + y\_{12}^2)} \end{split} \tag{7.2a.18}$$

for *b >* 0, ℜ(*γ* + 1*) >* 0, −∞ *< y*<sub>1*j*</sub> *<* ∞, *j* = 1, 2, and *f̃*<sub>8</sub> = 0 elsewhere. We may take *γ* = 1 as the Maxwell-Boltzmann case and *γ* = 1/2 as the Rayleigh case. Then, for *γ* = 1, we have

$$\tilde{f}\_{9}(\tilde{y}\_{1}) = \frac{b^2}{\pi} |\tilde{y}\_{1}|^2 \mathbf{e}^{-b|\tilde{y}\_{1}|^2}, \ \tilde{y}\_{1} = y\_{11} + i y\_{12}$$

for *b >* 0, −∞ *< y*<sub>1*j*</sub> *<* ∞, *j* = 1, 2, and *f̃*<sub>9</sub> = 0 elsewhere. Note that in the real case *y*<sub>12</sub> = 0, so that the functional part of *f̃*<sub>9</sub> becomes *y*<sub>11</sub><sup>2</sup> e<sup>−by<sub>11</sub><sup>2</sup></sup>, −∞ *< y*<sub>11</sub> *<* ∞. However, the normalizing constants in the real and complex cases are evaluated in different domains. Observe that, corresponding to (7.2a.18), the normalizing constant in the real case is *b*<sup>γ+1/2</sup>/[*π*<sup>1/2</sup>*Γ (γ* + 1/2*)*]. Thus, the normalizing constant has to be evaluated separately. Consider the integral

$$\int\_{-\infty}^{\infty} \mathbf{y}\_{11}^2 \mathbf{e}^{-by\_{11}^2} \mathbf{d} \mathbf{y}\_{11} = 2 \int\_0^{\infty} \mathbf{y}\_{11}^2 \mathbf{e}^{-by\_{11}^2} \mathbf{d} \mathbf{y}\_{11} = \int\_0^{\infty} u^{\frac{3}{2} - 1} \mathbf{e}^{-bu} \mathbf{d}u = \frac{\sqrt{\pi}}{2b^{\frac{3}{2}}}.$$

Hence,

$$f\_{10}(\mathbf{y}\_{11}) = \frac{2b^{\frac{3}{2}}}{\sqrt{\pi}} \mathbf{y}\_{11}^{2} \mathbf{e}^{-b\mathbf{y}\_{11}^{2}}, \ -\infty < \mathbf{y}\_{11} < \infty, \ b > 0,$$

$$= \frac{4b^{\frac{3}{2}}}{\sqrt{\pi}} \mathbf{y}\_{11}^{2} \mathbf{e}^{-b\mathbf{y}\_{11}^{2}}, \ 0 \le \mathbf{y}\_{11} < \infty, \ b > 0,\tag{7.2a.19}$$

and *f*<sub>10</sub> = 0 elsewhere. This is the real Maxwell-Boltzmann case. For the Rayleigh case, letting *γ* = 1/2 in (7.2a.18) yields

$$\begin{split} \tilde{f}\_{11}(\tilde{\mathbf{y}}\_{1}) &= \frac{b^{\frac{3}{2}}}{\pi \Gamma(\frac{3}{2})} [|\tilde{\mathbf{y}}\_{1}|^{2}]^{\frac{1}{2}} \mathbf{e}^{-b(|\tilde{\mathbf{y}}\_{1}|^{2})}, \ b > 0, \\ &= \frac{2b^{\frac{3}{2}}}{\pi^{\frac{3}{2}}} [\mathbf{y}\_{11}^{2} + \mathbf{y}\_{12}^{2}]^{\frac{1}{2}} \mathbf{e}^{-b(\mathbf{y}\_{11}^{2} + \mathbf{y}\_{12}^{2})}, \ -\infty < \mathbf{y}\_{1j} < \infty, \ j = 1, 2, \ b > 0, \end{split} \tag{7.2a.20}$$

and *f̃*<sub>11</sub> = 0 elsewhere. Then, for *y*<sub>12</sub> = 0, the functional part of *f̃*<sub>11</sub> is |*y*<sub>11</sub>| e<sup>−by<sub>11</sub><sup>2</sup></sup> with −∞ *< y*<sub>11</sub> *<* ∞. The integral over *y*<sub>11</sub> gives

$$\int\_{-\infty}^{\infty} |\mathbf{y}\_{11}| \, \mathbf{e}^{-by\_{11}^2} \mathbf{d} \mathbf{y}\_{11} = 2 \int\_{0}^{\infty} \mathbf{y}\_{11} \, \mathbf{e}^{-by\_{11}^2} \mathbf{d} \mathbf{y}\_{11} = b^{-1}.$$

Thus, in the Rayleigh case,

$$f\_{12}(\mathbf{y}\_{11}) = b \left| \mathbf{y}\_{11} \right| \mathbf{e}^{-b\mathbf{y}\_{11}^2}, \ -\infty < \mathbf{y}\_{11} < \infty, \ b > 0,$$

$$= 2 \, b \, \mathbf{y}\_{11} \, \mathbf{e}^{-b\mathbf{y}\_{11}^2}, \ 0 \le \mathbf{y}\_{11} < \infty, \ b > 0,\tag{7.2a.21}$$

and *f*<sub>12</sub> = 0 elsewhere. The normalizing constant in (7.2a.18) can be verified by making use of the polar coordinate transformation *y*<sub>11</sub> = *r* cos *θ*, *y*<sub>12</sub> = *r* sin *θ*, so that d*y*<sub>11</sub> ∧ d*y*<sub>12</sub> = *r* d*r* ∧ d*θ*, 0 *< θ* ≤ 2*π*, 0 ≤ *r <* ∞. Then,

$$\int\_{-\infty}^{\infty}\int\_{-\infty}^{\infty} [y\_{11}^2 + y\_{12}^2]^{\gamma}\, \mathbf{e}^{-b(y\_{11}^2 + y\_{12}^2)}\, \text{d}y\_{11} \wedge \text{d}y\_{12} = 2\pi \int\_0^{\infty} (r^2)^{\gamma}\, \mathbf{e}^{-br^2}\, r\, \text{d}r = \pi\, b^{-(\gamma+1)}\, \Gamma(\gamma+1)$$

for *b >* 0, ℜ(*γ* + 1*) >* 0.
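The polar-coordinate evaluation above is easy to confirm numerically. A minimal sketch, with illustrative values of *b* and *γ*, integrating over a square large enough that the Gaussian tails are negligible:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import gamma as Gamma

b, g = 1.5, 1.0   # illustrative values; g plays the role of gamma

# Left-hand side: direct double integral over (y11, y12)
f = lambda y2, y1: (y1**2 + y2**2)**g * np.exp(-b * (y1**2 + y2**2))
val, _ = dblquad(f, -8, 8, -8, 8)

# Right-hand side: pi * b^{-(g+1)} * Gamma(g+1)
exact = np.pi * b**(-(g + 1)) * Gamma(g + 1)
print(abs(val - exact) < 1e-6)
```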

#### **Exercises 7.2**

**7.2.1.** Supply a proof of (7.2.9) by using Theorem 4.2.3.

**7.2.2.** Derive the exact density of the determinant in (7.2.15) for *p* = 2.


**7.2.5.** Derive *c*˜<sup>3</sup> in (7.2*a*.14) by integrating out over *X*˜ .

**7.2.6.** Approximate *c*˜<sup>1</sup> and *c*˜<sup>2</sup> of Exercise 7.2.4 by making use of Stirling's approximation, and then show that the result agrees with that in Exercise 7.2.5.

**7.2.7.** Derive (state and prove) for the complex case the lemmas corresponding to Lemmas 7.2.1 and 7.2.2.

#### **7.3. Real Rectangular Matrix-Variate Type-1 and Type-2 Beta Densities**

Let us begin with the real case. Let *A > O* be *p* × *p* and *B > O* be *q* × *q*, where *A* and *B* are real constant matrices. Let *X* = *(x<sub>ij</sub>)* be a *p* × *q, q* ≥ *p,* matrix of distinct real scalar variables *x<sub>ij</sub>* as its elements, *X* being of full rank *p*. Then, *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup> *> O* is real positive definite, where *A*<sup>1/2</sup> is the positive definite square root of the positive definite matrix *A*. Let |*(*·*)*| represent the determinant of *(*·*)* when *(*·*)* is real or complex, and |det*(*·*)*| be the absolute value of the determinant of *(*·*)* when *(*·*)* is in the complex domain. Consider the following density:

$$g\_1(X) = C\_1 |A^{\frac{1}{2}} X B X' A^{\frac{1}{2}}|^\gamma |I - A^{\frac{1}{2}} X B X' A^{\frac{1}{2}}|^{\beta - \frac{p+1}{2}}\tag{7.3.1}$$

for *A > O, B > O, I* − *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup> *> O*, ℜ(*β) >* (*p* − 1)/2, ℜ(*γ* + *q*/2*) >* (*p* − 1)/2, and *g*<sub>1</sub> = 0 elsewhere, where *C*<sub>1</sub> is the normalizing constant. Accordingly, *U* = *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup> *> O* and *I* − *U > O*, that is, *U* and *I* − *U* are both positive definite. We now make the transformations *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup> and *S* = *Y Y*′. Then, proceeding as in the case of the rectangular matrix-variate gamma density discussed in Sect. 7.2, and evaluating the final part involving *S* with the help of a real positive definite matrix-variate type-1 beta integral, we obtain the following normalizing constant:

$$C\_1 = |A|^{\frac{q}{2}} |B|^{\frac{p}{2}} \frac{\Gamma\_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}} \frac{\Gamma\_p(\gamma + \frac{q}{2} + \beta)}{\Gamma\_p(\beta)\Gamma\_p(\gamma + \frac{q}{2})} \tag{7.3.2}$$

for *A > O, B > O*, ℜ(*β) >* (*p* − 1)/2, ℜ(*γ* + *q*/2*) >* (*p* − 1)/2. Usually, the parameters associated with a statistical density are real, which is the case for *γ* and *β*. Nonetheless, the conditions will be stated for general complex parameters. When the density of *X* is as given in *g*<sub>1</sub>, the density of *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup> is given by

$$g\_2(Y) = \frac{\Gamma\_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}} \frac{\Gamma\_p(\gamma + \frac{q}{2} + \beta)}{\Gamma\_p(\beta)\Gamma\_p(\gamma + \frac{q}{2})} |YY'|^{\gamma} |I - YY'|^{\beta - \frac{p+1}{2}} \tag{7.3.3}$$

for ℜ(*γ* + *q*/2*) >* (*p* − 1)/2, *Y Y*′ *> O*, *I* − *Y Y*′ *> O*, and *g*<sub>2</sub> = 0 elsewhere. When *X* has the density specified in (7.3.1), the density of *S* = *Y Y*′ is given by

$$g\_3(S) = \frac{\Gamma\_p(\gamma + \frac{q}{2} + \beta)}{\Gamma\_p(\beta)\Gamma\_p(\gamma + \frac{q}{2})} |S|^{\gamma + \frac{q}{2} - \frac{p+1}{2}} |I - S|^{\beta - \frac{p+1}{2}} \tag{7.3.4}$$

for ℜ(*β) >* (*p* − 1)/2, ℜ(*γ* + *q*/2*) >* (*p* − 1)/2, *S > O*, *I* − *S > O*, and *g*<sub>3</sub> = 0 elsewhere, which is the usual real matrix-variate type-1 beta density. Observe that the density *g*<sub>1</sub>*(X)* is also available from the pathway form of the real matrix-variate gamma case introduced in Sect. 7.2.

**Example 7.3.1.** Let *U*<sub>1</sub> = *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>, *U*<sub>2</sub> = *XBX*′, *U*<sub>3</sub> = *B*<sup>1/2</sup>*X*′*AXB*<sup>1/2</sup> and *U*<sub>4</sub> = *X*′*AX*. If *X* has the rectangular matrix-variate type-1 beta density given in (7.3.1), evaluate the densities of *U*<sub>1</sub>*, U*<sub>2</sub>*, U*<sub>3</sub> and *U*<sub>4</sub> whenever possible.

**Solution 7.3.1.** The matrix *U*<sub>1</sub> is already present in the density of *X*, namely (7.3.1). Now, we have to convert the density of *X* into the density of *U*<sub>1</sub>. Consider the transformations *Y* = *A*<sup>1/2</sup>*XB*<sup>1/2</sup>, *S* = *Y Y*′ = *U*<sub>1</sub>; the density of *S* is given in (7.3.4). Thus, *U*<sub>1</sub> has a real matrix-variate type-1 beta density with the parameters *γ* + *q*/2 and *β*. Now, on applying the same transformations as above with *A* = *I*, the density appearing in (7.3.4), which is the density of *U*<sub>2</sub>, becomes

$$g\_3(U\_2)\mathrm{d}U\_2 = \frac{\Gamma\_p(\gamma + \frac{q}{2} + \beta)}{\Gamma\_p(\gamma + \frac{q}{2})\Gamma\_p(\beta)}|A|^{\gamma + \frac{q}{2}}|U\_2|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}|I - AU\_2|^{\beta - \frac{p+1}{2}}\mathrm{d}U\_2 \tag{i}$$

for ℜ(*β) >* (*p* − 1)/2, ℜ(*γ* + *q*/2*) >* (*p* − 1)/2, *A > O*, *U*<sub>2</sub> *> O*, *I* − *A*<sup>1/2</sup>*U*<sub>2</sub>*A*<sup>1/2</sup> *> O*, and zero elsewhere, so that *U*<sub>2</sub> has a scaled real matrix-variate type-1 beta distribution with parameters *(γ* + *q*/2*, β)* and scaling matrix *A > O*. For *q > p*, both *X*′*AX* and *B*<sup>1/2</sup>*X*′*AXB*<sup>1/2</sup> are positive semi-definite matrices whose determinants are thus equal to zero. Accordingly, the densities do not exist whenever *q > p*. When *q* = *p*, *U*<sub>3</sub> has a *q* × *q* real matrix-variate type-1 beta distribution with parameters *(γ* + *p*/2*, β)*, and *U*<sub>4</sub> is a scaled version of a type-1 beta matrix variable whose density is of the form given in *(i)* wherein *B* is the scaling matrix and *p* and *q* are interchanged. This completes the solution.

#### **7.3.1. Arbitrary moments**

The *h*-th moment of the determinant of *U* = *A*<sup>1/2</sup>*XBX*′*A*<sup>1/2</sup>, with *h* being arbitrary, is available from the normalizing constant given in (7.3.2) on observing that when the *h*-th moment is taken, the only change is that *γ* turns into *γ* + *h*. Thus,

$$\begin{split} E[|U|^{h}] &= E[|YY'|^{h}] = E[|S|^{h}] \\ &= \frac{\Gamma\_p(\gamma + \frac{q}{2} + h)}{\Gamma\_p(\gamma + \frac{q}{2})} \frac{\Gamma\_p(\gamma + \frac{q}{2} + \beta)}{\Gamma\_p(\gamma + \frac{q}{2} + \beta + h)} \end{split} \tag{7.3.5}$$

$$= \prod\_{j=1}^{p} \frac{\Gamma(\gamma + \frac{q}{2} - \frac{j-1}{2} + h)}{\Gamma(\gamma + \frac{q}{2} - \frac{j-1}{2})} \frac{\Gamma(\gamma + \frac{q}{2} + \beta - \frac{j-1}{2})}{\Gamma(\gamma + \frac{q}{2} + \beta - \frac{j-1}{2} + h)} \tag{7.3.6}$$

$$= E[u\_1^h]E[u\_2^h]\cdots E[u\_p^h] \tag{7.3.7}$$

where *u*<sub>1</sub>*,...,u<sub>p</sub>* are mutually independently distributed real scalar type-1 beta random variables with the parameters *(γ* + *q*/2 − *(j* − 1*)*/2*, β)*, *j* = 1*,...,p*, provided ℜ(*β) >* (*p* − 1)/2 and ℜ(*γ* + *q*/2*) >* (*p* − 1)/2.
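For *p* = 1, formulas (7.3.5)-(7.3.6) reduce to the *h*-th moment of a single type-1 beta variable, which can be cross-checked against the beta distribution directly. A minimal sketch, with illustrative values of *γ*, *q*, *β* and *h* chosen here:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import beta as beta_dist

g, q, bet, h = 1.0, 4, 2.5, 2.0   # illustrative values, p = 1
a = g + q / 2                     # first type-1 beta parameter in (7.3.7)

# h-th moment from (7.3.5)-(7.3.6) with p = 1
exact = np.exp(gammaln(a + h) - gammaln(a)
               + gammaln(a + bet) - gammaln(a + bet + h))

# Same moment computed from the type-1 beta distribution itself
direct = beta_dist(a, bet).expect(lambda x: x**h)
print(abs(exact - direct) < 1e-6)
```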

#### **7.3.2. Special case:** *p* = 1

For the case *p* = 1, let the positive definite 1 × 1 matrix *A* be the scalar *b >* 0 and let *X*, which is 1 × *q*, be equal to *(x*<sub>1</sub>*,...,x<sub>q</sub>)*. Then,

$$AXBX' = b(\mathbf{x}\_1, \dots, \mathbf{x}\_q)B\begin{pmatrix} \mathbf{x}\_1\\ \vdots\\ \mathbf{x}\_q \end{pmatrix}$$

is a real quadratic form, the matrix of the quadratic form being *B > O*. Letting *Y* = *XB*<sup>1/2</sup>, d*Y* = |*B*|<sup>1/2</sup> d*X*, and the density of *Y*, denoted by *g*<sub>4</sub>*(Y)*, is then given by

$$\begin{split} g\_4(Y) &= b^{\gamma + \frac{q}{2}} \frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}} \frac{\Gamma(\gamma + \frac{q}{2} + \beta)}{\Gamma(\gamma + \frac{q}{2})\Gamma(\beta)} |YY'|^{\gamma} \\ &\times |I - bYY'|^{\beta - 1}, \ YY' > O, \ I - bYY' > O, \ \Re(\gamma + \frac{q}{2}) > 0, \ \Re(\beta) > 0 \\ &= b^{\gamma + \frac{q}{2}} \frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}} \frac{\Gamma(\gamma + \frac{q}{2} + \beta)}{\Gamma(\gamma + \frac{q}{2})\Gamma(\beta)} [\mathbf{y}\_1^2 + \dots + \mathbf{y}\_q^2]^{\gamma} \\ &\times [1 - b(\mathbf{y}\_1^2 + \dots + \mathbf{y}\_q^2)]^{\beta - 1}, \ Y = (\mathbf{y}\_1, \dots, \mathbf{y}\_q), \end{split} \tag{7.3.8}$$

for *b >* 0, ℜ(*γ* + *q*/2*) >* 0, ℜ(*β) >* 0, 1 − *b(y*<sub>1</sub><sup>2</sup> + *...* + *y<sub>q</sub>*<sup>2</sup>*) >* 0, and *g*<sub>4</sub> = 0 elsewhere. The form of the density in (7.3.8) presents some interest as it appears in various areas of research. In reliability studies, a popular model for the lifetime of components corresponds to (7.3.8) wherein *γ* = 0, in both the scalar and multivariate cases. When independently distributed isotropic random points are considered in connection with certain geometrical probability problems, a popular model for the distribution of the random points is the type-1 beta form, that is, (7.3.8) with *γ* = 0. Earlier results obtained assuming that *γ* = 0 and the new case where *γ* ≠ 0 in geometrical probability problems are discussed in Chapter 4 of Mathai (1999). We will take (7.3.8) as the standard form of the real rectangular matrix-variate type-1 beta density for the case *p* = 1 in a *p* × *q, q* ≥ *p,* real matrix *X* of rank *p*. For verifying the normalizing constant in (7.3.8), one can apply Theorem 4.2.3. Letting *S* = *Y Y*′, d*Y* = [*π*<sup>q/2</sup>/*Γ (q*/2*)*] |*S*|<sup>q/2−1</sup>d*S*, which once substituted for d*Y* in (7.3.8) yields a total integral equal to one upon integrating out *S* with the help of a real matrix-variate type-1 beta integral (in this case, a real scalar type-1 beta integral); accordingly, the constant part in (7.3.8) is indeed the normalizing constant. In this case, the density of *S* = *Y Y*′ is given by

$$g\_5(S) = b^{\gamma + \frac{q}{2}} \frac{\Gamma(\gamma + \frac{q}{2} + \beta)}{\Gamma(\gamma + \frac{q}{2})\Gamma(\beta)} |S|^{\gamma + \frac{q}{2} - 1} |I - bS|^{\beta - 1} \tag{7.3.9}$$

for *S > O*, *b >* 0, ℜ(*γ* + *q*/2*) >* 0, ℜ(*β) >* 0, and *g*<sub>5</sub> = 0 elsewhere. Observe that this *S* is actually a real scalar variable.

As obtained from (7.3.8), the type-1 beta form of the density in the real scalar case, that is, for *p* = 1 and *q* = 1*,* is

$$g\_{6}(y\_{1}) = b^{\gamma + \frac{1}{2}} \frac{\Gamma(\gamma + \frac{1}{2} + \beta)}{\Gamma(\gamma + \frac{1}{2})\Gamma(\beta)} [y\_{1}^{2}]^{\gamma} [1 - by\_{1}^{2}]^{\beta - 1},\tag{7.3.10}$$

for *b >* 0, *β >* 0, *γ* + 1/2 *>* 0, −1/√*b* *< y*<sub>1</sub> *<* 1/√*b*, and *g*<sub>6</sub> = 0 elsewhere. When the support is 0 *< y*<sub>1</sub> *<* 1/√*b*, the above density, which is symmetric, is multiplied by two.
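The normalizing constant in (7.3.10) can be confirmed by direct numerical integration over the support. A minimal sketch, with illustrative values of *b*, *γ* and *β* chosen here:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

b, g, bet = 4.0, 1.0, 2.0   # illustrative values; g plays the role of gamma

# Constant from (7.3.10): b^{g+1/2} Gamma(g+1/2+bet)/(Gamma(g+1/2)Gamma(bet))
const = np.exp((g + 0.5) * np.log(b) + gammaln(g + 0.5 + bet)
               - gammaln(g + 0.5) - gammaln(bet))
dens = lambda y: const * (y**2)**g * (1 - b * y**2)**(bet - 1)

total, _ = quad(dens, -1 / np.sqrt(b), 1 / np.sqrt(b))
print(abs(total - 1) < 1e-8)
```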

#### **7.3a. Rectangular Matrix-Variate Type-1 Beta Density, Complex Case**

Consider the following function:

$$\tilde{g}\_1(\tilde{X}) = \tilde{C}\_1 |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\gamma} |\det(I - A^{\frac{1}{2}} \tilde{X} B \tilde{X}^\* A^{\frac{1}{2}})|^{\beta - p} \tag{7.3a.1}$$

for *A > O, B > O*, ℜ(*β) > p* − 1, ℜ(*γ* + *q) > p* − 1, *I* − *A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup> *> O*, and *g̃*<sub>1</sub> = 0 elsewhere. The normalizing constant can be evaluated by proceeding as in the real rectangular matrix-variate case. Let *Ỹ* = *A*<sup>1/2</sup>*X̃B*<sup>1/2</sup> so that *S̃* = *Ỹ Ỹ*<sup>∗</sup>, and then integrate out *S̃* by using a complex matrix-variate type-1 beta integral, which yields the following normalizing constant:

$$\tilde{C}\_{1} = |\det(A)|^{q} |\det(B)|^{p} \frac{\tilde{\Gamma}\_{p}(q)}{\pi^{qp}} \frac{\tilde{\Gamma}\_{p}(\gamma+q+\beta)}{\tilde{\Gamma}\_{p}(\gamma+q)\tilde{\Gamma}\_{p}(\beta)} \tag{7.3a.2}$$

for ℜ(*γ* + *q) > p* − 1, ℜ(*β) > p* − 1, *A > O, B > O*.

#### **7.3a.1. Different versions of the type-1 beta density, the complex case**

The densities that follow can be obtained from that specified in (7.3*a*.1) and certain related transformations. The density of *Ỹ* = *A*<sup>1/2</sup>*X̃B*<sup>1/2</sup> is given by

$$\tilde{g}\_2(\tilde{Y}) = \frac{\tilde{\Gamma}\_p(q)}{\pi^{qp}} \frac{\tilde{\Gamma}\_p(\gamma + q + \beta)}{\tilde{\Gamma}\_p(\gamma + q)\tilde{\Gamma}\_p(\beta)} |\det(\tilde{Y}\tilde{Y}^\*)|^{\gamma} |\det(I - \tilde{Y}\tilde{Y}^\*)|^{\beta - p} \tag{7.3a.3}$$

for ℜ(*β) > p* − 1, ℜ(*γ* + *q) > p* − 1, *I* − *Ỹ Ỹ*<sup>∗</sup> *> O*, and *g̃*<sub>2</sub> = 0 elsewhere. The density of *S̃* = *Ỹ Ỹ*<sup>∗</sup> is the following:

$$\tilde{g}\_{3}(\tilde{S}) = \frac{\tilde{\Gamma}\_{p}(\gamma + q + \beta)}{\tilde{\Gamma}\_{p}(\gamma + q)\tilde{\Gamma}\_{p}(\beta)} |\det(\tilde{S})|^{\gamma + q - p} |\det(I - \tilde{S})|^{\beta - p} \tag{7.3a.4}$$

for ℜ(*β) > p* − 1, ℜ(*γ* + *q) > p* − 1, *S̃ > O*, *I* − *S̃ > O*, and *g̃*<sub>3</sub> = 0 elsewhere.

#### **7.3a.2. Multivariate type-1 beta density, the complex case**

When *p* = 1, *X*˜ is the 1×*q* vector *(x*˜1*,..., x*˜*<sup>q</sup> )* and the 1×1 matrix *A* will be denoted by *b >* 0. The resulting density will then have the same structure as that given in (7.3*a*.1) with *p* replaced by 1 and *A* replaced by *b >* 0:

$$\tilde{g}\_4(\tilde{X}) = \tilde{C}\_2 |\det(\tilde{X}B\tilde{X}^\*)|^{\gamma} |\det(I - b\tilde{X}B\tilde{X}^\*)|^{\beta - 1} \tag{7.3a.5}$$

for *B > O*, *b >* 0, ℜ(*β) >* 0, ℜ(*γ* + *q) >* 0, *I* − *bX̃BX̃*<sup>∗</sup> *> O*, and *g̃*<sub>4</sub> = 0 elsewhere, where the normalizing constant *C̃*<sub>2</sub> is

$$\tilde{C}\_2 = b^{\gamma+q} |\det(B)| \frac{\tilde{\Gamma}(q)}{\pi^q} \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)} \tag{7.3a.6}$$

for *b >* 0, *B > O*, ℜ(*β) >* 0, ℜ(*γ* + *q) >* 0. Letting *Ỹ* = *X̃B*<sup>1/2</sup> so that d*Ỹ* = |det*(B)*|d*X̃*, the density of *Ỹ* reduces to

$$\begin{split} \tilde{g}\_{5}(\tilde{Y}) &= b^{\gamma+q} \frac{\tilde{\Gamma}(q)}{\pi^{q}} \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)} |\det(\tilde{Y}\tilde{Y}^{\*})|^{\gamma} |\det(I-b\tilde{Y}\tilde{Y}^{\*})|^{\beta-1} \\ &= b^{\gamma+q} \frac{\tilde{\Gamma}(q)}{\pi^{q}} \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)} [|\tilde{y}\_{1}|^{2} + \dots + |\tilde{y}\_{q}|^{2}]^{\gamma} [1 - b(|\tilde{y}\_{1}|^{2} + \dots + |\tilde{y}\_{q}|^{2})]^{\beta-1} \end{split} \tag{7.3a.7}$$

for ℜ(*β) >* 0, ℜ(*γ* + *q) >* 0, 1 − *b(*|*ỹ*<sub>1</sub>|<sup>2</sup> + ··· + |*ỹ<sub>q</sub>*|<sup>2</sup>*) >* 0, and *g̃*<sub>5</sub> = 0 elsewhere. The form appearing in (7.3*a*.8) is applicable to several problems occurring in various areas, as was the case for (7.3.8) in the real domain. However, geometrical probability problems do not appear to have yet been formulated in the complex domain. Let *S̃* = *Ỹ Ỹ*<sup>∗</sup>, *S̃* being in this case a real scalar denoted by *s* whose density is

$$\tilde{g}\_{6}(s) = b^{\gamma + q} \frac{\tilde{\Gamma}(\gamma + q + \beta)}{\tilde{\Gamma}(\gamma + q)\tilde{\Gamma}(\beta)} s^{\gamma + q - 1} (1 - bs)^{\beta - 1}, \ b > 0 \tag{7.3a.9}$$

for ℜ(*β) >* 0, ℜ(*γ* + *q) >* 0, *s >* 0, 1 − *bs >* 0, and *g̃*<sub>6</sub> = 0 elsewhere. Thus, *s* is a real scalar type-1 beta random variable with parameters *(γ* + *q, β)* and scaling factor *b >* 0. Note that in the real case, the distribution was also a scalar type-1 beta random variable, but having a different first parameter, namely *γ* + *q*/2, its second parameter and scaling factor remaining *β* and *b*.
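The total mass and mean of the scaled type-1 beta density (7.3*a*.9) can be checked numerically. A minimal sketch, with illustrative values of *b*, *γ*, *q* and *β* chosen here:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

b, g, q, bet = 3.0, 1.5, 2, 2.5   # illustrative values; g plays the role of gamma
a = g + q                          # first parameter of the type-1 beta

const = np.exp(a * np.log(b) + gammaln(a + bet) - gammaln(a) - gammaln(bet))
dens = lambda s: const * s**(a - 1) * (1 - b * s)**(bet - 1)

total, _ = quad(dens, 0, 1 / b)                   # should equal one
mean, _ = quad(lambda s: s * dens(s), 0, 1 / b)   # should equal a/(b(a + bet))
print(abs(total - 1) < 1e-8 and abs(mean - a / (b * (a + bet))) < 1e-8)
```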

**Example 7.3a.1.** Express the density (7.3*a*.5) explicitly for *b* = 5, *p* = 1, *q* = 2, *β* = 3, *γ* = 4, *X̃* = [*x̃*<sub>1</sub>*, x̃*<sub>2</sub>] = [*x*<sub>1</sub> + *iy*<sub>1</sub>*, x*<sub>2</sub> + *iy*<sub>2</sub>], where *x<sub>j</sub>, y<sub>j</sub>, j* = 1, 2, are real variables, *i* = √(−1), and

$$B = \begin{bmatrix} 3 & 1+i \\ 1-i & 1 \end{bmatrix}.$$

**Solution 7.3a.1.** Observe that since *B* = *B*<sup>∗</sup>, *B* is Hermitian. Its leading minors being

$$|(3)| = 3 > 0 \ \ \text{and} \ \ \begin{vmatrix} 3 & 1+i \\ 1-i & 1 \end{vmatrix} = (3)(1) - (1+i)(1-i) = 3 - 2 = 1 > 0,$$

*B* is also positive definite. Letting *Q* = *X̃BX̃*<sup>∗</sup>,

$$\begin{aligned} \mathcal{Q} &= 3(\mathbf{x}\_1 + i\mathbf{y}\_1)(\mathbf{x}\_1 - i\mathbf{y}\_1) + (1)(\mathbf{x}\_2 + i\mathbf{y}\_2)(\mathbf{x}\_2 - i\mathbf{y}\_2) + (1+i)(\mathbf{x}\_1 + i\mathbf{y}\_1)(\mathbf{x}\_2 - i\mathbf{y}\_2) \\ &+ (1-i)(\mathbf{x}\_2 + i\mathbf{y}\_2)(\mathbf{x}\_1 - i\mathbf{y}\_1) \\ &= 3(\mathbf{x}\_1^2 + \mathbf{y}\_1^2) + (\mathbf{x}\_2^2 + \mathbf{y}\_2^2) + (1+i)[\mathbf{x}\_1\mathbf{x}\_2 + \mathbf{y}\_1\mathbf{y}\_2 - i(\mathbf{x}\_1\mathbf{y}\_2 - \mathbf{x}\_2\mathbf{y}\_1)] \\ &+ (1-i)[\mathbf{x}\_1\mathbf{x}\_2 + \mathbf{y}\_1\mathbf{y}\_2 - i(\mathbf{x}\_2\mathbf{y}\_1 - \mathbf{x}\_1\mathbf{y}\_2)] \\ &= 3(\mathbf{x}\_1^2 + \mathbf{y}\_1^2) + (\mathbf{x}\_2^2 + \mathbf{y}\_2^2) + 2(\mathbf{x}\_1\mathbf{x}\_2 + \mathbf{y}\_1\mathbf{y}\_2) + 2(\mathbf{x}\_1\mathbf{y}\_2 - \mathbf{x}\_2\mathbf{y}\_1). \end{aligned}$$

The normalizing constant being

$$\begin{split} b^{\gamma+q} |\det(B)| \frac{\tilde{\Gamma}(q)}{\pi^{q}} \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)} &= 5^{6}(1) \frac{\Gamma(2)}{\pi^{2}} \frac{\Gamma(9)}{\Gamma(6)\Gamma(3)} \\ &= \frac{5^{6}}{\pi^{2}} \frac{(1!)(8!)}{(5!)(2!)} = \frac{5^{6}(168)}{\pi^{2}}. \end{split} \tag{ii}$$

The explicit form of the density (7.3*a*.5) is thus the following:

$$
\tilde{g}\_4(\tilde{X}) = \frac{5^6(168)}{\pi^2} Q^4[1 - 5Q]^2, \ 1 - 5Q > 0, \ Q > 0,
$$

and zero elsewhere, where *Q* is given in *(i)*. It is a multivariate generalized type-1 complex-variate beta density whose scaling factor is 5. Observe that even though *X̃* is complex, *g̃*<sub>4</sub>*(X̃)* is real-valued.

#### **7.3a.3. Arbitrary moments in the complex case**

Consider again the density of the complex *p* × *q, q* ≥ *p,* matrix-variate random variable *X̃* of full rank *p* having the density specified in (7.3*a*.1). Let *Ũ* = *A*<sup>1/2</sup>*X̃BX̃*<sup>∗</sup>*A*<sup>1/2</sup>. The *h*-th moment of the absolute value of the determinant of *Ũ*, that is, *E*[|det*(Ũ)*|<sup>h</sup>], will now be determined for arbitrary *h*. As before, note that when the expected value is taken, the only change is that the parameter *γ* is replaced by *γ* + *h*, so that the moment is available from the normalizing constant present in (7.3*a*.2). Thus,

$$E[|\det(\tilde{U})|^{h}] = \frac{\tilde{\Gamma}_p(\gamma+q+h)}{\tilde{\Gamma}_p(\gamma+q)}\,\frac{\tilde{\Gamma}_p(\gamma+q+\beta)}{\tilde{\Gamma}_p(\gamma+q+\beta+h)}\tag{7.3a.10}$$

$$= \prod_{j=1}^{p} \frac{\Gamma(\gamma+q+h-(j-1))}{\Gamma(\gamma+q-(j-1))}\,\frac{\Gamma(\gamma+q+\beta-(j-1))}{\Gamma(\gamma+q+\beta-(j-1)+h)} \tag{7.3a.11}$$

$$= E[u_1^{h}]\,E[u_2^{h}]\cdots E[u_p^{h}] \tag{7.3a.12}$$

where $u_1,\ldots,u_p$ are independently distributed real scalar type-1 beta random variables with the parameters $(\gamma+q-(j-1),\,\beta)$ for $j=1,\ldots,p$. The results are the same as those obtained in the real case except that the parameters are slightly different, being $(\gamma+\frac{q}{2}-\frac{j-1}{2},\,\beta)$, $j=1,\ldots,p$, in the real domain. Accordingly, the absolute value of the determinant of $\tilde{U}$ in the complex case has the following structural representation:

$$|\det(\tilde{U})| = |\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^\*A^{\frac{1}{2}})| = |\det(\tilde{Y}\tilde{Y}^\*)| = |\det(\tilde{S})| = u\_1 \cdots u\_p \tag{7.3a.13}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-1 beta random variables with the parameters $(\gamma+q-(j-1),\,\beta)$, $j=1,\ldots,p$.
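The structural representation (7.3*a*.13) lends itself to a quick numerical sanity check: the $h$-th moment of the product $u_1\cdots u_p$ of independent type-1 beta variables should match the gamma-ratio product in (7.3*a*.11). The following sketch does this by Monte Carlo; the parameter values $p=2$, $q=3$, $\gamma=1.5$, $\beta=2$, $h=2$ are illustrative choices of ours, not values taken from the text.

```python
# Monte Carlo check of the structural representation (7.3a.13):
# E[(u_1 ... u_p)^h] should equal the gamma-ratio product (7.3a.11),
# where u_j ~ type-1 beta(gamma + q - (j-1), beta).
# Illustrative (non-textbook) values: p = 2, q = 3, gamma = 1.5, beta = 2, h = 2.
import numpy as np
from math import gamma as G

p, q, gam, beta, h = 2, 3, 1.5, 2.0, 2

# exact h-th moment from (7.3a.11)
exact = 1.0
for j in range(1, p + 1):
    a = gam + q - (j - 1)
    exact *= (G(a + h) / G(a)) * (G(a + beta) / G(a + beta + h))

# Monte Carlo estimate of E[(u_1 ... u_p)^h]
rng = np.random.default_rng(0)
prod = np.ones(1_000_000)
for j in range(1, p + 1):
    prod *= rng.beta(gam + q - (j - 1), beta, size=prod.size)
mc = np.mean(prod ** h)
print(exact, mc)
```

With the seed fixed, the Monte Carlo estimate agrees closely with the exact gamma-ratio value.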

We now consider the scalar type-1 beta density in the complex case. Thus, letting *p* = 1 and *q* = 1 in (7.3*a*.8), we have

$$\begin{split} \tilde{g}_7(\tilde{y}_{1}) &= b^{\gamma+1}\,\frac{1}{\pi}\,\frac{\tilde{\Gamma}(\gamma+1+\beta)}{\tilde{\Gamma}(\gamma+1)\tilde{\Gamma}(\beta)}\,[|\tilde{y}_{1}|^{2}]^{\gamma}\,[1-b|\tilde{y}_{1}|^{2}]^{\beta-1}, \ \ \tilde{y}_{1} = y_{11} + iy_{12}, \\ &= b^{\gamma+1}\,\frac{1}{\pi}\,\frac{\Gamma(\gamma+1+\beta)}{\Gamma(\gamma+1)\Gamma(\beta)}\,[y_{11}^{2} + y_{12}^{2}]^{\gamma}\,[1-b(y_{11}^{2} + y_{12}^{2})]^{\beta-1} \end{split} \tag{7.3a.14}$$

for $b>0$, $\Re(\beta)>0$, $\Re(\gamma)>-1$, $-\infty<y_{1j}<\infty$, $j=1,2$, $1-b(y_{11}^2+y_{12}^2)>0$, and $\tilde{g}_7=0$ elsewhere. The normalizing constant in (7.3*a*.14) can be verified by making the polar coordinate transformation $y_{11}=r\cos\theta,\ y_{12}=r\sin\theta$, as was done earlier.
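As a sketch of the verification suggested above, the normalizing constant of (7.3*a*.14) can be checked numerically: the polar coordinate reduction turns the integral over the complex plane into a one-dimensional radial integral. The values $b=2$, $\gamma=1.2$, $\beta=3$ below are illustrative assumptions of ours.

```python
# Numerical verification of the normalizing constant of the complex scalar
# type-1 beta density (7.3a.14), using the polar reduction y11 = r cos(theta),
# y12 = r sin(theta), t = r^2: the integral of the functional part over the
# complex plane equals pi * \int_0^{1/b} t^gamma (1 - b t)^(beta - 1) dt.
# Illustrative (non-textbook) values: b = 2, gamma = 1.2, beta = 3.
from math import gamma as G, pi
from scipy.integrate import quad

b, gam, beta = 2.0, 1.2, 3.0
const = b ** (gam + 1) / pi * G(gam + 1 + beta) / (G(gam + 1) * G(beta))
integral, _ = quad(lambda t: t ** gam * (1 - b * t) ** (beta - 1), 0, 1 / b)
total = const * pi * integral
print(total)  # should be ~1.0
```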

#### **Exercises 7.3**

**7.3.1.** Derive the normalizing constant *C*<sup>1</sup> in (7.3.2) and verify the normalizing constants in (7.3.3) and (7.3.4).

**7.3.2.** From $E[|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^h]$ or otherwise, derive the $h$-th moment of $|XBX'|$. What is then the structural representation corresponding to (7.3.7)?

**7.3.3.** From (7.3.7) or otherwise, derive the exact density of |*U*| for the cases (1): *p* = 2; (2): *p* = 3.

**7.3.4.** Write down the conditions on the parameters *γ* and *β* in (7.3.6) so that the exact density of |*U*| can easily be evaluated for some *p* ≥ 4.

**7.3.5.** Evaluate the normalizing constant in (7.3.8) by making use of the general polar coordinate transformation.

**7.3.6.** Evaluate the normalizing constant in (7.3*a*.2).

**7.3.7.** Derive the exact density of |det*(U )*˜ | in (7.3a.13) for (1): *p* = 2; (2): *p* = 3.

#### **7.4. The Real Rectangular Matrix-Variate Type-2 Beta Density**

Let us consider a *p* × *q, q* ≥ *p,* matrix *X* of full rank *p* and the following associated density:

$$g_8(X) = C_3\,|AXBX'|^{\gamma}\,|I + AXBX'|^{-(\gamma + \frac{q}{2} + \beta)} \tag{7.4.1}$$

for $A>O$, $B>O$, $\Re(\beta)>\frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2})>\frac{p-1}{2}$, and $g_8=0$ elsewhere. The normalizing constant can be seen to be the following:

$$C_3 = |A|^{\frac{q}{2}} |B|^{\frac{p}{2}}\, \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\, \frac{\Gamma_p(\gamma + \frac{q}{2} + \beta)}{\Gamma_p(\gamma + \frac{q}{2})\Gamma_p(\beta)}\tag{7.4.2}$$

for $A>O$, $B>O$, $\Re(\beta)>\frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2})>\frac{p-1}{2}$. Letting $Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}$, its density, denoted by $g_9(Y)$, is

$$g_9(Y) = \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\, \frac{\Gamma_p(\gamma + \frac{q}{2} + \beta)}{\Gamma_p(\gamma + \frac{q}{2})\Gamma_p(\beta)}\, |YY'|^{\gamma}\, |I + YY'|^{-(\gamma + \frac{q}{2} + \beta)}\tag{7.4.3}$$

for $\Re(\beta)>\frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2})>\frac{p-1}{2}$, and $g_9=0$ elsewhere. The density of $S=YY'$ then reduces to

$$g_{10}(S) = \frac{\Gamma_p(\gamma + \frac{q}{2} + \beta)}{\Gamma_p(\gamma + \frac{q}{2})\Gamma_p(\beta)}\, |S|^{\gamma + \frac{q}{2} - \frac{p+1}{2}}\, |I + S|^{-(\gamma + \frac{q}{2} + \beta)},\tag{7.4.4}$$

for $\Re(\beta)>\frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2})>\frac{p-1}{2}$, and $g_{10}=0$ elsewhere.

#### **7.4.1. The real type-2 beta density in the multivariate case**

Consider the case $p=1$ and $A=b>0$ in (7.4.1). The resulting density has the same structure, with $A$ replaced by $b$ and $X$ being the $1\times q$ vector $(x_1,\ldots,x_q)$. Letting $Y=XB^{\frac{1}{2}}$, the following density of $Y=(y_1,\ldots,y_q)$, denoted by $g_{11}(Y)$, is obtained:

$$\begin{split} g_{11}(Y) &= b^{\gamma+\frac{q}{2}}\, \frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\, \frac{\Gamma(\gamma+\frac{q}{2}+\beta)}{\Gamma(\gamma+\frac{q}{2})\Gamma(\beta)}\, [y_1^2 + \cdots + y_q^2]^{\gamma} \\ &\quad\times [1 + b(y_1^2 + \cdots + y_q^2)]^{-(\gamma+\frac{q}{2}+\beta)} \end{split} \tag{7.4.5}$$

for $b>0$, $\Re(\gamma+\frac{q}{2})>0$, $\Re(\beta)>0$, and $g_{11}=0$ elsewhere. The density appearing in (7.4.5) will be referred to as the standard form of the real rectangular matrix-variate type-2 beta density. In this case, $A$ is taken as $A=b>0$ and $Y=XB^{\frac{1}{2}}$. What might be the standard form of the real type-2 beta density in the real scalar case, that is, when it is assumed that $p=1$, $q=1$, $A=b>0$ and $B=1$ in (7.4.1)? In this case, it is seen from (7.4.5) that

$$g_{12}(y_1) = b^{\gamma + \frac{1}{2}}\, \frac{\Gamma(\gamma + \frac{1}{2} + \beta)}{\Gamma(\gamma + \frac{1}{2})\Gamma(\beta)}\, [y_1^2]^{\gamma}\, [1 + b y_1^2]^{-(\gamma + \frac{1}{2} + \beta)}, \ -\infty < y_1 < \infty,\tag{7.4.6}$$

for $\Re(\beta)>0$, $\Re(\gamma+\frac{1}{2})>0$, $b>0$, and $g_{12}=0$ elsewhere.
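That (7.4.6) is indeed a density can be sketched numerically by checking that it integrates to one over the real line. The values $b=2$, $\gamma=0.8$, $\beta=2.5$ below are illustrative assumptions, not values from the text.

```python
# Check that the standard real scalar type-2 beta density (7.4.6)
# integrates to one over (-inf, inf).
# Illustrative (non-textbook) values: b = 2, gamma = 0.8, beta = 2.5.
from math import gamma as G
from scipy.integrate import quad

b, gam, beta = 2.0, 0.8, 2.5
c = b ** (gam + 0.5) * G(gam + 0.5 + beta) / (G(gam + 0.5) * G(beta))

def g12(y):
    return c * (y * y) ** gam * (1 + b * y * y) ** (-(gam + 0.5 + beta))

total, _ = quad(g12, -float('inf'), float('inf'))
print(total)  # should be ~1.0
```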

#### **7.4.2. Moments in the real rectangular matrix-variate type-2 beta density**

Letting $U=A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, what would be the $h$-th moment of the determinant of $U$, that is, $E[|U|^h]$ for arbitrary $h$? When this expected value is taken, the parameter $\gamma$ is replaced by $\gamma+h$ while the other parameters remain unchanged. The $h$-th moment, which is thus available from the normalizing constant, is given by

$$E[|U|^h] = \frac{\Gamma_p(\gamma + \frac{q}{2} + h)}{\Gamma_p(\gamma + \frac{q}{2})}\, \frac{\Gamma_p(\beta - h)}{\Gamma_p(\beta)}\tag{7.4.7}$$

$$=\prod\_{j=1}^{p} \frac{\Gamma(\gamma + \frac{q}{2} - \frac{j-1}{2} + h)}{\Gamma(\gamma + \frac{q}{2} - \frac{j-1}{2})} \frac{\Gamma(\beta - \frac{j-1}{2} - h)}{\Gamma(\beta - \frac{j-1}{2})} \tag{7.4.8}$$

$$=E[\boldsymbol{u}\_1^h]E[\boldsymbol{u}\_2^h]\cdots E[\boldsymbol{u}\_p^h] \tag{7.4.9}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-2 beta random variables with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\,\beta-\frac{j-1}{2})$, $j=1,\ldots,p$. That is, (7.4.9) provides the following structural representation of the determinant of $U$:

$$|U| = |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}| = |YY'| = |S| = u\_1 \cdots u\_p \tag{7.4.10}$$

where the $u_j$'s are mutually independently distributed real scalar type-2 beta random variables as specified above.
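The representation (7.4.10) and the moment formula (7.4.8) can be cross-checked by Monte Carlo: a real scalar type-2 beta variable can be simulated as $x/(1-x)$ with $x$ a type-1 beta variable having the same parameters. The values $p=2$, $q=4$, $\gamma=1$, $\beta=3$, $h=1$ below are illustrative; note that (7.4.8) requires $\Re(\beta-\frac{j-1}{2}-h)>0$.

```python
# Monte Carlo check of (7.4.8)/(7.4.10): |U| = u_1 ... u_p with independent
# real scalar type-2 beta u_j, parameters (gamma + q/2 - (j-1)/2, beta - (j-1)/2).
# A type-2 beta variable is simulated as x/(1-x), x ~ type-1 beta.
# Illustrative (non-textbook) values: p = 2, q = 4, gamma = 1, beta = 3, h = 1.
import numpy as np
from math import gamma as G

p, q, gam, beta, h = 2, 4, 1.0, 3.0, 1

# exact h-th moment from (7.4.8)
exact = 1.0
for j in range(1, p + 1):
    a1 = gam + q / 2 - (j - 1) / 2
    a2 = beta - (j - 1) / 2
    exact *= (G(a1 + h) / G(a1)) * (G(a2 - h) / G(a2))

rng = np.random.default_rng(1)
prod = np.ones(2_000_000)
for j in range(1, p + 1):
    x = rng.beta(gam + q / 2 - (j - 1) / 2, beta - (j - 1) / 2, size=prod.size)
    prod *= x / (1 - x)
mc = prod.mean()
print(exact, mc)
```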

**Example 7.4.1.** Evaluate the density of $u = |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|$ for $p=2$ and the general parameters $\gamma,\ q,\ \beta$, where $X$ has a real rectangular matrix-variate type-2 beta density with the parameter matrices $A>O$ and $B>O$, $A$ being $p\times p$, $B$ being $q\times q$, and $X$ being a $p\times q$, $q\ge p$, rank $p$ matrix.

**Solution 7.4.1.** The general *h*-th moment of *u* can be determined from (7.4.8). Letting *p* = 2, we have

$$E[\boldsymbol{u}^{h}] = \frac{\Gamma(\boldsymbol{\gamma} + \frac{q}{2} - \frac{1}{2} + h)\Gamma(\boldsymbol{\gamma} + \frac{q}{2} + h)}{\Gamma(\boldsymbol{\gamma} + \frac{q}{2} - \frac{1}{2})\Gamma(\boldsymbol{\gamma} + \frac{q}{2})} \frac{\Gamma(\beta - \frac{1}{2} - h)\Gamma(\beta - h)}{\Gamma(\beta - \frac{1}{2})\Gamma(\beta)}\tag{i}$$

for <sup>−</sup>*<sup>γ</sup>* <sup>−</sup> *<sup>q</sup>* <sup>2</sup> <sup>+</sup> <sup>1</sup> <sup>2</sup> *<sup>&</sup>lt; (h) < β* <sup>−</sup> <sup>1</sup> <sup>2</sup> , *β >* <sup>1</sup> <sup>2</sup> *, γ* <sup>+</sup> *<sup>q</sup>* <sup>2</sup> *<sup>&</sup>gt;* <sup>1</sup> <sup>2</sup> . Since four pairs of gamma functions differ by <sup>1</sup> <sup>2</sup> , we can combine them by applying the duplication formula for gamma functions, namely,

$$
\Gamma(z)\Gamma(z+1/2) = \pi^{\frac{1}{2}}2^{1-2z}\Gamma(2z). \tag{ii}
$$

Take $z=\gamma+\frac{q}{2}-\frac{1}{2}+h$ and $z=\gamma+\frac{q}{2}-\frac{1}{2}$ in the first set of gamma ratios in *(i)*, and $z=\beta-\frac{1}{2}-h$ and $z=\beta-\frac{1}{2}$ in the second set of gamma ratios in *(i)*. Then, we have the following:

$$\begin{split} \frac{\Gamma(\gamma + \frac{q}{2} - \frac{1}{2} + h)\Gamma(\gamma + \frac{q}{2} + h)}{\Gamma(\gamma + \frac{q}{2} - \frac{1}{2})\Gamma(\gamma + \frac{q}{2})} &= \frac{\pi^{\frac{1}{2}} 2^{1 - 2\gamma - q + 1 - 2h} \Gamma(2\gamma + q - 1 + 2h)}{\pi^{\frac{1}{2}} 2^{1 - 2\gamma - q + 1} \Gamma(2\gamma + q - 1)} \\ &= 2^{-2h} \frac{\Gamma(2\gamma + q - 1 + 2h)}{\Gamma(2\gamma + q - 1)}, \\ \frac{\Gamma(\beta - \frac{1}{2} - h)\Gamma(\beta - h)}{\Gamma(\beta - \frac{1}{2})\Gamma(\beta)} &= \frac{\pi^{\frac{1}{2}} 2^{1 - 2\beta + 1 + 2h} \Gamma(2\beta - 1 - 2h)}{\pi^{\frac{1}{2}} 2^{1 - 2\beta + 1} \Gamma(2\beta - 1)} \end{split} \tag{iii}$$

$$= 2^{2h}\frac{\Gamma(2\beta-1-2h)}{\Gamma(2\beta-1)},\tag{iv}$$

the product of *(iii)* and *(iv)* yielding the simplified representation of the *h*-th moment of *u* that follows:

$$E[u^h] = \frac{\Gamma(2\gamma + q - 1 + 2h)}{\Gamma(2\gamma + q - 1)}\, \frac{\Gamma(2\beta - 1 - 2h)}{\Gamma(2\beta - 1)}.$$

Now, since $E[u^h]=E[(u^{\frac{1}{2}})^{2h}]\equiv E[y^t]$ with $y=u^{\frac{1}{2}}$ and $t=2h$, we have

$$E[y^t] = \frac{\Gamma(2\gamma + q - 1 + t)}{\Gamma(2\gamma + q - 1)}\, \frac{\Gamma(2\beta - 1 - t)}{\Gamma(2\beta - 1)}.\tag{v}$$

As $t$ is arbitrary in *(v)*, the moment expression uniquely determines the density of $y$. Accordingly, $y$ has a real scalar type-2 beta distribution with the parameters $(2\gamma+q-1,\,2\beta-1)$, and so its density, denoted by $f(y)$, is

$$\begin{split} f(y)\,\mathrm{d}y &= \frac{\Gamma(2\gamma + q + 2\beta - 2)}{\Gamma(2\gamma + q - 1)\Gamma(2\beta - 1)}\, y^{2\gamma + q - 2}\,(1 + y)^{-(2\gamma + q + 2\beta - 2)}\,\mathrm{d}y \\ &= \frac{\Gamma(2\gamma + q + 2\beta - 2)}{\Gamma(2\gamma + q - 1)\Gamma(2\beta - 1)}\, \frac{1}{2}\, u^{-\frac{1}{2}}\, u^{\gamma + \frac{q}{2} - 1}\,(1 + u^{\frac{1}{2}})^{-(2\gamma + q + 2\beta - 2)}\,\mathrm{d}u. \end{split}$$

Thus, the density of *u*, denoted by *g(u)*, is the following:

$$g(u) = \frac{1}{2}\, \frac{\Gamma(2\gamma + q + 2\beta - 2)}{\Gamma(2\gamma + q - 1)\Gamma(2\beta - 1)}\, u^{\gamma + \frac{q-1}{2} - 1}\,(1 + u^{\frac{1}{2}})^{-(2\gamma + q + 2\beta - 2)}$$

for $0\le u<\infty$, and zero elsewhere, where the original conditions on the parameters remain the same. It can be readily verified that $g(u)$ is a density.
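The verification mentioned above can be sketched numerically: with illustrative parameters $\gamma=1$, $q=2$, $\beta=2$ (our choices, not the text's), the constant equals $15$ and $g(u)=15\,u^{1/2}(1+u^{1/2})^{-6}$, which should integrate to one over $[0,\infty)$.

```python
# Check that the density g(u) derived in Solution 7.4.1 integrates to one.
# Illustrative (non-textbook) values: gamma = 1, q = 2, beta = 2.
from math import gamma as G
from scipy.integrate import quad

gam, q, beta = 1.0, 2, 2.0
c = 0.5 * G(2 * gam + q + 2 * beta - 2) / (G(2 * gam + q - 1) * G(2 * beta - 1))

def g(u):
    return c * u ** (gam + (q - 1) / 2 - 1) * (1 + u ** 0.5) ** (-(2 * gam + q + 2 * beta - 2))

total, _ = quad(g, 0, float('inf'))
print(c, total)
```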

#### **7.4.3. A pathway extension in the real case**

Let us relabel *f*10*(X)* as specified in (7.2.16), as *g*13*(X)* in this section:

$$g_{13}(X) = C_4\,|AXBX'|^{\gamma}\,|I - a(1-\alpha) A^{\frac{1}{2}} XBX' A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}}\tag{7.4.11}$$

for $A>O$, $B>O$, $\eta>0$, $a>0$, $\alpha<1$, $I-a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}>O$, and $g_{13}=0$ elsewhere, where $C_4$ is the normalizing constant. Observe that for $\alpha<1$, $a(1-\alpha)>0$, and hence the model in (7.4.11) is a generalization of the real rectangular matrix-variate type-1 beta density considered in (7.3.1). When $\alpha<1$, the normalizing constant $C_4$ is of the form given in (7.2.18). For $\alpha>1$, we may write $1-\alpha=-(\alpha-1)$, so that $-a(1-\alpha)=a(\alpha-1)>0$ in (7.4.11), and the exponent $\frac{\eta}{1-\alpha}$ changes to $-\frac{\eta}{\alpha-1}$; thus, the model appearing in (7.4.11) becomes the following generalization of the rectangular matrix-variate type-2 beta density given in (7.4.1):

$$g_{14}(X) = C_5\,|AXBX'|^{\gamma}\,|I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}}\tag{7.4.12}$$

for $A>O$, $B>O$, $\eta>0$, $a>0$, $\alpha>1$, and $g_{14}=0$ elsewhere. The normalizing constant $C_5$ will then be different from that associated with the type-1 case. Actually, in the type-2 case, the normalizing constant is available from (7.2.19). The model appearing in (7.4.12) is a generalization of the real rectangular matrix-variate type-2 beta model considered in (7.4.1). When $\alpha\to 1$, the model in (7.4.11) converges to a generalized form of the real rectangular matrix-variate gamma model in (7.2.5), namely,

$$g_{15}(X) = C_6\,|AXBX'|^{\gamma}\,\mathrm{e}^{-a\,\eta\,\mathrm{tr}(AXBX')} \tag{7.4.13}$$


where

$$C_6 = (a\eta)^{p(\gamma + \frac{q}{2})}\, \frac{|A|^{\frac{q}{2}} |B|^{\frac{p}{2}}\, \Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\, \Gamma_p(\gamma + \frac{q}{2})} \tag{7.4.14}$$

for $a>0$, $\eta>0$, $A>O$, $B>O$, $\Re(\gamma+\frac{q}{2})>\frac{p-1}{2}$, and $g_{15}=0$ elsewhere. More properties of the model given in (7.4.11) have already been provided in Sect. 4.2. The real rectangular matrix-variate pathway model was introduced in Mathai (2005).
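The convergence as $\alpha\to 1$ rests on the scalar limit $[1-a(1-\alpha)x]^{\eta/(1-\alpha)}\to \mathrm{e}^{-a\eta x}$, which the following sketch illustrates numerically; the values $a=2$, $\eta=1.5$, $x=0.7$ are arbitrary illustrative choices.

```python
# Scalar illustration of the pathway limit alpha -> 1:
# [1 - a(1-alpha)x]^{eta/(1-alpha)} -> exp(-a*eta*x), which is how the
# pathway model (7.4.11) passes to the gamma form (7.4.13).
# Illustrative values: a = 2, eta = 1.5, x = 0.7.
from math import exp

a, eta, x = 2.0, 1.5, 0.7

def pathway_factor(alpha):
    # valid for alpha < 1 with 1 - a(1-alpha)x > 0
    return (1 - a * (1 - alpha) * x) ** (eta / (1 - alpha))

limit = exp(-a * eta * x)
for al in (0.9, 0.99, 0.999):
    print(al, pathway_factor(al), limit)
```

As $\alpha$ approaches 1 from below, the printed pathway values approach the exponential limit.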

#### **7.4a. Complex Rectangular Matrix-Variate Type-2 Beta Density**

Let us consider a full rank *p* × *q, q* ≥ *p,* matrix *X*˜ in the complex domain and the following associated density:

$$\tilde{g}\_8(\tilde{X}) = \tilde{C}\_3 \left| \det(A \tilde{X} B \tilde{X}^\*) \right|^\gamma \left| \det(I + A \tilde{X} B \tilde{X}^\*) \right|^{-(\beta + \gamma + q)} \tag{7.4a.1}$$

for $A>O$, $B>O$, $\Re(\beta)>p-1$, $\Re(\gamma+q)>p-1$, and $\tilde{g}_8=0$ elsewhere, where $\tilde{C}_3$ is the normalizing constant. Let

$$
\tilde{Y} = A^{\frac{1}{2}} \tilde{X} B^{\frac{1}{2}} \Rightarrow \mathrm{d}\tilde{Y} = |\det(A)|^q |\det(B)|^p \mathrm{d}\tilde{X},
$$

and make the transformation

$$\tilde{S} = \tilde{Y}\tilde{Y}^\* \Rightarrow \mathrm{d}\tilde{Y} = \frac{\pi^{qp}}{\tilde{\Gamma}\_p(q)} |\det(\tilde{S})|^{q-p} \mathrm{d}\tilde{S}.$$

Then, the integral over *S*˜ can be evaluated by means of a complex matrix-variate type-2 beta integral. That is,

$$\int_{\tilde{S}} |\det(\tilde{S})|^{\gamma+q-p}\, |\det(I+\tilde{S})|^{-(\beta+\gamma+q)}\, \mathrm{d}\tilde{S} = \frac{\tilde{\Gamma}_p(\gamma+q)\tilde{\Gamma}_p(\beta)}{\tilde{\Gamma}_p(\gamma+q+\beta)}\tag{7.4a.2}$$

for $\Re(\beta)>p-1$, $\Re(\gamma+q)>p-1$. The normalizing constant $\tilde{C}_3$ as well as the densities of $\tilde{Y}$ and $\tilde{S}$ can be determined from the previous steps. The normalizing constant is

$$\tilde{C}_3 = |\det(A)|^q\, |\det(B)|^p\, \frac{\tilde{\Gamma}_p(q)}{\pi^{qp}}\, \frac{\tilde{\Gamma}_p(\gamma + q + \beta)}{\tilde{\Gamma}_p(\gamma + q)\tilde{\Gamma}_p(\beta)}\tag{7.4a.3}$$

for $\Re(\beta)>p-1$, $\Re(\gamma+q)>p-1$. The density of $\tilde{Y}$, denoted by $\tilde{g}_9(\tilde{Y})$, is given by

$$\tilde{g}_9(\tilde{Y}) = \frac{\tilde{\Gamma}_p(q)}{\pi^{qp}}\, \frac{\tilde{\Gamma}_p(\gamma+q+\beta)}{\tilde{\Gamma}_p(\gamma+q)\tilde{\Gamma}_p(\beta)}\, |\det(\tilde{Y}\tilde{Y}^{*})|^{\gamma}\, |\det(I+\tilde{Y}\tilde{Y}^{*})|^{-(\gamma+q+\beta)}\tag{7.4a.4}$$

for $\Re(\beta)>p-1$, $\Re(\gamma+q)>p-1$, and $\tilde{g}_9=0$ elsewhere. The density of $\tilde{S}$, denoted by $\tilde{g}_{10}(\tilde{S})$, is the following:

$$\tilde{g}_{10}(\tilde{S}) = \frac{\tilde{\Gamma}_p(\gamma + q + \beta)}{\tilde{\Gamma}_p(\gamma + q)\tilde{\Gamma}_p(\beta)}\, |\det(\tilde{S})|^{\gamma + q - p}\, |\det(I + \tilde{S})|^{-(\beta + \gamma + q)} \tag{7.4a.5}$$

for $\Re(\beta)>p-1$, $\Re(\gamma+q)>p-1$, and $\tilde{g}_{10}=0$ elsewhere.

#### **7.4a.1. Multivariate complex type-2 beta density**

As in Sect. 7.3a, let us consider the special case $p=1$ in (7.4*a*.1). So, let the $1\times 1$ matrix $A$ be denoted by $b>0$ and the $1\times q$ vector $\tilde{X}=(\tilde{x}_1,\ldots,\tilde{x}_q)$. Then,

$$A\tilde{X}B\tilde{X}^\* = b\tilde{X}B\tilde{X}^\* = b(\tilde{\mathbf{x}}\_1, \dots, \tilde{\mathbf{x}}\_q)B\begin{pmatrix} \tilde{\mathbf{x}}\_1^\* \\ \vdots \\ \tilde{\mathbf{x}}\_q^\* \end{pmatrix} \equiv b\tilde{U} \tag{a}$$

where, in the case of a scalar, an asterisk only designates the complex conjugate. Note that when $p=1$, $\tilde{U}=\tilde{X}B\tilde{X}^{*}$ is a positive definite Hermitian form whose density, denoted by $\tilde{g}_{11}(\tilde{U})$, is obtained as:

$$\tilde{g}_{11}(\tilde{U}) = b^{\gamma+q}\, |\det(B)|\, \frac{\tilde{\Gamma}(q)}{\pi^q}\, \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)}\, |\det(\tilde{U})|^{\gamma}\, |\det(I+b\tilde{U})|^{-(\gamma+q+\beta)} \tag{7.4a.6}$$

for $\Re(\beta)>0$, $\Re(\gamma+q)>0$ and $b>0$, and $\tilde{g}_{11}=0$ elsewhere. Now, letting $\tilde{X}B^{\frac{1}{2}}=\tilde{Y}=(\tilde{y}_1,\ldots,\tilde{y}_q)$, the density of $\tilde{Y}$, denoted by $\tilde{g}_{12}(\tilde{Y})$, is obtained as

$$\tilde{g}_{12}(\tilde{Y}) = b^{\gamma+q}\, \frac{\tilde{\Gamma}(q)}{\pi^q}\, \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)}\, [|\tilde{y}_1|^2 + \cdots + |\tilde{y}_q|^2]^{\gamma}\, [1 + b(|\tilde{y}_1|^2 + \cdots + |\tilde{y}_q|^2)]^{-(\gamma+q+\beta)}\tag{7.4a.7}$$

for $\Re(\beta)>0$, $\Re(\gamma+q)>0$, $b>0$, and $\tilde{g}_{12}=0$ elsewhere. The constant in (7.4*a*.7) can be verified to be a normalizing constant, either by making use of Theorem 4.2a.3 or of a $(2n)$-variate real polar coordinate transformation, which is left as an exercise to the reader.

**Example 7.4a.1.** Provide an explicit representation of the complex multivariate density in (7.4*a*.7) for $p=1$, $\gamma=2$, $q=3$, $b=3$ and $\beta=2$.

**Solution 7.4a.1.** The normalizing constant, denoted by *c*˜, is the following:

$$\begin{split} \tilde{c} &= b^{\gamma+q}\, \frac{\tilde{\Gamma}(q)}{\pi^{q}}\, \frac{\tilde{\Gamma}(\gamma+q+\beta)}{\tilde{\Gamma}(\gamma+q)\tilde{\Gamma}(\beta)} \\ &= 3^{5}\, \frac{\Gamma(3)}{\pi^{3}}\, \frac{\Gamma(7)}{\Gamma(5)\Gamma(2)} = 3^{5}\, \frac{(2!)}{\pi^{3}}\, \frac{(6!)}{(4!)(1!)} = \frac{(60)3^{5}}{\pi^{3}}. \end{split} \tag{i}$$

Letting $\tilde{y}_1=y_{11}+iy_{12}$, $\tilde{y}_2=y_{21}+iy_{22}$, $\tilde{y}_3=y_{31}+iy_{32}$, the $y_{kj}$, $k=1,2,3$, $j=1,2$, being real and $i=\sqrt{-1}$,

$$\mathcal{Q} = |\tilde{\mathbf{y}}\_1|^2 + |\tilde{\mathbf{y}}\_2|^2 + |\tilde{\mathbf{y}}\_3|^2 = (\mathbf{y}\_{11}^2 + \mathbf{y}\_{12}^2) + (\mathbf{y}\_{21}^2 + \mathbf{y}\_{22}^2) + (\mathbf{y}\_{31}^2 + \mathbf{y}\_{32}^2). \tag{ii}$$

Thus, the required density, denoted by *g*˜12*(Y )*˜ as in (7.4*a*.7), is given by

$$\tilde{g}_{12}(\tilde{Y}) = \tilde{c}\,\mathcal{Q}^2(1+3\mathcal{Q})^{-7}, \ -\infty < y_{kj} < \infty, \ k = 1,2,3, \ j = 1,2,$$

where *c*˜ is specified in *(i)* and *Q*, in *(ii)*. This completes the solution.
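The constant in *(i)* can be double-checked numerically: integrating $\tilde{c}\,\mathcal{Q}^2(1+3\mathcal{Q})^{-7}$ over the six real coordinates reduces, via the radial substitution $t=r^2$ with $\mathcal{Q}=r^2$ and the surface area $2\pi^3/\Gamma(3)=\pi^3$ of the unit sphere in $\mathbb{R}^6$, to a one-dimensional integral.

```python
# Numerical double-check of the constant (60) 3^5 / pi^3 in (i):
# the integral of Q^2 (1+3Q)^{-7} over R^6 reduces to
# pi^3 * (1/2) * \int_0^inf t^4 (1+3t)^{-7} dt  (radial substitution t = r^2),
# and multiplying by the normalizing constant should give 1.
from math import pi
from scipy.integrate import quad

c_tilde = 60 * 3 ** 5 / pi ** 3
radial, _ = quad(lambda t: t ** 4 * (1 + 3 * t) ** (-7), 0, float('inf'))
total = c_tilde * pi ** 3 * 0.5 * radial
print(total)  # should be ~1.0
```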

The density in (7.4*a*.6) is called a complex multivariate type-2 beta density in the general form and (7.4*a*.7) is referred to as a complex multivariate type-2 beta density in its standard form. Observe that these constitute only one form of the multivariate case of a type-2 beta density. When extending a univariate function to a multivariate one, there is no such thing as a unique multivariate analogue. There exist a multitude of multivariate functions corresponding to specified marginal functions, or marginal densities in statistical problems. In the latter case for instance, there are countless possible copulas associated with some specified marginal distributions. Copulas actually encapsulate the various dependence relationships existing between random variables. We have already seen that one set of generalizations to the multivariate case for univariate type-1 and type-2 beta densities are the type-1 and type-2 Dirichlet densities and their extensions. The densities appearing in (7.4*a*.6) and (7.4*a*.7) are yet another version of a multivariate type-2 beta density in the complex case.

What will be the resulting distribution when *q* = 1 in (7.4*a*.7)? The standard form of this density then becomes the following, denoted by *g*˜13*(y*˜1*)*:

$$\tilde{g}_{13}(\tilde{y}_1) = b^{\gamma+1}\, \frac{1}{\pi}\, \frac{\Gamma(\gamma+1+\beta)}{\Gamma(\gamma+1)\Gamma(\beta)}\, [|\tilde{y}_1|^2]^{\gamma}\, [1+b|\tilde{y}_1|^2]^{-(\gamma+1+\beta)} \tag{7.4a.8}$$

for $\Re(\beta)>0$, $\Re(\gamma+1)>0$, $b>0$, and $\tilde{g}_{13}=0$ elsewhere. We now verify that this is indeed a density function. Let $\tilde{y}_1=y_{11}+iy_{12}$, $y_{11}$ and $y_{12}$ being real scalar quantities and $i=\sqrt{-1}$. When $\tilde{y}_1$ is in the complex plane, $-\infty<y_{1j}<\infty$, $j=1,2$. Let us make a polar coordinate transformation. Letting $y_{11}=r\cos\theta$ and $y_{12}=r\sin\theta$, $\mathrm{d}y_{11}\wedge\mathrm{d}y_{12}=r\,\mathrm{d}r\wedge\mathrm{d}\theta$, $0\le r<\infty$, $0<\theta\le 2\pi$. The integral over the functional part of (7.4*a*.8) yields

$$\begin{split} \int_{\tilde{y}_{1}} [|\tilde{y}_{1}|^{2}]^{\gamma} [1 + b|\tilde{y}_{1}|^{2}]^{-(\gamma + 1 + \beta)}\, \mathrm{d}\tilde{y}_{1} &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [y_{11}^{2} + y_{12}^{2}]^{\gamma} \\ &\quad \times [1 + b(y_{11}^{2} + y_{12}^{2})]^{-(\gamma + 1 + \beta)}\, \mathrm{d}y_{11} \wedge \mathrm{d}y_{12} \\ &= \int_{\theta=0}^{2\pi} \int_{r=0}^{\infty} [r^{2}]^{\gamma} [1 + br^{2}]^{-(\gamma + 1 + \beta)}\, r\, \mathrm{d}\theta \wedge \mathrm{d}r \\ &= (2\pi)\left(\frac{1}{2}\right) \int_{t=0}^{\infty} t^{\gamma} (1 + bt)^{-(\gamma + 1 + \beta)}\, \mathrm{d}t \end{split}$$

which is equal to

$$
\pi \, b^{-(\gamma+1)} \frac{\Gamma(\gamma+1)\Gamma(\beta)}{\Gamma(\gamma+1+\beta)}.
$$

This establishes that the function specified by (7.4*a*.8) is a density.

#### **7.4a.2. Arbitrary moments in the complex type-2 beta density**

Let us consider the $h$-th moment of $|\det(\tilde{U})| = |\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^{*}A^{\frac{1}{2}})|$ in (7.4*a*.1). Since the only change upon integration is that $\gamma$ is replaced by $\gamma+h$, the $h$-th moment is available from the normalizing constant in (7.4*a*.2):

$$E[|\det(\tilde{U})|^h] = \frac{\tilde{\Gamma}_p(\gamma + q + h)}{\tilde{\Gamma}_p(\gamma + q)}\, \frac{\tilde{\Gamma}_p(\beta - h)}{\tilde{\Gamma}_p(\beta)}\tag{7.4a.9}$$

$$= \prod_{j=1}^{p} \frac{\Gamma(\gamma + q - (j-1) + h)}{\Gamma(\gamma + q - (j-1))}\, \frac{\Gamma(\beta - (j-1) - h)}{\Gamma(\beta - (j-1))} \tag{7.4a.10}$$

$$=E\{\boldsymbol{u}\_1^h\}\cdots\boldsymbol{E}\{\boldsymbol{u}\_p^h\}\tag{7.4a.11}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-2 beta random variables with the parameters $(\gamma+q-(j-1),\,\beta-(j-1))$, $j=1,\ldots,p$. Thus, $|\det(\tilde{U})|$ has the structural representation

$$|\det(\tilde{U})| = |\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^\*A^{\frac{1}{2}})| = |\det(\tilde{Y}\tilde{Y}^\*)| = |\det(\tilde{S})| = u\_1 \cdots u\_p \tag{7.4a.12}$$

where $u_1,\ldots,u_p$ are as previously defined. The density of a complex scalar type-2 beta random variable is provided in (7.4*a*.8).

#### **7.4a.3. A pathway version of the complex rectangular matrix-variate type-1 beta density**

Consider the model specified in (7.2*a*.12), that is,

$$\tilde{g}_{14}(\tilde{X}) = \tilde{C}_4\, |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})|^{\gamma}\, |\det(I - a(1-\alpha) A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})|^{\frac{\eta}{1-\alpha}}\tag{7.4a.13}$$

for $a>0$, $\alpha<1$, $A>O$, $B>O$, $\eta>0$, $I-a(1-\alpha)A^{\frac{1}{2}}\tilde{X}B\tilde{X}^{*}A^{\frac{1}{2}}>O$, and $\tilde{g}_{14}=0$ elsewhere, where $\tilde{C}_4$ is the normalizing constant given in (7.2*a*.15). When $\alpha<1$, the model appearing in (7.4*a*.13) is a generalization of the complex rectangular matrix-variate type-1 beta model considered in (7.3*a*.1). When $\alpha>1$, we write $1-\alpha=-(\alpha-1)$ and re-express $\tilde{g}_{14}$ as

$$\tilde{g}_{15}(\tilde{X}) = \tilde{C}_5\, |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})|^{\gamma}\, |\det(I + a(\alpha-1) A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})|^{-\frac{\eta}{\alpha-1}} \tag{7.4a.14}$$

for $a>0$, $\alpha>1$, $\eta>0$, $A>O$, $B>O$, and $\tilde{g}_{15}=0$ elsewhere, where the normalizing constant $\tilde{C}_5$ is the same as the one in (7.2*a*.16). Observe that the model in (7.4*a*.14) is a generalization of the complex rectangular matrix-variate type-2 beta model in (7.4*a*.1). When $\alpha\to 1$, the models in (7.4*a*.13) and (7.4*a*.14) both converge to the following model:

$$\tilde{g}_{16}(\tilde{X}) = \tilde{C}_6\, |\det(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})|^{\gamma}\, \mathrm{e}^{-a\,\eta\,\mathrm{tr}(A^{\frac{1}{2}} \tilde{X} B \tilde{X}^{*} A^{\frac{1}{2}})} \tag{7.4a.15}$$

for $a>0$, $\eta>0$, $A>O$, $B>O$, and $\tilde{g}_{16}=0$ elsewhere, where the normalizing constant $\tilde{C}_6$ is the same as that in (7.2*a*.17). The model specified in (7.4*a*.15) is a generalization of the complex rectangular matrix-variate gamma model considered in (7.2*a*.1). Thus, the pathway model (7.4*a*.13) encompasses the three models (7.4*a*.13), (7.4*a*.14) and (7.4*a*.15), which are generalizations of the models given in (7.3*a*.1), (7.4*a*.1) and (7.2*a*.1), respectively. The pathway model in the complex domain, namely (7.4*a*.13), was introduced in Mathai and Provost (2006). Additional properties of the pathway model have already been discussed in Sect. 7.2a.

#### **Exercises 7.4**

**7.4.1.** Following the steps outlined above or otherwise, derive the normalizing constant $\tilde{C}_3$ in (7.4*a*.3).

**7.4.2.** By integrating over *Y*˜, show that (7.4*a*.4) is a density.

**7.4.3.** Evaluate the normalizing constant in (7.4*a*.7) by using (1): Theorem 4.2a.3; (2): a *(*2*n)*-variate real polar coordinate transformation.

**7.4.4.** Given the standard real multivariate type-2 beta model in (7.4.5), evaluate the joint marginal density of $y_1,\ldots,y_r$, $r<q$.

**7.4.5.** Evaluate the density in (7.4*a*.4) explicitly for *p* = 1 and *q* = 2.

**7.4.6.** Given the standard complex multivariate type-2 beta model in (7.4*a*.7), evaluate the joint marginal density of $\tilde{y}_1,\ldots,\tilde{y}_r$, $r<q$.

**7.4.7.** Derive the density of |det*(U )*˜ | in (7.4*a*.12) for the cases (1): *p* = 1; (2): *p* = 2.

**7.4.8.** Derive the density of |*U*| in (7.4.10) for (1): *p* = 2; (2): *p* = 3.

#### **7.5, 7.5a. Ratios Involving Rectangular Matrix-Variate Random Variables**

Since scalar variables such as type-1 beta, type-2 beta, F, Student-*t* and Cauchy variables are all associated with ratios of independently distributed random variables, we will explore ratios involving rectangular matrix-variate random variables. Such ratios will yield the rectangular matrix-variate versions of the aforementioned ratios of scalar variables. Let the *p* × *n*1*, p* ≤ *n*1*,* full rank matrix *X*<sup>1</sup> and the *p* × *n*2*, p* ≤ *n*2*,* full rank matrix *X*<sup>2</sup> be independently distributed real matrix-variate random variables having the rectangular matrix-variate gamma densities specified in (7.2.5), that is,

$$f_j(X_j) = \frac{|A_j|^{\frac{n_j}{2}} |B_j|^{\frac{p}{2}}\, \Gamma_p(\frac{n_j}{2})}{\pi^{\frac{n_j p}{2}}\, \Gamma_p(\gamma_j + \frac{n_j}{2})}\, |A_j X_j B_j X_j'|^{\gamma_j}\, \mathrm{e}^{-\mathrm{tr}(A_j X_j B_j X_j')}, \ j = 1, 2,\tag{7.5.1}$$

for $A_j>O$, $B_j>O$, $\Re(\gamma_j+\frac{n_j}{2})>\frac{p-1}{2}$, and $n_j\ge p$, where $A_j$ is $p\times p$ and $B_j$ is $n_j\times n_j$. Then, owing to the statistical independence of the variables, the joint density of $X_1$ and $X_2$ is $f(X_1,X_2)=f_1(X_1)f_2(X_2)$. Consider the ratios

$$U_1 = \left(\sum_{j=1}^2 A_j^{\frac{1}{2}} X_j B_j X_j' A_j^{\frac{1}{2}}\right)^{-\frac{1}{2}} \left(A_1^{\frac{1}{2}} X_1 B_1 X_1' A_1^{\frac{1}{2}}\right) \left(\sum_{j=1}^2 A_j^{\frac{1}{2}} X_j B_j X_j' A_j^{\frac{1}{2}}\right)^{-\frac{1}{2}} \tag{i}$$

and

$$U\_2 = \left(A\_2^{\frac{1}{2}} X\_2 B\_2 X\_2' A\_2^{\frac{1}{2}}\right)^{-\frac{1}{2}} \left(A\_1^{\frac{1}{2}} X\_1 B\_1 X\_1' A\_1^{\frac{1}{2}}\right) \left(A\_2^{\frac{1}{2}} X\_2 B\_2 X\_2' A\_2^{\frac{1}{2}}\right)^{-\frac{1}{2}}.\tag{ii}$$

Let us derive the densities of $U_1$ and $U_2$. Letting $V_j=A_j^{\frac{1}{2}}X_jB_j^{\frac{1}{2}}$, we have $\mathrm{d}X_j=|A_j|^{-\frac{n_j}{2}}|B_j|^{-\frac{p}{2}}\mathrm{d}V_j$. Denoting the joint density of $V_1$ and $V_2$ by $g(V_1,V_2)$, it follows that $f(X_1,X_2)\,\mathrm{d}X_1\wedge\mathrm{d}X_2=g(V_1,V_2)\,\mathrm{d}V_1\wedge\mathrm{d}V_2$, and so,

$$g(V_1, V_2)\,\mathrm{d}V_1 \wedge \mathrm{d}V_2 = \left\{ \prod_{j=1}^2 \frac{\Gamma_p(\frac{n_j}{2})}{\pi^{\frac{n_j p}{2}}\, \Gamma_p(\gamma_j + \frac{n_j}{2})} \right\} |V_1 V_1'|^{\gamma_1}\, |V_2 V_2'|^{\gamma_2}\, \mathrm{e}^{-\mathrm{tr}(V_1 V_1' + V_2 V_2')}\, \mathrm{d}V_1 \wedge \mathrm{d}V_2.$$

Letting $W_j=V_jV_j'$, $\mathrm{d}V_j=\frac{\pi^{\frac{n_jp}{2}}}{\Gamma_p(\frac{n_j}{2})}|W_j|^{\frac{n_j}{2}-\frac{p+1}{2}}\,\mathrm{d}W_j$, and the joint density of $W_1$ and $W_2$, denoted by $h(W_1,W_2)$, is the following:

$$h(W\_1, W\_2) = \left\{ \prod\_{j=1}^2 \frac{1}{\Gamma\_p(\gamma\_j + \frac{n\_j}{2})} \right\} |W\_1|^{\gamma\_1 + \frac{n\_1}{2} - \frac{p+1}{2}} |W\_2|^{\gamma\_2 + \frac{n\_2}{2} - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(W\_1 + W\_2)}. \tag{7.5.2}$$

Note that $U\_1 = (W\_1 + W\_2)^{-\frac{1}{2}} W\_1 (W\_1 + W\_2)^{-\frac{1}{2}}$ and $U\_2 = W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}}$. Then, given the relationships between independently distributed matrix-variate gamma variables and type-1 and type-2 matrix-variate beta variables, $U\_1$ and $U\_2$ are distributed as real matrix-variate type-1 beta and type-2 beta random variables, respectively, both with the parameters $(\gamma\_1 + \frac{n\_1}{2},\ \gamma\_2 + \frac{n\_2}{2})$, that is,

$$U\_1 \sim \text{ type-1}\text{ beta}\left(\gamma\_1 + \frac{n\_1}{2}, \gamma\_2 + \frac{n\_2}{2}\right) \text{ and } \; U\_2 \sim \text{ type-2}\text{ beta}\left(\gamma\_1 + \frac{n\_1}{2}, \gamma\_2 + \frac{n\_2}{2}\right).$$

Thus, we have the following result:

**Theorem 7.5.1.** *Let $X\_1$ of dimension $p \times n\_1,\ p \le n\_1$, and $X\_2$ of dimension $p \times n\_2,\ p \le n\_2$, be rank $p$ matrices that are independently distributed rectangular real matrix-variate gamma random variables whose densities are specified in (7.5.1). Then, as defined in (i) and (ii), $U\_1$ and $U\_2$ are respectively real matrix-variate type-1 beta and type-2 beta distributed with the same parameters $(\gamma\_1 + \frac{n\_1}{2},\ \gamma\_2 + \frac{n\_2}{2})$. Thus, they have the following densities, denoted by $g\_j(U\_j),\ j = 1, 2$:*

$$g\_1(U\_1)\mathrm{d}U\_1 = c \left| U\_1 \right|^{\gamma\_1 + \frac{n\_1}{2} - \frac{p+1}{2}} |I - U\_1|^{\gamma\_2 + \frac{n\_2}{2} - \frac{p+1}{2}} \mathrm{d}U\_1, \; O < U\_1 < I,\tag{7.5.3}$$

*and zero elsewhere, and*

$$g\_2(U\_2)\mathrm{d}U\_2 = c \left| U\_2 \right|^{\gamma\_1 + \frac{n\_1}{2} - \frac{p+1}{2}} |I + U\_2|^{-(\gamma\_1 + \gamma\_2 + \frac{n\_1}{2} + \frac{n\_2}{2})} \mathrm{d}U\_2, \; U\_2 > O,\tag{7.5.4}$$

*where*

$$c = \frac{\Gamma\_p(\gamma\_1 + \gamma\_2 + \frac{n\_1}{2} + \frac{n\_2}{2})}{\Gamma\_p(\gamma\_1 + \frac{n\_1}{2})\Gamma\_p(\gamma\_2 + \frac{n\_2}{2})}, \ \Re\left(\gamma\_j + \frac{n\_j}{2}\right) > \frac{p-1}{2}, \ j = 1, 2.$$
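In the scalar case $p = 1$, the matrix-variate gammas reduce to ordinary gamma variables, and Theorem 7.5.1 says that $u\_1 = w\_1/(w\_1 + w\_2)$ follows a scalar type-1 beta law. The sketch below checks this numerically by Monte Carlo; the parameter values are arbitrary choices for illustration, not taken from the text.

```python
import random

random.seed(7)

# Scalar (p = 1) illustration of Theorem 7.5.1: for independent
# w1 ~ Gamma(g1 + n1/2) and w2 ~ Gamma(g2 + n2/2),
# u1 = w1/(w1 + w2) is type-1 beta(a, b) with
# a = g1 + n1/2 and b = g2 + n2/2.
g1, n1, g2, n2 = 1.5, 4, 0.5, 6   # arbitrary parameter values
a, b = g1 + n1 / 2, g2 + n2 / 2

N = 200_000
total = 0.0
for _ in range(N):
    w1 = random.gammavariate(a, 1.0)
    w2 = random.gammavariate(b, 1.0)
    total += w1 / (w1 + w2)

emp_mean = total / N
exact_mean = a / (a + b)          # mean of a type-1 beta(a, b)
print(emp_mean, exact_mean)
```

The same construction with $w\_1/w\_2$ illustrates the type-2 beta law of $U\_2$; the matrix case would replace the gamma draws by $p \times p$ matrix-variate gamma matrices.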

Analogous derivations will yield the densities of $\tilde{U}\_1$ and $\tilde{U}\_2$, the corresponding matrix-variate random variables in the complex domain:

**Theorem 7.5a.1.** *Let $\tilde{X}\_1$ of dimension $p \times n\_1,\ p \le n\_1$, and $\tilde{X}\_2$ of dimension $p \times n\_2,\ p \le n\_2$, be full rank rectangular matrix-variate complex gamma random variables that are independently distributed, with densities*

$$\tilde{f}\_{j}(\tilde{X}\_{j})\mathrm{d}\tilde{X}\_{j} = \frac{|A\_{j}|^{n\_{j}}|B\_{j}|^{p}\tilde{\Gamma}\_{p}(n\_{j})}{\pi^{n\_{j}p}\tilde{\Gamma}\_{p}(\gamma\_{j} + n\_{j})}|\mathrm{det}(A\_{j}\tilde{X}\_{j}B\_{j}\tilde{X}\_{j}^{\*})|^{\gamma\_{j}}\mathrm{e}^{-\mathrm{tr}(A\_{j}\tilde{X}\_{j}B\_{j}\tilde{X}\_{j}^{\*})}\mathrm{d}\tilde{X}\_{j} \quad (7.5a.1)$$

*where $A\_j = A\_j^\* > O$ and $B\_j = B\_j^\* > O$, with $A\_j$ being $p \times p$ and $B\_j$, $n\_j \times n\_j$, $p \le n\_j,\ j = 1, 2$. Letting*

$$\tilde{U}\_1 = \left(\sum\_{j=1}^2 A\_j^{\frac{1}{2}} \tilde{X}\_j B\_j \tilde{X}\_j^\* A\_j^{\frac{1}{2}}\right)^{-\frac{1}{2}} (A\_1^{\frac{1}{2}} \tilde{X}\_1 B\_1 \tilde{X}\_1^\* A\_1^{\frac{1}{2}}) \left(\sum\_{j=1}^2 A\_j^{\frac{1}{2}} \tilde{X}\_j B\_j \tilde{X}\_j^\* A\_j^{\frac{1}{2}}\right)^{-\frac{1}{2}}$$

*and*

$$
\tilde{U}\_2 = (A\_2^{\frac{1}{2}} \tilde{X}\_2 B\_2 \tilde{X}\_2^\* A\_2^{\frac{1}{2}})^{-\frac{1}{2}} (A\_1^{\frac{1}{2}} \tilde{X}\_1 B\_1 \tilde{X}\_1^\* A\_1^{\frac{1}{2}}) (A\_2^{\frac{1}{2}} \tilde{X}\_2 B\_2 \tilde{X}\_2^\* A\_2^{\frac{1}{2}})^{-\frac{1}{2}},\tag{7.5a.2}
$$

*the densities of $\tilde{U}\_1$ and $\tilde{U}\_2$, denoted by $\tilde{g}\_j(\tilde{U}\_j),\ j = 1, 2$, are respectively given by*

$$\tilde{g}\_1(\tilde{U}\_1)\mathrm{d}\tilde{U}\_1 = \tilde{c} \, |\det(\tilde{U}\_1)|^{\gamma\_1 + n\_1 - p} |\det(I - \tilde{U}\_1)|^{\gamma\_2 + n\_2 - p} \mathrm{d}\tilde{U}\_1, \ \ O < \tilde{U}\_1 < I,\tag{7.5a.3}$$

*and*

$$\tilde{g}\_2(\tilde{U}\_2)\mathrm{d}\tilde{U}\_2 = \tilde{c}\,|\det(\tilde{U}\_2)|^{\gamma\_1 + n\_1 - p} |\det(I + \tilde{U}\_2)|^{-(\gamma\_1 + \gamma\_2 + n\_1 + n\_2)} \mathrm{d}\tilde{U}\_2, \ \tilde{U}\_2 > O,\tag{7.5a.4}$$

*where*

$$\tilde{c} = \frac{\tilde{\Gamma}\_p(\gamma\_1 + \gamma\_2 + n\_1 + n\_2)}{\tilde{\Gamma}\_p(\gamma\_1 + n\_1)\tilde{\Gamma}\_p(\gamma\_2 + n\_2)}, \ \Re(\gamma\_j + n\_j) > p - 1, \ j = 1, 2.$$

The densities specified in (7.5.4) and (7.5*a*.4) happen to be quite useful in real-life applications. Connections of the type-2 beta distribution to the F-distribution, the Student-$t^2$ distribution and the distribution of the sample correlation coefficient when the population is Gaussian have already been pointed out in the course of our previous discussions of the scalar, vector and matrix-variate cases. Some further relationships are now pointed out. Let $\{Y\_1,\ldots,Y\_n\}$ constitute a simple random sample where $Y\_j \overset{iid}{\sim} N\_p(\mu, \Sigma),\ \Sigma > O,\ j = 1,\ldots,n$, and let the sample matrix be denoted by $\mathbf{Y} = [Y\_1, Y\_2,\ldots,Y\_n]$; letting $\bar{Y} = \frac{1}{n}[Y\_1 + \cdots + Y\_n]$ and the matrix of sample means be $\bar{\mathbf{Y}} = [\bar{Y},\ldots,\bar{Y}]$, the sample sum of products (corrected) matrix is $S = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})'$, which is unaffected by $\mu$. We have determined that $S$ follows a real Wishart distribution having $m = n - 1$ degrees of freedom, and that when $\mu$ is known to be a null vector, $\mathbf{Y}\mathbf{Y}'$ is a real Wishart matrix with $n$ degrees of freedom. Now, consider $A^{\frac{1}{2}}Y\_j \overset{iid}{\sim} N\_p(A^{\frac{1}{2}}\mu,\ A^{\frac{1}{2}}\Sigma A^{\frac{1}{2}})$; when $\mu = O$, the sample sum of products matrix is $A^{\frac{1}{2}}\mathbf{Y}\mathbf{Y}'A^{\frac{1}{2}}$, which can be expressed in the form of the product of matrices appearing in (7.5.1) with $B = I$. Hence, we can regard $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$ (or equivalently $AXBX'$ in the determinant in (7.5.1)) as a weighted sample sum of products matrix, with sample sizes $n\_1$ and $n\_2$. Then, the type-2 beta density in (7.5.4) with $U\_2$ replaced by $\frac{n\_1}{n\_2}U\_2$ corresponds to a generalized *real rectangular matrix-variate F-density having $n\_1$ and $n\_2$ degrees of freedom*, where $U\_2$ is as defined in *(ii)*.

Moreover, for $\gamma\_1 = 0 = \gamma\_2$, this density will correspond to a *rectangular matrix-variate Student-t density*. The material included in Sects. 7.5 and 7.5a may not be available in the literature.

#### **7.5.1. Multivariate F, Student-***t* **and Cauchy densities**

The densities appearing in (7.5.4) and (7.5*a*.4) for the $p \times p$ positive definite matrices $U\_2$ and $\tilde{U}\_2$ have $\mathrm{d}U\_2$ and $\mathrm{d}\tilde{U}\_2$ as differential elements. A positive definite matrix such as $U\_2$ can be expressed as $U\_2 = TT'$ where $T$, of dimension $p \times n\_1,\ p \le n\_1$, has rank $p$, and we can write $\mathrm{d}U\_2$ in terms of $\mathrm{d}T$. We can also consider the format $U\_2 = TCT'$ where $C > O$ is an $n\_1 \times n\_1$ positive definite constant matrix. In other words, we can arrive at the format in (7.4.1) from (7.5.4), and correspondingly obtain (7.4*a*.1) from (7.5*a*.4). Let us re-examine the expressions given in (7.4.1) and (7.4*a*.1), which could be referred to as *rectangular matrix-variate F and Student-t densities in the real and complex cases* for specific values of the parameters $\beta$ and $\gamma$. Now, let $p = 1$ and $A = a > 0$ in (7.4.1), wherein a location parameter vector $\mu$ is inserted. The resulting density is

$$h(X)\mathrm{d}X = \frac{a^{\gamma + \frac{q}{2}}|B|^{\frac{1}{2}}\Gamma(\frac{q}{2})\Gamma(\gamma + \frac{q}{2} + \beta)}{\pi^{\frac{q}{2}}\Gamma(\gamma + \frac{q}{2})\Gamma(\beta)}[(X-\mu)B(X-\mu)']^{\gamma}$$

$$\times\, [1 + a(X-\mu)B(X-\mu)']^{-(\gamma + \frac{q}{2} + \beta)}\mathrm{d}X \tag{7.5.5}$$

where $X$ and $\mu$ are $1 \times q$ row vectors, the corresponding density in the complex domain, denoted by $\tilde{h}(\tilde{X})$, being the following:

$$
\tilde{h}(\tilde{X})\mathrm{d}\tilde{X} = \frac{a^{\gamma+q}|\mathrm{det}(B)|\,\Gamma(q)\Gamma(\gamma+q+\beta)}{\pi^q\,\Gamma(\gamma+q)\Gamma(\beta)}[(\tilde{X}-\tilde{\mu})B(\tilde{X}-\tilde{\mu})^\*]^{\gamma}
$$

$$
\times\, [1 + a(\tilde{X}-\tilde{\mu})B(\tilde{X}-\tilde{\mu})^\*]^{-(\gamma+q+\beta)}\mathrm{d}\tilde{X}.\tag{7.5a.5}
$$

For specific values of the parameters, the densities appearing in (7.5.5) and (7.5a.5) can respectively be called the *multivariate F and Student-t densities* in the real and complex domains. With a view to modeling certain types of signal processes, Kondo et al. (2020) made use of a special form of the complex multivariate Student-$t$ wherein $\gamma = 0$, $a = \frac{2}{\nu}$ and $\beta = \frac{\nu}{2}$, which is given next.

#### **7.5a.1. A complex multivariate Student-***t* **having** *ν* **degrees of freedom**

$$\tilde{h}\_1(\tilde{X})\mathrm{d}\tilde{X} = \frac{2^q \Gamma(\frac{\nu}{2} + q)}{(\nu\pi)^q \Gamma(\frac{\nu}{2}) |\det(\Sigma)|} \Big[ 1 + \frac{2}{\nu} (\tilde{X} - \tilde{\mu}) \Sigma^{-1} (\tilde{X} - \tilde{\mu})^\* \Big]^{-(\frac{\nu}{2} + q)} \mathrm{d}\tilde{X}. \quad (7.5a.6)$$

A complex multivariate Cauchy density that, as well, is mentioned in Kondo et al. (2020), can be obtained by letting $\nu = 1$ in (7.5*a*.6). We conclude this section with its representation, denoted by $\tilde{h}\_2(\tilde{X})$:

#### **7.5a.2. A complex multivariate Cauchy density**

$$\tilde{h}\_2(\tilde{X}) \text{d}\tilde{X} = \frac{2^q \, \Gamma(\frac{1}{2} + q)}{\pi^q \, \Gamma(\frac{1}{2}) |\text{det}(\boldsymbol{\Sigma})|} [1 + 2(\tilde{X} - \tilde{\mu}) \, \boldsymbol{\Sigma}^{-1} (\tilde{X} - \tilde{\mu})^\* ]^{ - (q + \frac{1}{2})} \text{d}\tilde{X}. \qquad (7.5a.7)$$
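A quick way to convince oneself that (7.5a.6) is correctly normalized is to integrate it in the simplest case $q = 1$, $\Sigma = \sigma^2$, where the density lives on the complex plane. In polar coordinates, with the substitution $u = 2r^2/(\nu\sigma^2)$, the total mass reduces to a one-dimensional integral. The sketch below checks this numerically; the values of $\nu$ and $\sigma^2$ are arbitrary.

```python
import math

# Numerical check that the complex multivariate Student-t density
# (7.5a.6) integrates to 1 for q = 1, Sigma = sigma^2.  In polar
# coordinates on the complex plane, with u = 2 r^2 / (nu sigma^2),
# the total mass is  c * (pi nu sigma^2 / 2) * int_0^inf (1+u)^-(nu/2+1) du.
nu, sigma2 = 3.0, 1.0   # arbitrary values
c = 2 * math.gamma(nu / 2 + 1) / ((nu * math.pi) * math.gamma(nu / 2) * sigma2)

# midpoint rule on (0, 1) after the further substitution u = v/(1-v)
n = 100_000
s = 0.0
for i in range(n):
    v = (i + 0.5) / n
    u = v / (1 - v)
    s += (1 + u) ** (-(nu / 2 + 1)) / (1 - v) ** 2
integral_u = s / n

total_mass = c * (math.pi * nu * sigma2 / 2) * integral_u
print(total_mass)
```

Setting $\nu = 1$ runs the same check for the complex Cauchy density (7.5a.7).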

#### **Exercises 7.5**

**7.5.1.** Derive the complex densities in (7.5*a*.3) and (7.5*a*.4).

**7.5.2.** Derive the normalizing constant in (7.5*a*.6) by integrating out the functional portion of this density.

**7.5.3.** Derive the normalizing constant in (7.5*a*.7) by integrating out the functional portion of this density.

**7.5.4.** Derive the density in (7.5*a*.6) from complex *q*-variate Gaussian densities.

**7.5.5.** Derive the density in (7.5*a*.7) from complex *q*-variate Gaussian densities.

#### **7.6. Rectangular Matrix-Variate Dirichlet Density, Real Case**

For the real matrix-variate type-1 and type-2 Dirichlet models involving sets of real positive definite matrices, the reader is referred to Sects. 5.8.6 and 5.8.7. The corresponding rectangular matrix-variate cases will be considered in this section. Let $A\_j > O,\ j = 1,\ldots,k$, be $p \times p$ real positive definite constant matrices, and $B\_j,\ j = 1,\ldots,k$, be $q\_j \times q\_j$ real positive definite constant matrices. Let $X\_j,\ j = 1,\ldots,k$, be $p \times q\_j,\ q\_j \ge p$, rank $p$ real matrices whose elements are distinct real scalar variables. Then, consider the real-valued scalar function of $X\_1,\ldots,X\_k$,

$$\begin{split} f\_1(X\_1, \ldots, X\_k) &= C\_k |A\_1 X\_1 B\_1 X\_1'|^{\gamma\_1} \cdots |A\_k X\_k B\_k X\_k'|^{\gamma\_k} \\ &\times |I - A\_1^{\frac{1}{2}} X\_1 B\_1 X\_1' A\_1^{\frac{1}{2}} - \cdots - A\_k^{\frac{1}{2}} X\_k B\_k X\_k' A\_k^{\frac{1}{2}}|^{\gamma\_{k+1} - \frac{p+1}{2}} \end{split} \tag{7.6.1}$$

for $A\_j > O$, $B\_j > O$, $A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}} > O,\ j = 1,\ldots,k$, $I - \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}} > O$, $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$, and $f\_1 = 0$ elsewhere, where $C\_k$ is the normalizing constant. This normalizing constant can be evaluated as follows: Letting

$$\mathbf{Y}\_{j} = A\_{j}^{\frac{1}{2}} \mathbf{X}\_{j} \boldsymbol{\mathcal{B}}\_{j}^{\frac{1}{2}} \Rightarrow \mathbf{d} \mathbf{Y}\_{j} = |\mathbf{A}\_{j}|^{\frac{q\_{j}}{2}} |\mathbf{B}\_{j}|^{\frac{p}{2}} \mathbf{dX}\_{j}, \ j = 1, \ldots, k,\tag{i}$$

the joint density of *Y*1*,...,Yk*, denoted by *f*2*(Y*1*,...,Yk)*, is given by

$$f\_2(Y\_1, \dots, Y\_k) = \left\{ \prod\_{j=1}^k |A\_j|^{-\frac{q\_j}{2}} |B\_j|^{-\frac{p}{2}} \right\} C\_k |Y\_1 Y\_1'|^{\gamma\_1} \dots |Y\_k Y\_k'|^{\gamma\_k}$$

$$\times |I - \sum\_{j=1}^k Y\_j Y\_j'|^{\gamma\_{k+1} - \frac{p+1}{2}}.\tag{7.6.2}$$

Now, let

$$\mathbf{S}\_{j} = Y\_{j}Y\_{j}^{\prime} \Rightarrow \mathbf{d}Y\_{j} = \frac{\pi^{\frac{q\_{j}p}{2}}}{\Gamma\_{p}(\frac{q\_{j}}{2})} |\mathbf{S}\_{j}|^{\frac{q\_{j}}{2} - \frac{p+1}{2}} \mathbf{d}\mathbf{S}\_{j}, \ j = 1, \ldots, k. \tag{ii}$$

Then, the joint density of *S*1*,...,Sk,* which follows, is a real matrix-variate type-1 Dirichlet density:

$$\begin{split} f\_3(S\_1, \ldots, S\_k) &= C\_k \left\{ \prod\_{j=1}^k |A\_j|^{-\frac{q\_j}{2}} |B\_j|^{-\frac{p}{2}} \frac{\pi^{\frac{q\_j p}{2}}}{\Gamma\_p(\frac{q\_j}{2})} \right\} \\ &\times \left\{ \prod\_{j=1}^k |S\_j|^{\gamma\_j + \frac{q\_j}{2} - \frac{p+1}{2}} \right\} |I - S\_1 - \cdots - S\_k|^{\gamma\_{k+1} - \frac{p+1}{2}} \end{split} \tag{7.6.3}$$

for $S\_j > O$, $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$. Next, on integrating out $S\_1,\ldots,S\_k$ by making use of the type-1 real matrix-variate Dirichlet integral defined in Sect. 5.8.6, we have

$$\frac{\{\prod\_{j=1}^{k} \Gamma\_p(\gamma\_j + \frac{q\_j}{2})\}\, \Gamma\_p(\gamma\_{k+1})}{\Gamma\_p(\sum\_{j=1}^{k+1} \gamma\_j + \sum\_{j=1}^{k} \frac{q\_j}{2})}, \ \Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2}, \ j = 1, \ldots, k,\tag{iii}$$

and $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$. Then, as obtained from (*i*), (*ii*) and (*iii*), the normalizing constant is

$$C\_{k} = \left\{ \prod\_{j=1}^{k} |A\_{j}|^{\frac{q\_{j}}{2}} |B\_{j}|^{\frac{p}{2}} \frac{\Gamma\_{p}(\frac{q\_{j}}{2})}{\pi^{\frac{q\_{j}p}{2}}} \frac{1}{\Gamma\_{p}(\gamma\_{j} + \frac{q\_{j}}{2})} \right\} \frac{\Gamma\_{p}(\gamma\_{1} + \cdots + \gamma\_{k+1} + \frac{q\_{1}}{2} + \cdots + \frac{q\_{k}}{2})}{\Gamma\_{p}(\gamma\_{k+1})} \tag{7.6.4}$$

for $A\_j > O$, $B\_j > O$, $q\_j \ge p$, $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$, and $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$.
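For $p = 1$ and $k = 2$, the type-1 Dirichlet integral invoked in step (*iii*) reduces to the classical Dirichlet integral over the simplex, which can be verified directly by quadrature. The sketch below uses a two-dimensional midpoint rule with arbitrary parameter values chosen so that the integrand is a polynomial.

```python
import math

# For p = 1, k = 2, the integral in (iii) reduces to
#   int int_{s1,s2>0, s1+s2<1} s1^(a1-1) s2^(a2-1) (1-s1-s2)^(a3-1) ds1 ds2
#     = Gamma(a1) Gamma(a2) Gamma(a3) / Gamma(a1+a2+a3).
a1, a2, a3 = 2.0, 3.0, 2.0   # arbitrary values
exact = (math.gamma(a1) * math.gamma(a2) * math.gamma(a3)
         / math.gamma(a1 + a2 + a3))

# two-dimensional midpoint rule over the simplex
n = 400
h = 1.0 / n
num = 0.0
for i in range(n):
    s1 = (i + 0.5) * h
    for j in range(n):
        s2 = (j + 0.5) * h
        if s1 + s2 < 1.0:
            num += s1 ** (a1 - 1) * s2 ** (a2 - 1) * (1 - s1 - s2) ** (a3 - 1)
num *= h * h
print(num, exact)
```

The matrix-variate integral behind (7.6.4) replaces each scalar gamma function by $\Gamma\_p(\cdot)$, but the bookkeeping of the parameters is the same.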

#### **7.6.1. Certain properties, real rectangular matrix-variate type-1 Dirichlet density**

Letting $U = \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}$, what might be the distributions of $U$ and $I - U$? In the real scalar case, one could have easily evaluated the moments $E[(1-u)^h]$ for arbitrary $h$, which would have automatically determined the distribution of $1 - u$, and therefrom that of $u$. In the matrix-variate case as well, one can readily determine the $h$-th moment of the determinant of $I - U$, $E[|I - U|^h]$, and the unique resulting distribution. However, the distribution of a determinant being unique does not imply that the distribution of the corresponding matrix is unique. Thus, we have to resort to other approaches for obtaining the distributions of $U$ and $I - U$. Consider the following transformation:

$$V\_j = A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}, \ j = 1, \dots, k - 1, \ V\_k = \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}} = U.$$

Then $A\_k^{\frac{1}{2}} X\_k B\_k X\_k' A\_k^{\frac{1}{2}} = U - V\_1 - \cdots - V\_{k-1}$ and $I - \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}} = I - V\_k = I - U$. Noting that

$$\mathrm{d}X\_1 \wedge \ldots \wedge \mathrm{d}X\_k = \left\{ \prod\_{j=1}^k |A\_j|^{-\frac{q\_j}{2}} |B\_j|^{-\frac{p}{2}} \frac{\pi^{\frac{q\_j p}{2}}}{\Gamma\_p(\frac{q\_j}{2})} \right\} \mathrm{d}V\_1 \wedge \ldots \wedge \mathrm{d}V\_{k-1} \wedge \mathrm{d}U,\tag{7.6.5}$$

the joint density of $V\_1,\ldots,V\_{k-1}, U$, denoted by $f\_3(V\_1,\ldots,V\_{k-1},U)$, is seen to be

$$f\_3(V\_1, \ldots, V\_{k-1}, U) = \frac{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1})}{\{\prod\_{j=1}^k \Gamma\_p(\gamma\_j + \frac{q\_j}{2})\} \Gamma\_p(\gamma\_{k+1})} \left\{ \prod\_{j=1}^{k-1} |V\_j|^{\gamma\_j + \frac{q\_j}{2} - \frac{p+1}{2}} \right\}$$

$$\times |U - V\_1 - \cdots - V\_{k-1}|^{\gamma\_k + \frac{q\_k}{2} - \frac{p+1}{2}} |I - U|^{\gamma\_{k+1} - \frac{p+1}{2}},\quad(7.6.6)$$

where

$$|U - V\_1 - \cdots - V\_{k-1}| = |U| \, |I - U^{-\frac{1}{2}} V\_1 U^{-\frac{1}{2}} - \cdots - U^{-\frac{1}{2}} V\_{k-1} U^{-\frac{1}{2}}|.$$


Letting $W\_j = U^{-\frac{1}{2}} V\_j U^{-\frac{1}{2}},\ j = 1,\ldots,k-1$, for fixed $U$ we have

$$\mathrm{d}V\_1 \wedge \ldots \wedge \mathrm{d}V\_{k-1} = |U|^{(k-1)(\frac{p+1}{2})} \mathrm{d}W\_1 \wedge \ldots \wedge \mathrm{d}W\_{k-1}.$$

Now, the joint density of *W*1*,...,Wk*−<sup>1</sup> and *U*, denoted by *f*4*(W*1*,...,Wk*−1*,U)*, is the following:

$$f\_4(W\_1, \ldots, W\_{k-1}, U) = C\_k' \, |U|^{\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) - \frac{p+1}{2}} |I - U|^{\gamma\_{k+1} - \frac{p+1}{2}} \left\{ \prod\_{j=1}^{k-1} |W\_j|^{\gamma\_j + \frac{q\_j}{2} - \frac{p+1}{2}} \right\}$$

$$\times |I - W\_1 - \cdots - W\_{k-1}|^{\gamma\_k + \frac{q\_k}{2} - \frac{p+1}{2}} \tag{7.6.7}$$

where $C\_k'$ is the normalizing constant. We then integrate out $W\_1,\ldots,W\_{k-1}$ by using a $(k-1)$-variate type-1 Dirichlet integral, which yields

$$\left\{ \prod\_{j=1}^{k} \Gamma\_p(\gamma\_j + \frac{q\_j}{2}) \right\} \Big/ \Gamma\_p\Big( \sum\_{j=1}^{k} (\gamma\_j + \frac{q\_j}{2}) \Big) \ \text{ for } \Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2}, \ j = 1, \ldots, k.$$

Accordingly, the marginal density of *U* is the following:

$$f\_5(U) = \frac{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1})}{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}))\Gamma\_p(\gamma\_{k+1})} |U|^{\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) - \frac{p+1}{2}} |I - U|^{\gamma\_{k+1} - \frac{p+1}{2}} \tag{7.6.8}$$

for $O < U < I$, $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$, $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$, and $f\_5 = 0$ elsewhere. Thus, $U$ is a real matrix-variate type-1 beta with the parameters $(\sum\_{j=1}^k(\gamma\_j + \frac{q\_j}{2}),\ \gamma\_{k+1})$, and it follows that $I - U$ is a real matrix-variate type-1 beta with the parameters $(\gamma\_{k+1},\ \sum\_{j=1}^k(\gamma\_j + \frac{q\_j}{2}))$. These results are now stated as a theorem.

**Theorem 7.6.1.** *Consider the density given in (7.6.1) and let $U = \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}$. Then, $U$ has a real matrix-variate type-1 beta distribution whose parameters are $(\sum\_{j=1}^k(\gamma\_j + \frac{q\_j}{2}),\ \gamma\_{k+1})$, and $I - U$ is distributed as a real matrix-variate type-1 beta with the parameters $(\gamma\_{k+1},\ \sum\_{j=1}^k(\gamma\_j + \frac{q\_j}{2}))$.*

The *h*-th moment of the determinant of the matrix *I* − *U* can be evaluated either from Theorem 7.6.1 or from Eq. (7.6.1). This *h*-th moment of the determinant, which can be worked out from the normalizing constant appearing in (7.6.8), is

$$E[|I - U|^h] = \frac{\Gamma\_p(\gamma\_{k+1} + h)}{\Gamma\_p(\gamma\_{k+1})} \frac{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1})}{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1} + h)}\tag{7.6.9}$$

for $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2}$, $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$. Observe that a representation of the $h$-th moment of the determinant of $U$ cannot be derived from (7.6.9). However, $E[|U|^h]$ can be readily evaluated from Theorem 7.6.1:

$$E[|U|^h] = \frac{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + h)}{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}))} \frac{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1})}{\Gamma\_p(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1} + h)} \tag{7.6.10}$$

for $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$, $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$. Upon expanding the $\Gamma\_p(\cdot)$'s in terms of $\Gamma(\cdot)$'s, the following structural representations are obtained:

$$\left|I - U\right| = u\_1 \cdots u\_p \,, \tag{7.6.11}$$

$$\left| U \right| = v\_1 \cdots v\_p \,, \tag{7.6.12}$$

where $u\_1,\ldots,u\_p$ are independently distributed real scalar type-1 beta random variables with the parameters $(\gamma\_{k+1} - \frac{j-1}{2},\ \sum\_{i=1}^k(\gamma\_i + \frac{q\_i}{2})),\ j = 1,\ldots,p$, and $v\_1,\ldots,v\_p$ are independently distributed real scalar type-1 beta random variables with the parameters $(\sum\_{i=1}^k(\gamma\_i + \frac{q\_i}{2}) - \frac{j-1}{2},\ \gamma\_{k+1}),\ j = 1,\ldots,p$.
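In the scalar case $p = 1$, formula (7.6.9) is simply the $h$-th moment of $1 - u$ for $u$ a type-1 beta with parameters $a = \sum\_{j=1}^k(\gamma\_j + \frac{q\_j}{2})$ and $b = \gamma\_{k+1}$. The Monte Carlo sketch below checks this; the values of $a$, $b$ and $h$ are arbitrary.

```python
import math, random

random.seed(11)

# p = 1 check of (7.6.9): with a = sum_j (g_j + q_j/2) and b = g_{k+1},
# u is type-1 beta(a, b) and
#   E[(1-u)^h] = [Gamma(b+h)/Gamma(b)] * [Gamma(a+b)/Gamma(a+b+h)].
a, b, h = 3.0, 2.0, 2   # arbitrary parameter values
exact = (math.gamma(b + h) / math.gamma(b)) * (math.gamma(a + b) / math.gamma(a + b + h))

N = 200_000
total = 0.0
for _ in range(N):
    g1 = random.gammavariate(a, 1.0)
    g2 = random.gammavariate(b, 1.0)
    u = g1 / (g1 + g2)            # a type-1 beta(a, b) draw
    total += (1 - u) ** h

print(total / N, exact)
```

The companion formula (7.6.10) for $E[|U|^h]$ can be checked the same way by accumulating $u^h$ instead of $(1-u)^h$.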

#### **7.6.2. A multivariate version of the real matrix-variate type-1 Dirichlet density**

For $p = 1$, consider the joint density of $Y\_1,\ldots,Y\_k$ in $f\_2(Y\_1,\ldots,Y\_k)$, which shall be denoted by $f\_6(Y\_1,\ldots,Y\_k)$. Then,

$$f\_6(Y\_1, \ldots, Y\_k) = \left\{ \prod\_{j=1}^k \frac{\Gamma(\frac{q\_j}{2})}{\pi^{\frac{q\_j}{2}}} \right\} \frac{\Gamma(\sum\_{j=1}^k (\gamma\_j + \frac{q\_j}{2}) + \gamma\_{k+1})}{\left\{ \prod\_{j=1}^k \Gamma(\gamma\_j + \frac{q\_j}{2}) \right\} \Gamma(\gamma\_{k+1})} \left\{ \prod\_{j=1}^k [Y\_j Y\_j']^{\gamma\_j} \right\}$$

$$\times\, [1 - Y\_1 Y\_1' - \cdots - Y\_k Y\_k']^{\gamma\_{k+1} - 1},\tag{7.6.13}$$

the conditions on the parameters remaining as previously stated. Note that $Y\_j$ is of the form $Y\_j = (y\_{j1},\ldots,y\_{jq\_j})$, so that $Y\_jY\_j' = y\_{j1}^2 + \cdots + y\_{jq\_j}^2$. Thus, in light of its structure, the density appearing in (7.6.13) has interesting properties. For instance, it can be observed that all the subsets of $Y\_1,\ldots,Y\_k$ also have densities belonging to the same family. Accordingly, the marginal density of $Y\_1$ is the following:


$$f\_7(Y\_1) = \frac{\Gamma(\frac{q\_1}{2})}{\pi^{\frac{q\_1}{2}}} \frac{\Gamma(\gamma\_1 + \frac{q\_1}{2} + \gamma\_2)}{\Gamma(\gamma\_1 + \frac{q\_1}{2})\Gamma(\gamma\_2)} [y\_{11}^2 + \cdots + y\_{1q\_1}^2]^{\gamma\_1}$$

$$\times\, [1 - y\_{11}^2 - \cdots - y\_{1q\_1}^2]^{\gamma\_2 - 1}\tag{7.6.14}$$

for $-\infty < y\_{1r} < \infty,\ r = 1,\ldots,q\_1$, $0 < y\_{11}^2 + \cdots + y\_{1q\_1}^2 < 1$, $\Re(\gamma\_1 + \frac{q\_1}{2}) > 0$, $\Re(\gamma\_2) > 0$, and $f\_7 = 0$ elsewhere. As has already been mentioned, the structure in (7.6.14) is related to geometrical probability problems involving type-1 beta distributed isotropic random points. Thus, (7.6.13) suggests the possibility of generalizing such geometrical probability problems in connection with a type-1 Dirichlet density as the underlying density for the random points. This does not appear to have yet been discussed in the literature on geometrical probability.

The complex case of the type-1 rectangular matrix-variate Dirichlet density, the real and complex cases of the rectangular matrix-variate type-2 Dirichlet density and their generalized forms can be similarly handled; hence, they will not be further discussed. Certain of these cases are brought up in this section's exercises.

**Note 7.6.1.** One could also consider a pathway version of the model appearing in Eq. (7.6.1). Let us replace the second line in (7.6.1) by

$$|I - a(1 - \alpha)(A\_1^{\frac{1}{2}}X\_1B\_1X\_1'A\_1^{\frac{1}{2}} + \dots + A\_k^{\frac{1}{2}}X\_kB\_kX\_k'A\_k^{\frac{1}{2}})|^{\frac{\eta}{1 - \alpha} - \frac{p+1}{2}}$$

where $a > 0$, $\alpha < 1$, $\eta > 0$ are real scalars, and $\gamma\_{k+1}$ by $\frac{\eta}{1-\alpha}$; denote the resulting model by $f\_8$, whose corresponding equation number will be referred to as (7.6.15). Observe that (7.6.15) belongs to a generalized type-1 Dirichlet family of models and that the new normalizing constant will be denoted by $C\_{k1}$. For $\alpha > 1$, write $-a(1-\alpha) = a(\alpha-1) > 0$, so that $\frac{\eta}{1-\alpha} = -\frac{\eta}{\alpha-1}$. Number the resulting model of (7.6.1) as $f\_9$, with (7.6.16) as the associated equation number. Note that (7.6.16) is actually a generalized type-2 Dirichlet model whose normalizing constant, denoted by $C\_{k2}$, will be different. Taking the limits as $\alpha \to 1^-$ in (7.6.15) and $\alpha \to 1^+$ in (7.6.16), both the models $f\_8$ in (7.6.15) and $f\_9$ in (7.6.16) will converge to a model $f\_{10}$ whose associated equation number will be (7.6.17), wherein the second line corresponding to the second line in (7.6.1) will be

$$\mathrm{e}^{-a\eta\,\mathrm{tr}(A\_1^{\frac{1}{2}} X\_1 B\_1 X\_1' A\_1^{\frac{1}{2}} + \cdots + A\_k^{\frac{1}{2}} X\_k B\_k X\_k' A\_k^{\frac{1}{2}})},$$

this limiting model having its own normalizing constant, denoted by $C\_{k3}$. As well, it can be established that, under the above limiting process, both $C\_{k1}$ and $C\_{k2}$ converge to $C\_{k3}$. Now, observe that the matrices $X\_1,\ldots,X\_k$ in model $f\_{10}$ are mutually independently distributed real rectangular matrix-variate gamma random variables. This turns out to be an unforeseen result as, in this case, the pathway parameter $\alpha$ is also seen to control the transitional stages from dependence to independence. Results analogous to those obtained in Sects. 7.6, 7.6.1, and 7.6.2 could similarly be derived within the complex domain.
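The scalar essence of the limiting process in Note 7.6.1 is the pathway limit $[1 - a(1-\alpha)t]^{\eta/(1-\alpha)} \to \mathrm{e}^{-a\eta t}$ as $\alpha \to 1$, with $t$ standing in for the trace term. A numerical illustration with arbitrarily chosen values:

```python
import math

# Scalar pathway limit underlying Note 7.6.1:
#   [1 - a (1-alpha) t]^(eta/(1-alpha))  ->  exp(-a eta t)  as alpha -> 1.
a, eta, t = 2.0, 1.5, 0.3   # arbitrary values with a > 0, eta > 0
limit = math.exp(-a * eta * t)

for alpha in (0.9, 0.99, 0.999, 0.9999):
    val = (1 - a * (1 - alpha) * t) ** (eta / (1 - alpha))
    print(alpha, val)

print(limit)
```

The printed sequence approaches the exponential value, mirroring how both $f\_8$ and $f\_9$ collapse to the gamma-type model $f\_{10}$.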

### **Exercises 7.6**

**7.6.1.** Construct, in the complex domain, the rectangular matrix-variate type-1 Dirichlet density corresponding to the density specified in (7.6.1) and determine the associated normalizing constant.

**7.6.2.** Establish, in the complex domain, a theorem corresponding to Theorem 7.6.1.

**7.6.3.** Establish, for the complex case, the structural representations corresponding to (7.6.11) and (7.6.12).

**7.6.4.** Construct a real rectangular matrix-variate type-2 Dirichlet density corresponding to the density in (7.6.1).

**7.6.5.** Construct a complex rectangular matrix-variate type-2 Dirichlet density corresponding to the density in (7.6.1).

**7.6.6.** When the $p \times q\_j$ matrices $X\_j$'s jointly have a real type-2 Dirichlet density with the parameter matrices $A\_j > O$, $B\_j > O$ as in (7.6.1), where $A\_j$ is $p \times p$, $B\_j$ is $q\_j \times q\_j$ and $X\_j$ is $p \times q\_j,\ q\_j \ge p,\ j = 1,\ldots,k$, of full rank $p$, establish that $U = [I + \sum\_{j=1}^k(A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}})]^{-1}$ has a real matrix-variate type-1 beta distribution and specify its parameters. What about the density of $\sum\_{j=1}^k(A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}})$ in this case?

**7.6.7.** Answer the questions in Exercise 7.6.6 for the corresponding type-2 Dirichlet density in the complex domain, replacing $X\_j$ by $\tilde{X}\_j$ and $X\_j'$ by $\tilde{X}\_j^\*$.

**7.6.8.** For the real type-2 Dirichlet density in Exercise 7.6.4, determine $E[|U|^h]$ for $U$ as specified in Exercise 7.6.6.

**7.6.9.** Extend all the results obtained in Sect. 7.6 to the complex domain.

**7.6.10.** Derive, in the complex domain, results that are analogous to those obtained for the real case in Note 7.6.1, while keeping *a, η* and *α* real.

#### **7.7. Generalizations of the Real Rectangular Dirichlet Models**

The first author and his collaborators have considered several types of generalizations of the type-1 and type-2 Dirichlet models for real positive definite matrices and Hermitian positive definite matrices. We will propose certain extensions of those results to rectangular matrix-variate cases, both in the real and complex domains. Again, let $X\_j$ be a $p \times q\_j,\ q\_j \ge p$, matrix of full rank $p$ having distinct real scalar variables as its elements, for $j = 1,\ldots,k$. Let the constant real positive definite matrices $A\_j > O$ and $B\_j > O$, where $A\_j$ is $p \times p$ and $B\_j$ is $q\_j \times q\_j,\ j = 1,\ldots,k$, be as defined in Sect. 7.6. Consider the real model

$$f\_{11}(X\_1, \ldots, X\_k) = D\_k |A\_1^{\frac{1}{2}} X\_1 B\_1 X\_1' A\_1^{\frac{1}{2}}|^{\gamma\_1} |I - A\_1^{\frac{1}{2}} X\_1 B\_1 X\_1' A\_1^{\frac{1}{2}}|^{\beta\_1}$$

$$\times |A\_2^{\frac{1}{2}} X\_2 B\_2 X\_2' A\_2^{\frac{1}{2}}|^{\gamma\_2} |I - \sum\_{j=1}^2 A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}|^{\beta\_2} \cdots$$

$$\times |A\_k^{\frac{1}{2}} X\_k B\_k X\_k' A\_k^{\frac{1}{2}}|^{\gamma\_k} |I - \sum\_{j=1}^k A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}|^{\gamma\_{k+1} + \beta\_k - \frac{p+1}{2}} \quad (7.7.1)$$

for $\Re(\gamma\_j + \frac{q\_j}{2}) > \frac{p-1}{2},\ j = 1,\ldots,k$, $\Re(\gamma\_{k+1}) > \frac{p-1}{2}$, and other conditions to be specified later, where $D\_k$ is the normalizing constant. For evaluating the normalizing constant, consider the following transformations:

$$\mathbf{Z}\_{j} = \mathbf{A}\_{j}^{\frac{1}{2}} \mathbf{X}\_{j} \mathbf{B}\_{j}^{\frac{1}{2}} \Rightarrow \mathbf{d} \mathbf{X}\_{j} = |\mathbf{A}\_{j}|^{-\frac{q\_{j}}{2}} |\mathbf{B}\_{j}|^{-\frac{p}{2}} \mathbf{d} \mathbf{Z}\_{j}, \ j = 1, \ldots, k,\tag{i}$$

so that the model *f*<sup>11</sup> changes to *f*<sup>12</sup> where

$$\begin{split} f\_{12}(Z\_1, \ldots, Z\_k) &= D\_k \Big\{ \prod\_{j=1}^k |A\_j|^{-\frac{q\_j}{2}} |B\_j|^{-\frac{p}{2}} \Big\} |Z\_1 Z\_1'|^{\gamma\_1} |I - Z\_1 Z\_1'|^{\beta\_1} |Z\_2 Z\_2'|^{\gamma\_2} \\ &\times |I - Z\_1 Z\_1' - Z\_2 Z\_2'|^{\beta\_2} \cdots |Z\_k Z\_k'|^{\gamma\_k} |I - \sum\_{j=1}^k Z\_j Z\_j'|^{\gamma\_{k+1} + \beta\_k - \frac{p+1}{2}}. \end{split}\tag{7.7.2}$$

Now, letting

$$\mathbf{Z}\_{j}\mathbf{Z}\_{j}^{\prime} = \mathbf{S}\_{j} \Rightarrow \mathbf{d}\mathbf{Z}\_{j} = \frac{\pi^{\frac{q\_{j}p}{2}}}{\Gamma\_{p}(\frac{q\_{j}}{2})} |\mathbf{S}\_{j}|^{\frac{q\_{j}}{2} - \frac{p+1}{2}} \mathbf{d}\mathbf{S}\_{j}, \ j = 1, \ldots, k,\tag{ii}$$

the model becomes

$$f\_{13}(S\_1, \ldots, S\_k) = D\_k \left\{ \prod\_{j=1}^k |A\_j|^{-\frac{q\_j}{2}} |B\_j|^{-\frac{p}{2}} \frac{\pi^{\frac{q\_j p}{2}}}{\Gamma\_p(\frac{q\_j}{2})} \right\} |S\_1|^{\gamma\_1 + \frac{q\_1}{2} - \frac{p+1}{2}}$$

$$\times\, |I - S\_1|^{\beta\_1} |S\_2|^{\gamma\_2 + \frac{q\_2}{2} - \frac{p+1}{2}} |I - S\_1 - S\_2|^{\beta\_2} \cdots$$

$$\times\, |S\_k|^{\gamma\_k + \frac{q\_k}{2} - \frac{p+1}{2}} \Big|I - \sum\_{j=1}^k S\_j\Big|^{\gamma\_{k+1} + \beta\_k - \frac{p+1}{2}}.\tag{7.7.3}$$

Now, consider the transformation (5.8.20), namely,

$$\begin{aligned} S\_1 &= Y\_1 \\ S\_2 &= (I - Y\_1)^{\frac{1}{2}} Y\_2 (I - Y\_1)^{\frac{1}{2}} \\ S\_j &= (I - Y\_1)^{\frac{1}{2}} \cdots (I - Y\_{j-1})^{\frac{1}{2}} Y\_j (I - Y\_{j-1})^{\frac{1}{2}} \cdots (I - Y\_1)^{\frac{1}{2}} \end{aligned} \tag{7.7.4}$$

for $j = 2,\ldots,k$. Then $Y\_1,\ldots,Y\_k$ will be independently distributed real matrix-variate type-1 beta random variables with the parameters $(\alpha\_j = \gamma\_j + \frac{q\_j}{2},\ \delta\_j),\ j = 1,\ldots,k$, where

$$\delta\_j = \gamma\_{j+1} + \frac{q\_{j+1}}{2} + \dots + \gamma\_{k+1} + \frac{q\_{k+1}}{2} + \beta\_j + \dots + \beta\_k, \ j = 1, \dots, k, \ \text{and} \\ q\_{k+1} = 0. \ \text{(iii)}$$

The normalizing constant *Dk* is thus the following:

$$D\_k = \left\{ \prod\_{j=1}^k |A\_j|^{\frac{q\_j}{2}} |B\_j|^{\frac{p}{2}} \frac{\Gamma\_p(\frac{q\_j}{2})}{\pi^{\frac{q\_j p}{2}}} \right\} \prod\_{j=1}^k \frac{\Gamma\_p(\alpha\_j + \delta\_j)}{\Gamma\_p(\alpha\_j)\Gamma\_p(\delta\_j)} \tag{7.7.5}$$

for $\alpha\_j > \frac{p-1}{2},\ \delta\_j > \frac{p-1}{2},\ j = 1,\ldots,k$, where the $\alpha\_j$'s and $\delta\_j$'s are as previously given. Properties parallel to those pointed out in Sects. 7.1–7.6 can also be studied for the model specified in (7.7.1). The marginal distributions of subsets of the matrices $X\_1, X\_2, \ldots, X\_k$, taken in order, will belong to the same family of densities. There exist other generalizations of the type-1 and type-2 Dirichlet models. For all such generalizations, one can extend the results to the rectangular matrix-variate cases in both the real and complex domains.
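In the scalar case ($p = 1$, $q\_j = 1$, $A\_j = B\_j = 1$), the transformation (7.7.4) reduces to $y\_1 = s\_1$ and $y\_j = s\_j/(1 - s\_1 - \cdots - s\_{j-1})$, and the claimed independence becomes the classical fact that these functions of a type-1 Dirichlet vector are independent beta variables. A minimal Monte Carlo sketch of this scalar special case, with arbitrary illustrative parameter values:

```python
import numpy as np

# Scalar case p = 1 of (7.7.4): a type-1 Dirichlet sample (s1, s2) maps to
# y1 = s1 and y2 = s2/(1 - s1), which should be independent beta variables.
rng = np.random.default_rng(0)
a1, a2, a3 = 2.0, 3.0, 4.0            # arbitrary illustrative parameters
S = rng.dirichlet((a1, a2, a3), size=200_000)
y1 = S[:, 0]                          # should be Beta(a1, a2 + a3)
y2 = S[:, 1] / (1.0 - S[:, 0])        # should be Beta(a2, a3)
assert abs(y1.mean() - a1/(a1 + a2 + a3)) < 0.01
assert abs(y2.mean() - a2/(a2 + a3)) < 0.01
assert abs(np.corrcoef(y1, y2)[0, 1]) < 0.02   # consistent with independence
```

The beta parameters checked here are the $p = 1$ analogues of $(\alpha\_j, \delta\_j)$ in *(iii)*.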

### **Exercises 7.7**

**7.7.1.** Develop the transformation corresponding to (7.7.4) for the real type-2 Dirichlet case. Specify the Jacobians of the transformation (7.7.4) and the corresponding transformation for the type-2 case.

**7.7.2.** Verify the result for *δj* in *(iii)* following (7.7.4) and develop the expression corresponding to *δj* for the type-2 Dirichlet case.

**7.7.3.** Derive the joint marginal density of *X*1*,...,Xr, r < k,* by integrating out the matrices starting with *Xk* in (7.7.1).

**7.7.4.** Develop, in the complex domain, the model corresponding to (7.7.1) and derive its associated normalizing constant.

**7.7.5.** If possible, derive the density of $U = \sum\_{j=1}^{k} A\_j^{\frac{1}{2}} X\_j B\_j X\_j' A\_j^{\frac{1}{2}}$ where the $X\_j$'s, $j = 1,\ldots,k$, jointly have the density given in (7.7.1).

### **References**

N. Balakrishnan and C.D. Lai (2009): *Continuous Bivariate Distributions*, Springer, New York.

C.A. Barnes, D.D. Clayton and D.N. Schramm (editors) (1982): *Essays in Nuclear Astrophysics*, Cambridge University Press, Cambridge.

C.L. Critchfield (1972): In *Cosmology, Fusion and Other Matters*, George Gamow Memorial Volume (ed: F. Reines), University of Colorado Press, Colorado, pp.186–191.

W.A. Fowler (1984): Experimental and Theoretical Nuclear Astrophysics: The Quest for the Origin of the Elements, *Review of Modern Physics*, **56**, 149–179.

T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita and T. Nakatani (2020): Convergence-guaranteed independent positive semidefinite tensor analysis based on Student's-t distribution, *arXiv:2002.08582v1 [cs.SD]* 20 Feb 2020.

A.W. Marshall and I. Olkin (1967): A multivariate exponential distribution, *Journal of the American Statistical Association*, **62**, 30–44.

A.M. Mathai (1993): *A Handbook of Generalized Special Functions for Statistical and Physical Sciences*, Oxford University Press, Oxford.

A.M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

A.M. Mathai (1999): *An Introduction to Geometrical Probability, Distributional Aspects with Applications*, Gordon and Breach, Amsterdam.

A.M. Mathai (2005): A pathway to matrix-variate gamma and normal densities, *Linear Algebra and its Applications*, **396**, 317–328.

A.M. Mathai and H.J. Haubold (1988): *Modern Problems in Nuclear and Neutrino Astrophysics*, Akademie-Verlag, Berlin.

A.M. Mathai and T. Princy (2017): Analogues of reliability analysis for matrix-variate cases, *Linear Algebra and its Applications*, **532** (2017), 287–311.

A.M. Mathai and S.B. Provost (2006): Some complex matrix-variate statistical distributions on rectangular matrices, *Linear Algebra and its Applications*, **410**, 198–216.

A.M. Mathai, R.K. Saxena and H.J. Haubold (2010): *The H-function: Theory and Applications*, Springer, New York.

A. Pais (1986): *Inward Bound: Of Matter and Forces in the Physical World*, Oxford University Press, Oxford.


#### **Chapter 8 The Distributions of Eigenvalues and Eigenvectors**

#### **8.1. Introduction**

We will utilize the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. Other notations will remain unchanged.

Our objective in this chapter is to examine the distributions of the eigenvalues and eigenvectors associated with a matrix-variate random variable. Letting *W* be such a *p* × *p* matrix-variate random variable, its determinant is the product of its eigenvalues and its trace, the sum thereof. Accordingly, the distributions of the determinant and the trace of *W* are available from the distributions of simple functions of its eigenvalues. Actually, several statistical quantities are associated with eigenvalues or eigenvectors. In order to delve into such problems, we will require certain additional properties of the matrix-variate gamma and beta distributions previously introduced in Chap. 5. As a preamble to the study of the distributions of eigenvalues and eigenvectors, these will be looked into in the next subsections for both the real and complex cases.

#### **8.1.1. Matrix-variate gamma and beta densities, real case**

Let $W\_1$ and $W\_2$ be statistically independently distributed $p \times p$ real matrix-variate gamma random variables whose respective parameters are $(\alpha\_1, B)$ and $(\alpha\_2, B)$ with $\Re(\alpha\_j) > \frac{p-1}{2},\ j = 1, 2$, their common scale parameter matrix $B$ being a real positive definite constant matrix. Then, the joint density of $W\_1$ and $W\_2$, denoted by $f(W\_1, W\_2)$, is the following:

$$f(W\_1, W\_2) = \begin{cases} \dfrac{|B|^{\alpha\_1 + \alpha\_2}}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |W\_1|^{\alpha\_1 - \frac{p+1}{2}} |W\_2|^{\alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(B(W\_1 + W\_2))}, \\ \qquad B > O,\ W\_j > O,\ \Re(\alpha\_j) > \frac{p-1}{2},\ j = 1, 2, \\ 0\ \text{elsewhere}. \end{cases}\tag{8.1.1}$$

Consider the transformations

$$U\_1 = (W\_1 + W\_2)^{-\frac{1}{2}} W\_1 (W\_1 + W\_2)^{-\frac{1}{2}}\ \text{ and }\ U\_2 = W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}},\tag{8.1.2}$$

which are matrix-variate counterparts of the changes of variables $u\_1 = \frac{w\_1}{w\_1 + w\_2}$ and $u\_2 = \frac{w\_1}{w\_2}$ in the real scalar case, that is, for $p = 1$. Note that the square roots in (8.1.2) are symmetric positive definite matrices. Then, we have the following result:

**Theorem 8.1.1.** *When the real matrices U*<sup>1</sup> *and U*<sup>2</sup> *are as defined in (8.1.2), then U*<sup>1</sup> *is distributed as a real matrix-variate type-1 beta variable with the parameters (α*1*, α*2*) and U*2*, as a real matrix-variate type-2 beta variable with the parameters (α*1*, α*2*). Further, U*<sup>1</sup> *and U*<sup>3</sup> = *W*1+*W*<sup>2</sup> *are independently distributed, with U*<sup>3</sup> *having a real matrix-variate gamma distribution with the parameters (α*<sup>1</sup> + *α*2*, B).*

*Proof***:** Given the joint density of *W*<sup>1</sup> and *W*<sup>2</sup> specified in (8.1.1), consider the transformation *(W*1,*W*2*)* → *(U*<sup>3</sup> = *W*<sup>1</sup> + *W*2*, U* = *W*1*)*. On observing that its Jacobian is equal to one, the joint density of *U*<sup>3</sup> and *U*, denoted by *f*1*(U*3*, U)*, is obtained as

$$f\_1(U\_3, U) \,\mathrm{d}U\_3 \wedge \mathrm{d}U = c\, |U|^{\alpha\_1 - \frac{p+1}{2}} |U\_3 - U|^{\alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(B U\_3)}\, \mathrm{d}U\_3 \wedge \mathrm{d}U \tag{i}$$

where

$$c = \frac{|B|^{\alpha\_1 + \alpha\_2}}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)}.\tag{ii}$$

Noting that

$$|U\_3 - U| = |U\_3| \, |I - U\_3^{-\frac{1}{2}} U U\_3^{-\frac{1}{2}}|,$$

we now let $U\_1 = U\_3^{-\frac{1}{2}} U U\_3^{-\frac{1}{2}}$ for fixed $U\_3$, so that $\mathrm{d}U\_1 = |U\_3|^{-\frac{p+1}{2}}\mathrm{d}U$. Accordingly, the joint density of $U\_3$ and $U\_1$, denoted by $f\_2(U\_3, U\_1)$, is the following, observing that $U\_1$ is as defined in (8.1.2) with $W\_1 = U$ and $U\_3 = W\_1 + W\_2$:

$$f\_2(U\_3,\ U\_1) = c\, |U\_3|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(B U\_3)} |U\_1|^{\alpha\_1 - \frac{p+1}{2}} |I - U\_1|^{\alpha\_2 - \frac{p+1}{2}} \tag{iii}$$


for $\Re(\alpha\_1) > \frac{p-1}{2},\ \Re(\alpha\_2) > \frac{p-1}{2},\ U\_3 > O,\ O < U\_1 < I$, and zero elsewhere. On multiplying and dividing *(iii)* by $\Gamma\_p(\alpha\_1 + \alpha\_2)$, it is seen that $U\_1$ and $U\_3 = W\_1 + W\_2$ are independently distributed as their joint density factorizes into the product of two densities $g(U\_1)$ and $g\_1(U\_3)$, that is, $f\_2(U\_1, U\_3) = g(U\_1)g\_1(U\_3)$ where

$$g(U\_1) = \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |U\_1|^{\alpha\_1 - \frac{p+1}{2}} |I - U\_1|^{\alpha\_2 - \frac{p+1}{2}},\ O < U\_1 < I,\tag{8.1.3}$$

for $\Re(\alpha\_1) > \frac{p-1}{2},\ \Re(\alpha\_2) > \frac{p-1}{2}$, and zero elsewhere, is a real matrix-variate type-1 beta density with the parameters $(\alpha\_1, \alpha\_2)$, and

$$g\_1(U\_3) = \frac{|B|^{\alpha\_1 + \alpha\_2}}{\Gamma\_p(\alpha\_1 + \alpha\_2)} |U\_3|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(B U\_3)},\ U\_3 > O,\tag{8.1.4}$$

for $B > O,\ \Re(\alpha\_1 + \alpha\_2) > \frac{p-1}{2}$, and zero elsewhere, which is a real matrix-variate gamma density with the parameters $(\alpha\_1 + \alpha\_2, B)$. Thus, given two independently distributed $p \times p$ real positive definite matrices $W\_1$ and $W\_2$, where $W\_1 \sim$ gamma$(\alpha\_1, B),\ B > O,\ \Re(\alpha\_1) > \frac{p-1}{2}$, and $W\_2 \sim$ gamma$(\alpha\_2, B),\ B > O,\ \Re(\alpha\_2) > \frac{p-1}{2}$, one has $U\_3 = W\_1 + W\_2 \sim$ gamma$(\alpha\_1 + \alpha\_2, B),\ B > O$.

In order to determine the distribution of $U\_2$, we first note that the exponent in (8.1.1) is $\mathrm{tr}(B(W\_1 + W\_2)) = \mathrm{tr}[B^{\frac{1}{2}} W\_1 B^{\frac{1}{2}} + B^{\frac{1}{2}} W\_2 B^{\frac{1}{2}}]$. Letting $V\_j = B^{\frac{1}{2}} W\_j B^{\frac{1}{2}}$, $\mathrm{d}V\_j = |B|^{\frac{p+1}{2}}\mathrm{d}W\_j$, $j = 1, 2$, which eliminates $B$, the resulting joint density of $V\_1$ and $V\_2$, denoted by $f\_3(V\_1, V\_2)$, is

$$f\_3(V\_1, V\_2) = \frac{1}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |V\_1|^{\alpha\_1 - \frac{p+1}{2}} |V\_2|^{\alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(V\_1 + V\_2)},\ V\_j > O,\tag{8.1.5}$$

for $\Re(\alpha\_j) > \frac{p-1}{2},\ j = 1, 2$, and zero elsewhere. Now, noting that $\mathrm{tr}[V\_1 + V\_2] = \mathrm{tr}[V\_2^{\frac{1}{2}}(I + V\_2^{-\frac{1}{2}} V\_1 V\_2^{-\frac{1}{2}}) V\_2^{\frac{1}{2}}]$ and letting $V = V\_2^{-\frac{1}{2}} V\_1 V\_2^{-\frac{1}{2}} = U\_2$ of (8.1.2) so that $\mathrm{d}V = |V\_2|^{-\frac{p+1}{2}}\mathrm{d}V\_1$ for fixed $V\_2$, the joint density of $V$ and $V\_2 = V\_3$, denoted by $f\_4(V, V\_3)$, is obtained as

$$f\_4(V, V\_3) = \frac{1}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |V|^{\alpha\_1 - \frac{p+1}{2}} |V\_3|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}((I+V)^{\frac{1}{2}}V\_3(I+V)^{\frac{1}{2}})} \tag{8.1.6}$$

where $\mathrm{tr}[V\_3^{\frac{1}{2}}(I + V)V\_3^{\frac{1}{2}}]$ was replaced by $\mathrm{tr}[(I + V)^{\frac{1}{2}} V\_3 (I + V)^{\frac{1}{2}}]$. It then suffices to integrate out $V\_3$ from the joint density specified in (8.1.6) by making use of a real matrix-variate gamma integral, to obtain the density of $V = U\_2$ that follows:

$$g\_2(V) = \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |V|^{\alpha\_1 - \frac{p+1}{2}} |I + V|^{-(\alpha\_1 + \alpha\_2)},\ V > O,\tag{8.1.7}$$

for $\Re(\alpha\_j) > \frac{p-1}{2},\ j = 1, 2$, and zero elsewhere, which is a real matrix-variate type-2 beta density whose parameters are $(\alpha\_1, \alpha\_2)$. This completes the proof.
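For $p = 1$, Theorem 8.1.1 reduces to the classical scalar facts: for independent $w\_1 \sim$ gamma$(\alpha\_1)$ and $w\_2 \sim$ gamma$(\alpha\_2)$ with a common scale, $u\_1 = w\_1/(w\_1 + w\_2)$ is type-1 beta, $u\_2 = w\_1/w\_2$ is type-2 beta, and $u\_3 = w\_1 + w\_2$ is gamma$(\alpha\_1 + \alpha\_2)$, independent of $u\_1$. A minimal Monte Carlo sketch of this scalar case, with arbitrary parameter values and scale $B = 1$:

```python
import numpy as np

# p = 1 sanity check of Theorem 8.1.1 with B = 1 and arbitrary shape parameters:
# u1 = w1/(w1+w2) ~ type-1 beta(a1, a2), u2 = w1/w2 ~ type-2 beta(a1, a2),
# u3 = w1+w2 ~ gamma(a1+a2), with u1 and u3 independent.
rng = np.random.default_rng(1)
a1, a2, n = 2.5, 3.5, 500_000
w1 = rng.gamma(a1, size=n)
w2 = rng.gamma(a2, size=n)
u1, u2, u3 = w1/(w1 + w2), w1/w2, w1 + w2
assert abs(u1.mean() - a1/(a1 + a2)) < 0.005    # type-1 beta mean
assert abs(u2.mean() - a1/(a2 - 1)) < 0.02      # type-2 beta mean (a2 > 1)
assert abs(u3.mean() - (a1 + a2)) < 0.05        # gamma(a1+a2) mean
assert abs(np.corrcoef(u1, u3)[0, 1]) < 0.01    # consistent with independence
```

Only first moments are compared here; a more thorough check would compare empirical and theoretical distribution functions.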

#### **8.1a. Matrix-variate Gamma and Beta Densities, Complex Case**

Parallel results can be obtained in the complex domain. If $\tilde{W}\_1$ and $\tilde{W}\_2$ are statistically independently distributed $p \times p$ Hermitian positive definite matrices having complex matrix-variate gamma densities with the parameters $(\alpha\_1, \tilde{B})$ and $(\alpha\_2, \tilde{B})$, $\tilde{B} = \tilde{B}^{*} > O$, where an asterisk designates a conjugate transpose, then their joint density, denoted by $\tilde{f}(\tilde{W}\_1, \tilde{W}\_2)$, is given by

$$\tilde{f}(\tilde{W}\_1, \tilde{W}\_2) = \frac{|\det(\tilde{B})|^{\alpha\_1 + \alpha\_2}}{\tilde{\Gamma}\_p(\alpha\_1)\tilde{\Gamma}\_p(\alpha\_2)} |\det(\tilde{W}\_1)|^{\alpha\_1 - p} |\det(\tilde{W}\_2)|^{\alpha\_2 - p} \mathrm{e}^{-\mathrm{tr}(\tilde{B}(\tilde{W}\_1 + \tilde{W}\_2))} \qquad (8.1a.1)$$

for $\tilde{B} > O,\ \tilde{W}\_1 > O,\ \tilde{W}\_2 > O,\ \Re(\alpha\_j) > p - 1,\ j = 1, 2$, and zero elsewhere, with $|\det(\tilde{W}\_j)|$ denoting the absolute value or modulus of the determinant of $\tilde{W}\_j$. Since the derivations are similar to those provided in the previous subsection for the real case, the next results will be stated without proof. Note that, in the complex domain, the square roots involved in the transformations are Hermitian positive definite matrices.

**Theorem 8.1a.1.** *Let the $p \times p$ Hermitian positive definite matrices $\tilde{W}\_1$ and $\tilde{W}\_2$ be independently distributed as complex matrix-variate gamma variables with the parameters $(\alpha\_1, \tilde{B})$ and $(\alpha\_2, \tilde{B})$, $\tilde{B} = \tilde{B}^{*} > O$, respectively. Letting $\tilde{U}\_3 = \tilde{W}\_1 + \tilde{W}\_2$,*

$$
\tilde{U}\_1 = (\tilde{W}\_1 + \tilde{W}\_2)^{-\frac{1}{2}} \tilde{W}\_1 (\tilde{W}\_1 + \tilde{W}\_2)^{-\frac{1}{2}} = \tilde{U}\_3^{-\frac{1}{2}} \tilde{W}\_1 \tilde{U}\_3^{-\frac{1}{2}} \text{ and } \tilde{U}\_2 = \tilde{W}\_2^{-\frac{1}{2}} \tilde{W}\_1 \tilde{W}\_2^{-\frac{1}{2}}.
$$

*then (1): $\tilde{U}\_3$ is distributed as a complex matrix-variate gamma with the parameters $(\alpha\_1 + \alpha\_2, \tilde{B})$, $\tilde{B} = \tilde{B}^{*} > O$, $\Re(\alpha\_1 + \alpha\_2) > p - 1$; (2): $\tilde{U}\_1$ and $\tilde{U}\_3$ are independently distributed; (3): $\tilde{U}\_1$ is distributed as a complex matrix-variate type-1 beta random variable with the parameters $(\alpha\_1, \alpha\_2)$; (4): $\tilde{U}\_2$ is distributed as a complex matrix-variate type-2 beta random variable with the parameters $(\alpha\_1, \alpha\_2)$.*

### **8.1.2. Real Wishart matrices**

Since Wishart matrices are distributed as matrix-variate gamma variables whose parameters are $\alpha\_j = \frac{m\_j}{2},\ m\_j \ge p$, and $B = \frac{1}{2}\Sigma^{-1},\ \Sigma > O$, we have the following corollaries in the real and complex cases:

**Corollary 8.1.1.** *Let the $p \times p$ real positive definite matrices $W\_1$ and $W\_2$ be independently Wishart distributed, $W\_j \sim W\_p(m\_j, \Sigma)$, with $m\_j \ge p,\ j = 1, 2$, degrees of freedom, common parameter matrix $\Sigma > O$, and respective densities given by*


$$\phi\_j(W\_j) = \frac{1}{2^{\frac{pm\_j}{2}} \Gamma\_p(\frac{m\_j}{2}) |\boldsymbol{\Sigma}|^{\frac{m\_j}{2}}} |W\_j|^{\frac{m\_j}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{-1} W\_j)}, \ j = 1, 2,\tag{8.1.8}$$

*and zero elsewhere. Then (1): $U\_3 = W\_1 + W\_2$ is Wishart distributed with $m\_1 + m\_2$ degrees of freedom and parameter matrix $\Sigma > O$, that is, $U\_3 \sim W\_p(m\_1 + m\_2, \Sigma),\ \Sigma > O$; (2): $U\_1 = (W\_1 + W\_2)^{-\frac{1}{2}} W\_1 (W\_1 + W\_2)^{-\frac{1}{2}} = U\_3^{-\frac{1}{2}} W\_1 U\_3^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable, that is, $U\_1 \sim$ type-1 beta$(\frac{m\_1}{2}, \frac{m\_2}{2})$; (3): $U\_1$ and $U\_3$ are independently distributed; (4): $U\_2 = W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}}$ is a real matrix-variate type-2 beta random variable, that is, $U\_2 \sim$ type-2 beta$(\frac{m\_1}{2}, \frac{m\_2}{2})$.*
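Part (1) of Corollary 8.1.1 can be illustrated numerically. The sketch below builds each Wishart sample as $X'X$ where the $m$ rows of $X$ are $N(0, \Sigma)$ vectors, and checks that the sample mean of $W\_1 + W\_2$ agrees with $E[U\_3] = (m\_1 + m\_2)\Sigma$; the matrix $\Sigma$ and the degrees of freedom are arbitrary illustrative choices:

```python
import numpy as np

# Corollary 8.1.1 (1): the sum of independent Wishart matrices with a common
# parameter matrix is Wishart, so E[W1 + W2] = (m1 + m2) * Sigma.
# Each W_p(m, Sigma) sample is built as X'X with m rows X_i ~ N(0, Sigma).
rng = np.random.default_rng(2)
p, m1, m2, n = 2, 5, 7, 50_000
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])      # arbitrary Sigma > O
L = np.linalg.cholesky(Sigma)

def wishart_samples(m):
    X = rng.standard_normal((n, m, p)) @ L.T    # rows are N(0, Sigma)
    return np.einsum('nmi,nmj->nij', X, X)      # n Wishart(m, Sigma) matrices

U3 = wishart_samples(m1) + wishart_samples(m2)
assert np.allclose(U3.mean(axis=0), (m1 + m2)*Sigma, atol=0.3)
```

Only the first moment is checked; the full distributional statement would require comparing densities or further moments.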

The corresponding results for the complex case are parallel, with identical numbers of degrees of freedom, $m\_1$ and $m\_2$, and parameter matrix $\tilde{\Sigma} = \tilde{\Sigma}^{*} > O$ (Hermitian positive definite). Properties associated with type-1 and type-2 beta variables hold as well in the complex domain. Consider for instance the following results, which are also valid in the complex case. If $U$ is a type-1 beta variable with the parameters $(\alpha\_1, \alpha\_2)$, then $I - U$ is a type-1 beta variable with the parameters $(\alpha\_2, \alpha\_1)$ and $(I - U)^{-\frac{1}{2}} U (I - U)^{-\frac{1}{2}}$ is a type-2 beta variable with the parameters $(\alpha\_1, \alpha\_2)$.
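In the scalar case these relations read: if $u \sim$ type-1 beta$(\alpha\_1, \alpha\_2)$, then $1 - u \sim$ type-1 beta$(\alpha\_2, \alpha\_1)$ and $u/(1-u) \sim$ type-2 beta$(\alpha\_1, \alpha\_2)$. A brief Monte Carlo sketch with arbitrary parameter values:

```python
import numpy as np

# Scalar (p = 1) check of the stated relations: if U ~ type-1 beta(a1, a2),
# then 1 - U ~ type-1 beta(a2, a1) and U/(1-U) ~ type-2 beta(a1, a2).
rng = np.random.default_rng(3)
a1, a2 = 3.0, 4.0                               # arbitrary parameters
u = rng.beta(a1, a2, size=500_000)
assert abs((1 - u).mean() - a2/(a1 + a2)) < 0.005   # type-1 beta(a2, a1) mean
v = u/(1 - u)                # scalar form of (I-U)^{-1/2} U (I-U)^{-1/2}
assert abs(v.mean() - a1/(a2 - 1)) < 0.02           # type-2 beta mean, a2 > 1
```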

#### **8.2. Some Eigenvalues and Eigenvectors, Real Case**

Observe that when $X$, a $p \times p$ real positive definite matrix, has a real matrix-variate gamma density with the parameters $(\alpha, B),\ B > O,\ \Re(\alpha) > \frac{p-1}{2}$, then $Z = B^{\frac{1}{2}} X B^{\frac{1}{2}}$ has a real matrix-variate gamma density with the parameters $(\alpha, I)$ where $I$ is the identity matrix. The corresponding result for a Wishart matrix is the following: Let $W$ be a real Wishart matrix having $m$ degrees of freedom and $\Sigma > O$ as its parameter matrix, that is, $W \sim W\_p(m, \Sigma),\ \Sigma > O,\ m \ge p$; then $Z = \Sigma^{-\frac{1}{2}} W \Sigma^{-\frac{1}{2}} \sim W\_p(m, I),\ m \ge p$, that is, $Z$ is a Wishart matrix having $m$ degrees of freedom and $I$ as its parameter matrix. If we are considering the roots of the determinantal equation $|\hat{W}\_1 - \lambda \hat{W}\_2| = 0$ where $\hat{W}\_1 \sim W\_p(m\_1, \Sigma)$ and $\hat{W}\_2 \sim W\_p(m\_2, \Sigma),\ \Sigma > O,\ m\_j \ge p,\ j = 1, 2$, and if $\hat{W}\_1$ and $\hat{W}\_2$ are independently distributed, so will $W\_1 = \Sigma^{-\frac{1}{2}} \hat{W}\_1 \Sigma^{-\frac{1}{2}}$ and $W\_2 = \Sigma^{-\frac{1}{2}} \hat{W}\_2 \Sigma^{-\frac{1}{2}}$ be. Then

$$\begin{split} |\Sigma^{-\frac{1}{2}}\hat{W}\_{1}\Sigma^{-\frac{1}{2}} - \lambda\Sigma^{-\frac{1}{2}}\hat{W}\_{2}\Sigma^{-\frac{1}{2}}| &= 0 \Rightarrow |\Sigma|^{-\frac{1}{2}}|\hat{W}\_{1} - \lambda\hat{W}\_{2}|\,|\Sigma|^{-\frac{1}{2}} = 0\\ &\Rightarrow |\hat{W}\_{1} - \lambda\hat{W}\_{2}| = 0. \end{split} \tag{8.2.1}$$

Thus, the roots of $|W\_1 - \lambda W\_2| = 0$ and $|\hat{W}\_1 - \lambda \hat{W}\_2| = 0$ are identical. Hence, without any loss of generality, one needs only consider the roots of $|W\_1 - \lambda W\_2| = 0$ with $W\_j \sim W\_p(m\_j, I),\ m\_j \ge p,\ j = 1, 2$, when independently distributed Wishart matrices sharing a common parameter matrix are involved. Observe that

$$|W\_1 - \lambda W\_2| = 0 \Rightarrow |W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}} - \lambda I| = 0\tag{8.2.2}$$

which means that $\lambda$ is an eigenvalue of $W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}}$ when $W\_j \overset{ind}{\sim} W\_p(m\_j, I),\ j = 1, 2$. If $Y\_j$ is an eigenvector corresponding to the eigenvalue $\lambda\_j$, it must satisfy the equation

$$(W\_2^{-\frac{1}{2}}W\_1W\_2^{-\frac{1}{2}})Y\_j=\lambda\_jY\_j.\tag{8.2.3}$$

Let the eigenvalues $\lambda\_j$ be distinct so that $\lambda\_1 > \lambda\_2 > \cdots > \lambda\_p$. Actually, it can be shown that $Pr\{\lambda\_i = \lambda\_j\} = 0$ almost surely for all $i \ne j$. When the eigenvalues of $W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}}$ are distinct, the eigenvectors are orthogonal since $W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}}$ is symmetric. Thus, in this case, there exists a set of $p$ linearly independent mutually orthogonal eigenvectors. Let $Y\_1, \ldots, Y\_p$ be a set of mutually orthonormal eigenvectors and let $Y = (Y\_1, \ldots, Y\_p)$ be the $p \times p$ matrix consisting of the normalized eigenvectors. Our aim is to determine the joint density of $Y$ and $\lambda\_1, \ldots, \lambda\_p$, and thereby the marginal densities of $Y$ and $\lambda\_1, \ldots, \lambda\_p$. To this end, we will need the Jacobian provided in the next theorem. For its derivation and connection to other Jacobians, the reader is referred to Mathai (1997).
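The invariance argument in (8.2.1)–(8.2.2) can be checked numerically: the roots of $|\hat{W}\_1 - \lambda \hat{W}\_2| = 0$ coincide with those of $|W\_1 - \lambda W\_2| = 0$, and both equal the eigenvalues of $W\_2^{-1} W\_1$. In the sketch below, arbitrary positive definite matrices stand in for the Wishart matrices:

```python
import numpy as np

# Check that |W1_hat - lam*W2_hat| = 0 and |W1 - lam*W2| = 0, with
# Wj = Sigma^{-1/2} Wj_hat Sigma^{-1/2}, have the same roots, namely the
# eigenvalues of W2^{-1} W1. Matrices are arbitrary positive definite stand-ins.
rng = np.random.default_rng(4)
p = 3

def spd(A):
    return A @ A.T          # A A' is positive definite for full-rank A

W1_hat = spd(rng.standard_normal((p, 2*p)))
W2_hat = spd(rng.standard_normal((p, 2*p)))
Sigma = spd(rng.standard_normal((p, p))) + p*np.eye(p)
vals, vecs = np.linalg.eigh(Sigma)
S_inv_half = vecs @ np.diag(vals**-0.5) @ vecs.T    # Sigma^{-1/2}
W1 = S_inv_half @ W1_hat @ S_inv_half
W2 = S_inv_half @ W2_hat @ S_inv_half
r_hat = np.sort(np.linalg.eigvals(np.linalg.solve(W2_hat, W1_hat)).real)
r = np.sort(np.linalg.eigvals(np.linalg.solve(W2, W1)).real)
assert np.allclose(r_hat, r, rtol=1e-6)
```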

**Theorem 8.2.1.** *Let $Z$ be a $p \times p$ real symmetric matrix comprised of distinct real scalar variables as its elements, except for symmetry, and let its distinct nonzero eigenvalues be $\lambda\_1 > \lambda\_2 > \cdots > \lambda\_p$, which are real owing to its symmetry. Let $D = \mathrm{diag}(\lambda\_1, \ldots, \lambda\_p)$, $\mathrm{d}D = \mathrm{d}\lambda\_1 \wedge \ldots \wedge \mathrm{d}\lambda\_p$, and $P$ be a unique orthonormal matrix such that $PP' = I,\ P'P = I$, and $Z = PDP'$. Then, after integrating out the differential element of $P$ over the full orthogonal group $O\_p$, we have*

$$\mathrm{d}Z = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i=1}^{p-1} \prod\_{j=i+1}^p (\lambda\_i - \lambda\_j) \Big\} \mathrm{d}D = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i<j} (\lambda\_i - \lambda\_j) \Big\} \mathrm{d}D.$$

**Corollary 8.2.1.** *Let $g(Z)$ be a symmetric function of the $p \times p$ real symmetric matrix $Z$—symmetric function in the sense that $g(AB) = g(BA)$ whenever $AB$ and $BA$ are defined, even if $AB \ne BA$. Let the eigenvalues of $Z$ be distinct such that $\lambda\_1 > \lambda\_2 > \cdots > \lambda\_p$, $D = \mathrm{diag}(\lambda\_1, \ldots, \lambda\_p)$ and $\mathrm{d}D = \mathrm{d}\lambda\_1 \wedge \ldots \wedge \mathrm{d}\lambda\_p$. Then,*

$$\int\_{Z} g(Z)\,\mathrm{d}Z = \int\_{D} g(D)\, \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i<j} (\lambda\_i - \lambda\_j) \Big\} \mathrm{d}D.$$

**Example 8.2.1.** Consider a $p \times p$ real matrix $X$ having a matrix-variate gamma density with shape parameter $\alpha = \frac{p+1}{2}$ and scale parameter matrix $I$, whose density is

$$f(X) = \frac{1}{\Gamma\_p(\frac{p+1}{2})} \mathbf{e}^{-\text{tr}(X)}, \ X > O.$$

A variable having this density is also said to follow *the real $p \times p$ matrix-variate exponential distribution*. Let $p = 2$ and denote the eigenvalues of $X$ by $\infty > \lambda\_1 > \lambda\_2 > 0$. It follows from Corollary 8.2.1 that the joint density of $\lambda\_1$ and $\lambda\_2$, denoted by $f\_1(D)$, with $D = \mathrm{diag}(\lambda\_1, \lambda\_2)$, is given by

$$f\_1(D)\,\mathrm{d}D = \frac{1}{\Gamma\_2(\frac{2+1}{2})} \frac{\pi^{\frac{2^2}{2}}}{\Gamma\_2(\frac{2}{2})} (\lambda\_1 - \lambda\_2)\, \mathrm{e}^{-\mathrm{tr}(D)}\mathrm{d}D.$$

Verify that *f*1*(D)* is a density.

**Solution 8.2.1.** Since *f*1*(D)* is nonnegative, it suffices to show that the total integral is equal to 1. Excluding the constant part, the integral to be evaluated is the following:

$$\begin{split} \int\_{\lambda\_1=0}^{\infty} \int\_{\lambda\_2=0}^{\lambda\_1} &(\lambda\_1 - \lambda\_2)\mathrm{e}^{-(\lambda\_1 + \lambda\_2)}\, \mathrm{d}\lambda\_2 \wedge \mathrm{d}\lambda\_1 \\ &= \int\_{\lambda\_1=0}^{\infty} \lambda\_1 \mathrm{e}^{-\lambda\_1} \Big[\int\_{\lambda\_2=0}^{\lambda\_1} \mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big] \mathrm{d}\lambda\_1 - \int\_{\lambda\_1=0}^{\infty} \mathrm{e}^{-\lambda\_1} \Big[\int\_{\lambda\_2=0}^{\lambda\_1} \lambda\_2 \mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big] \mathrm{d}\lambda\_1 \\ &= \int\_{\lambda\_1=0}^{\infty} \lambda\_1 \mathrm{e}^{-\lambda\_1}[1 - \mathrm{e}^{-\lambda\_1}]\mathrm{d}\lambda\_1 - \int\_{\lambda\_1=0}^{\infty} \mathrm{e}^{-\lambda\_1}[1 - (1 + \lambda\_1)\mathrm{e}^{-\lambda\_1}]\mathrm{d}\lambda\_1 \\ &= \int\_0^{\infty} \lambda\_1 \mathrm{e}^{-\lambda\_1}\mathrm{d}\lambda\_1 - \int\_0^{\infty} \mathrm{e}^{-\lambda\_1}\mathrm{d}\lambda\_1 + \int\_0^{\infty} \mathrm{e}^{-2\lambda\_1}\mathrm{d}\lambda\_1 \\ &= 1 - 1 + \frac{1}{2} = \frac{1}{2}. \end{split}\tag{i}$$

Let us now compute the constant part:

$$\frac{1}{\Gamma\_2(\frac{2+1}{2})} \frac{\pi^{\frac{2^2}{2}}}{\Gamma\_2(\frac{2}{2})} = \frac{1}{\Gamma\_2(\frac{3}{2})} \frac{\pi^2}{\Gamma\_2(1)} = \frac{1}{\sqrt{\pi}(\frac{1}{2}\sqrt{\pi})} \frac{\pi^2}{\sqrt{\pi}\sqrt{\pi}} = 2. \tag{ii}$$

Since the product of *(i)* and *(ii)* gives 1, *f*1*(D)* is indeed a density.
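The integral in *(i)* times the constant in *(ii)* can also be checked by a simple two-dimensional midpoint rule over the ordered region; the grid step and truncation point below are arbitrary numerical choices:

```python
import numpy as np

# Midpoint-rule check that 2*(l1 - l2)*exp(-(l1 + l2)) integrates to 1 over
# the ordered region 0 < l2 < l1 (the tail beyond 25 is negligible).
h = 0.02
grid = np.arange(h/2, 25, h)
L1, L2 = np.meshgrid(grid, grid, indexing='ij')
total = (2.0*(L1 - L2)*np.exp(-(L1 + L2)) * (L2 < L1)).sum() * h * h
assert abs(total - 1.0) < 1e-3
```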

**Example 8.2.2.** Consider a $p \times p$ matrix having a real matrix-variate type-1 beta density with the parameters $\alpha = \frac{p+1}{2},\ \beta = \frac{p+1}{2}$, whose density, denoted by $f(X)$, is

$$f(X) = \frac{\Gamma\_p(p+1)}{\Gamma\_p(\frac{p+1}{2})\Gamma\_p(\frac{p+1}{2})}, \text{ } O < X < I,$$

and zero elsewhere. This density $f(X)$ is also referred to as a *real $p \times p$ matrix-variate uniform density*. Let $p = 2$ and the eigenvalues of $X$ be $1 > \lambda\_1 > \lambda\_2 > 0$. Then, the density of $D = \mathrm{diag}(\lambda\_1, \lambda\_2)$, denoted by $f\_1(D)$, is

$$f\_1(D)\,\mathrm{d}D = \frac{\Gamma\_2(2+1)}{[\Gamma\_2(\frac{2+1}{2})]^2} \frac{\pi^{\frac{2^2}{2}}}{\Gamma\_2(\frac{2}{2})} (\lambda\_1 - \lambda\_2)\, \mathrm{d}D,\ \ 1 > \lambda\_1 > \lambda\_2 > 0,$$

and zero elsewhere. Verify that *f*1*(D)* is a density.

**Solution 8.2.2.** The constant part simplifies as follows:

$$\frac{\Gamma\_2(3)}{[\Gamma\_2(\frac{3}{2})]^2} \frac{\pi^2}{\Gamma\_2(1)} = \frac{\sqrt{\pi} \Gamma(3) \Gamma(\frac{5}{2})}{[\sqrt{\pi} \Gamma(\frac{3}{2}) \Gamma(\frac{2}{2})]^2} \frac{\pi^2}{\sqrt{\pi} \Gamma(1) \Gamma(\frac{1}{2})}$$

$$= \frac{\sqrt{\pi} (2)(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}}{[\sqrt{\pi}(\frac{1}{2})\sqrt{\pi}]^2} \frac{\pi^2}{\sqrt{\pi}\sqrt{\pi}} = 6. \tag{i}$$

Let us now consider the functional part of the integrand:

$$\begin{split} \int\_{\lambda\_1=0}^1 \int\_{\lambda\_2=0}^{\lambda\_1} (\lambda\_1 - \lambda\_2) \, \mathrm{d}\lambda\_1 \wedge \mathrm{d}\lambda\_2 &= \int\_0^1 \lambda\_1^2 \, \mathrm{d}\lambda\_1 - \int\_0^1 \frac{\lambda\_1^2}{2} \, \mathrm{d}\lambda\_1 \\ &= \frac{1}{2} \int\_0^1 \lambda\_1^2 \, \mathrm{d}\lambda\_1 = \frac{1}{6}. \end{split} \tag{ii}$$

As the product of *(i)* and *(ii)* equals 1, it is verified that *f*1*(D)* is a density.

**Example 8.2.3.** Consider a *p* × *p* matrix having a matrix-variate gamma density with shape parameter *α* and scale parameter matrix *I* . Let *D* = diag*(λ*1*,...,λp)* where ∞ *> λ*<sup>1</sup> *> λ*<sup>2</sup> *>* ··· *> λp >* 0 are the eigenvalues of that matrix. The joint density of the *λj* 's or the density of *D* is then available as

$$f\_1(D)\,\mathrm{d}D = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} [\lambda\_1 \cdots \lambda\_p]^{\alpha - \frac{p+1}{2}} \mathrm{e}^{-(\lambda\_1 + \cdots + \lambda\_p)} \Big\{ \prod\_{i<j} (\lambda\_i - \lambda\_j) \Big\} \mathrm{d}D.$$

Even when $\alpha$ is specified, the integral representation of the density $f\_1(D)$ will generally only be expressible in terms of incomplete gamma functions or confluent hypergeometric series, with simple functional forms being obtainable only for certain values of $\alpha$ and $p$. Verify that $f\_1(D)$ is a density for $\alpha = \frac{7}{2}$ and $p = 2$.

**Solution 8.2.3.** For those values of *p* and *α*, we have

$$[\lambda\_1 \lambda\_2]^{\frac{7}{2} - \frac{3}{2}} (\lambda\_1 - \lambda\_2) \mathbf{e}^{-(\lambda\_1 + \lambda\_2)} = [\lambda\_1^3 \lambda\_2^2 - \lambda\_1^2 \lambda\_2^3] \mathbf{e}^{-(\lambda\_1 + \lambda\_2)}$$

whose integral over ∞ *> λ*<sup>1</sup> *> λ*<sup>2</sup> *>* 0 is the sum of *(i)* and *(ii)*:

$$\begin{split} \int\_{\lambda\_1=0}^{\infty} \int\_{\lambda\_2=0}^{\lambda\_1} \lambda\_1^3 \lambda\_2^2 \mathbf{e}^{-(\lambda\_1 + \lambda\_2)} \mathbf{d} \lambda\_2 \wedge \mathbf{d} \lambda\_1 &= \int\_{\lambda\_1=0}^{\infty} \lambda\_1^3 \mathbf{e}^{-\lambda\_1} [\int\_{\lambda\_2=0}^{\lambda\_1} \lambda\_2^2 \mathbf{e}^{-\lambda\_2} \mathbf{d} \lambda\_2] \mathbf{d} \lambda\_1 \\ &= \int\_{\lambda\_1=0}^{\infty} [(-\lambda\_1^5 - 2\lambda\_1^4 - 2\lambda\_1^3)\mathbf{e}^{-2\lambda\_1} + 2\lambda\_1^3 \mathbf{e}^{-\lambda\_1}] \mathbf{d} \lambda\_1, \end{split} \tag{i}$$

$$-\int\_{\lambda\_1=0}^{\infty}\lambda\_1^2\mathrm{e}^{-\lambda\_1}\Big[\int\_{\lambda\_2=0}^{\lambda\_1}\lambda\_2^3\mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big]\mathrm{d}\lambda\_1 = \int\_0^{\infty}[(\lambda\_1^5+3\lambda\_1^4+6\lambda\_1^3+6\lambda\_1^2)\mathrm{e}^{-2\lambda\_1}-6\lambda\_1^2\mathrm{e}^{-\lambda\_1}]\mathrm{d}\lambda\_1, \tag{ii}$$

that is, the sum of *(i)* and *(ii)* is

$$\begin{split} &\int\_{0}^{\infty} [(\lambda\_1^4 + 4\lambda\_1^3 + 6\lambda\_1^2)\mathbf{e}^{-2\lambda\_1} + (2\lambda\_1^3 - 6\lambda\_1^2)\mathbf{e}^{-\lambda\_1}] d\lambda\_1 \\ &= 2^{-5}\Gamma(5) + 4(2^{-4})\Gamma(4) + 6(2^{-3})\Gamma(3) + 2\Gamma(4) - 6\Gamma(3) \\ &= \frac{4!}{2^5} + \frac{4(3!)}{2^4} + \frac{6(2!)}{2^3} + 2(3!) - 6(2!) = \frac{15}{4} . \end{split} \tag{iii}$$

Now, consider the constant part:

$$\begin{split} \frac{\pi^{\frac{p^{2}}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} &= \frac{\pi^{2}}{\Gamma\_2(1)\,\Gamma\_2(\frac{7}{2})} \\ &= \frac{\pi^{2}}{[\sqrt{\pi}\sqrt{\pi}]\,[\sqrt{\pi}\,(\frac{5}{2})(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}\,(2!)]} = \frac{4}{15}. \end{split} \tag{iv}$$

The product of *(iii)* and *(iv)* being 1, this verifies that *f*1*(D)* is a density when *p* = 2 and *α* = 7/2.
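The two factors above can be recomputed numerically; the sketch below (a cross-check in Python using only the standard library; the helper name `gamma2`, implementing Γ₂(α) = π^(1/2)Γ(α)Γ(α − 1/2), is ours, not the book's) reproduces (iii) and (iv):

```python
from math import gamma, pi, isclose

def gamma2(a):
    # real matrix-variate gamma for p = 2: Gamma_2(a) = pi^(1/2) Gamma(a) Gamma(a - 1/2)
    return pi**0.5 * gamma(a) * gamma(a - 0.5)

# integral part (iii), using the formula  int_0^inf x^n e^{-a x} dx = Gamma(n+1)/a^{n+1}
integral = (gamma(5) / 2**5 + 4 * gamma(4) / 2**4 + 6 * gamma(3) / 2**3
            + 2 * gamma(4) - 6 * gamma(3))

# constant part (iv): pi^2 / (Gamma_2(1) Gamma_2(7/2))
constant = pi**2 / (gamma2(1.0) * gamma2(3.5))

assert isclose(integral, 15 / 4)
assert isclose(constant, 4 / 15)
assert isclose(constant * integral, 1.0)
```

The final assertion is exactly the normalization check carried out in the solution.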

#### **8.2a. The Distributions of Eigenvalues in the Complex Case**

The complex counterpart of Theorem 8.2.1 is stated next.

**Theorem 8.2a.1.** *Let Z̃ be a p* × *p Hermitian matrix with distinct real nonzero eigenvalues λ*1 *> λ*2 *>* ··· *> λp. Let Q̃ be the unique p* × *p unitary matrix, Q̃Q̃*<sup>∗</sup> = *I, Q̃*<sup>∗</sup>*Q̃* = *I, such that Z̃* = *Q̃DQ̃*<sup>∗</sup>*, where an asterisk designates the conjugate transpose. Then, after integrating out the differential element of Q̃ over the full orthogonal group Õp, we have*

$$\mathrm{d}\tilde{Z} = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)}\Big\{\prod\_{i<j}(\lambda\_i-\lambda\_j)^2\Big\}\mathrm{d}D. \tag{8.2a.1}$$

**Note 8.2a.1.** When the unitary matrix *Q̃* has real diagonal elements, the integral of the differential element over the full orthogonal group *Õp* will be the following:

$$\int\_{\tilde{O}\_p} \tilde{h}(\tilde{Q}) = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} \tag{8.2a.2}$$

where *h̃(Q̃)* = ∧[(d*Q̃*)*Q̃*<sup>∗</sup>]; the reader may refer to Theorem 4.4 and Corollary 4.3.1 of Mathai (1997) for details. If all the elements comprising *Q̃* are complex, then the numerator in (8.2a.2) will be *π*<sup>*p*²</sup> instead of *π*<sup>*p*(*p*−1)</sup>. When unitary transformations are made on Hermitian matrices such as *Z̃* in Theorem 8.2a.1, the diagonal elements of the unitary matrix *Q̃* are real, and hence the numerator in (8.2a.2) remains *π*<sup>*p*(*p*−1)</sup> in this case.

**Note 8.2a.2.** A corollary parallel to Corollary 8.2.1 also holds in the complex domain.

**Example 8.2a.1.** Consider a complex *p* × *p* matrix *X̃* having a matrix-variate type-1 beta density with the parameters *α* = *p* and *β* = *p*, so that its density, denoted by *f̃(X̃)*, is the following:

$$
\tilde{f}(\tilde{X}) = \frac{\tilde{\Gamma}\_p(2p)}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(p)}, \ \ O < \tilde{X} < I,
$$

which is also referred to as *the p* × *p complex matrix-variate uniform density*. Let *D* = diag(*λ*1*,...,λp*) where 1 *> λ*1 *> λ*2 *>* ··· *> λp >* 0 are the eigenvalues of *X̃*. Then, the density of *D*, denoted by *f*1*(D)*, is given by

$$f\_1(D)\,\mathrm{d}D = \frac{\tilde{\Gamma}\_p(2p)}{[\tilde{\Gamma}\_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)}\Big[\prod\_{i<j}(\lambda\_i-\lambda\_j)^2\Big]\mathrm{d}D. \tag{i}$$

Verify that *(i)* is a density for *p* = 2.

**Solution 8.2a.1.** For Hermitian matrices, the eigenvalues are real. Consider the integral over (*λ*1 − *λ*2)²:

$$\begin{split} &\int\_{\lambda\_{1}=0}^{1} \left[ \int\_{\lambda\_{2}=0}^{\lambda\_{1}} (\lambda\_{1}^{2} + \lambda\_{2}^{2} - 2\lambda\_{1}\lambda\_{2}) \mathrm{d}\lambda\_{2} \right] \mathrm{d}\lambda\_{1} \\ &= \int\_{\lambda\_{1}=0}^{1} \left[ \lambda\_{1}^{3} + \frac{\lambda\_{1}^{3}}{3} - 2\frac{\lambda\_{1}^{3}}{2} \right] \mathrm{d}\lambda\_{1} \\ &= \int\_{0}^{1} \frac{\lambda\_{1}^{3}}{3} \, \mathrm{d}\lambda\_{1} = \frac{1}{12}. \end{split} \tag{ii}$$

Let us now evaluate the constant part:

$$\begin{split} \frac{\tilde{\Gamma}\_p(2p)}{[\tilde{\Gamma}\_p(p)]^2}\frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} &= \frac{\tilde{\Gamma}\_2(4)}{[\tilde{\Gamma}\_2(2)]^2}\frac{\pi^{2(1)}}{\tilde{\Gamma}\_2(2)} \\ &= \frac{\pi\,\Gamma(4)\Gamma(3)}{\pi^2[\Gamma(2)\Gamma(1)]^2}\frac{\pi^2}{\pi\,\Gamma(2)\Gamma(1)} = 12. \end{split} \tag{iii}$$

The product of *(ii)* and *(iii)* equalling 1, the solution is complete.
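Both steps can be checked numerically; a minimal sketch (Python, standard library only; the helper `cgamma2`, implementing Γ̃₂(α) = πΓ(α)Γ(α − 1), is our naming, and the integral is approximated by a midpoint rule over the ordered region):

```python
from math import gamma, pi, isclose

def cgamma2(a):
    # complex matrix-variate gamma for p = 2: Gamma~_2(a) = pi Gamma(a) Gamma(a - 1)
    return pi * gamma(a) * gamma(a - 1)

# constant part (iii)
constant = (cgamma2(4) / cgamma2(2)**2) * pi**2 / cgamma2(2)

# integral part (ii): midpoint quadrature over 0 < lambda2 < lambda1 < 1
n = 400
h = 1.0 / n
integral = 0.0
for i in range(n):
    l1 = (i + 0.5) * h
    for j in range(i):          # keeps the ordering lambda1 > lambda2
        l2 = (j + 0.5) * h
        integral += (l1 - l2)**2 * h * h

assert isclose(constant, 12.0)
assert isclose(integral, 1 / 12, rel_tol=1e-3)
assert isclose(constant * integral, 1.0, rel_tol=1e-3)
```

The quadrature converges to 1/12 as the grid is refined, matching the closed-form evaluation in (ii).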

**Example 8.2a.2.** Consider a *p* × *p* complex matrix *X̃* having a matrix-variate gamma density with the parameters (*α* = *p*, *β* = *I*). Let *D* = diag(*λ*1*,...,λp*), where ∞ *> λ*1 *>* ··· *> λp >* 0 are the eigenvalues of *X̃*. Denoting the density of *D* by *f*1*(D)*, we have

$$f\_1(D) = \frac{1}{\tilde{\Gamma}\_p(p)}\,\frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)}\,\mathrm{e}^{-(\lambda\_1+\cdots+\lambda\_p)}\Big[\prod\_{i<j}(\lambda\_i-\lambda\_j)^2\Big].$$

When *α* = *p*, this density is *the p* × *p complex matrix-variate exponential density*. Verify that *f*1*(D)* is a density for *p* = 2.

**Solution 8.2a.2.** The constant part simplifies to the following:

$$\begin{split} \frac{1}{\tilde{\Gamma}\_p(p)} \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} &= \frac{1}{\tilde{\Gamma}\_2(2)} \frac{\pi^{2(1)}}{\tilde{\Gamma}\_2(2)} \\ &= \frac{1}{\pi \Gamma(2) \Gamma(1)} \frac{\pi^2}{\pi \Gamma(2) \Gamma(1)} = 1, \end{split} \tag{i}$$

and the integrals over the *λj* 's are evaluated as follows:

$$\begin{split} &\int\_{\lambda\_1=0}^{\infty}\int\_{\lambda\_2=0}^{\lambda\_1}(\lambda\_1^2+\lambda\_2^2-2\lambda\_1\lambda\_2)\mathrm{e}^{-(\lambda\_1+\lambda\_2)}\,\mathrm{d}\lambda\_2\wedge\mathrm{d}\lambda\_1 \\ &= \int\_{\lambda\_1=0}^{\infty}\lambda\_1^2\mathrm{e}^{-\lambda\_1}\Big[\int\_{\lambda\_2=0}^{\lambda\_1}\mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big]\mathrm{d}\lambda\_1 + \int\_{\lambda\_1=0}^{\infty}\mathrm{e}^{-\lambda\_1}\Big[\int\_{\lambda\_2=0}^{\lambda\_1}\lambda\_2^2\mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big]\mathrm{d}\lambda\_1 \\ &\quad - 2\int\_{\lambda\_1=0}^{\infty}\lambda\_1\mathrm{e}^{-\lambda\_1}\Big[\int\_{\lambda\_2=0}^{\lambda\_1}\lambda\_2\mathrm{e}^{-\lambda\_2}\mathrm{d}\lambda\_2\Big]\mathrm{d}\lambda\_1 \\ &= \int\_0^{\infty}(-\lambda\_1^2-\lambda\_1^2-2\lambda\_1-2+2\lambda\_1^2+2\lambda\_1)\mathrm{e}^{-2\lambda\_1}\mathrm{d}\lambda\_1 + \int\_0^{\infty}(\lambda\_1^2+2-2\lambda\_1)\mathrm{e}^{-\lambda\_1}\mathrm{d}\lambda\_1 \\ &= \int\_0^{\infty}(\lambda\_1^2-2\lambda\_1+2)\mathrm{e}^{-\lambda\_1}\mathrm{d}\lambda\_1 - 2\int\_0^{\infty}\mathrm{e}^{-2\lambda\_1}\mathrm{d}\lambda\_1 = \Gamma(3)-2\Gamma(2)+2-\frac{2}{2} = 1. \end{split} \tag{ii}$$

Now, taking the product of *(i)* and *(ii)*, we obtain 1, and the result is verified.
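This verification can also be run numerically; a sketch (Python, standard library only; `cgamma2` is our helper for Γ̃₂, and the improper integral is truncated at an assumed cutoff L = 25, beyond which the exponential tail is negligible):

```python
from math import exp, gamma, pi, isclose

def cgamma2(a):
    # complex matrix-variate gamma for p = 2: Gamma~_2(a) = pi Gamma(a) Gamma(a - 1)
    return pi * gamma(a) * gamma(a - 1)

# constant part (i): Gamma~_2(2) = pi, so the constant is 1
constant = pi**2 / cgamma2(2)**2

# integral part (ii): midpoint quadrature over 0 < lambda2 < lambda1 < L
L, n = 25.0, 500
h = L / n
integral = 0.0
for i in range(n):
    l1 = (i + 0.5) * h
    for j in range(i):          # keeps the ordering lambda1 > lambda2
        l2 = (j + 0.5) * h
        integral += (l1 - l2)**2 * exp(-(l1 + l2)) * h * h

assert isclose(constant, 1.0)
assert isclose(integral, 1.0, rel_tol=1e-2)
```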

#### **8.2.1. Eigenvalues of matrix-variate gamma and Wishart matrices, real case**

Let *W*1 and *W*2 be two *p* × *p* real positive definite matrix-variate random variables that are independently distributed as matrix-variate gamma random variables with the parameters (*α*1*, B*) and (*α*2*, B*), respectively. When *αj* = *mj*/2, *mj* ≥ *p*, *j* = 1, 2, with *m*1*, m*2 = *p, p* + 1*,...,* and *B* = (1/2)*I*, *W*1 and *W*2 are independently Wishart distributed with *m*1 and *m*2 degrees of freedom, respectively; refer to the earlier discussion about the elimination of the scale parameter matrix *Σ > O* in a matrix-variate Wishart distribution. Consider the determinantal equation

$$|W\_1 - \lambda W\_2| = 0 \Rightarrow |W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}} - \lambda I| = 0. \tag{8.2.5}$$

Thus, *λ* is an eigenvalue of *U*2 = *W*2<sup>−1/2</sup>*W*1*W*2<sup>−1/2</sup>. It has already been established in Theorem 8.1.1 that *U*2 = *W*2<sup>−1/2</sup>*W*1*W*2<sup>−1/2</sup> is distributed as a real matrix-variate type-2 beta random variable with the parameters *α*1 and *α*2, whose density is

$$f\_{\mu}(U\_2) = \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |U\_2|^{\alpha\_1 - \frac{p+1}{2}} |I + U\_2|^{-(\alpha\_1 + \alpha\_2)},\tag{8.2.6}$$

for *U*2 *> O*, ℜ(*αj*) *>* (*p* − 1)/2, *j* = 1, 2, and zero elsewhere. Note that this distribution is free of the scale parameter matrix *B*. Writing *U*2 in terms of its eigenvalues and making use of (8.2.4), we have the following result:

**Theorem 8.2.2.** *Let λ*1 *> λ*2 *>* ··· *> λp >* 0 *be the distinct roots of the determinantal equation (8.2.5) or, equivalently, let the λj 's be the eigenvalues of U*2 = *W*2<sup>−1/2</sup>*W*1*W*2<sup>−1/2</sup>*, as defined in (8.2.5). Then, after integrating out over the full orthogonal group Op, the joint density of λ*1*,...,λp, denoted by g*1*(D) with D* = diag*(λ*1*,...,λp), is obtained as*

$$\begin{split} g\_1(D) &= \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} \Big\{ \prod\_{j=1}^p \lambda\_j^{\alpha\_1 - \frac{p+1}{2}} \Big\} \Big\{ \prod\_{j=1}^p (1 + \lambda\_j)^{-(\alpha\_1 + \alpha\_2)} \Big\} \\ &\quad\times \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i<j}(\lambda\_i-\lambda\_j) \Big\}, \quad \lambda\_1 > \lambda\_2 > \cdots > \lambda\_p > 0. \end{split} \tag{8.2.7}$$

*Proof***:** Applying the transformation *U*2 = *PDP*′, *PP*′ = *I*, *P*′*P* = *I*, where *P* is a unique orthonormal matrix, to the density of *U*2 given in (8.2.6), it follows from Theorem 8.2.1 that

$$\mathrm{d}U\_2 = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i<j}(\lambda\_i-\lambda\_j) \Big\}\mathrm{d}D$$

after integrating out the differential element corresponding to the orthonormal matrix *P*. On substituting |*U*2| = *λ*<sup>1</sup> ··· *λp* and |*I* + *U*2| = *(*1 + *λ*1*)*···*(*1 + *λp)* in (8.2.6), the result is established.

**Note 8.2.1.** When *α*1 = *m*1/2, *α*2 = *m*2/2, *mj* ≥ *p*, with *m*1*, m*2 = *p, p* + 1*,...,* in Theorem 8.2.2, we have the corresponding result for real Wishart matrices having *m*1 and *m*2 degrees of freedom and parameter matrix (1/2)*Ip*.
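The observation made after (8.2.6), namely that the distribution of *U*2 is free of the scale parameter matrix *B*, can be illustrated by simulation in the scalar case *p* = 1 (an assumption-light special case of our own choosing, not from the text): there *W*1 and *W*2 are ordinary gamma variables with a common scale *b*, and *W*1/(*W*1 + *W*2) = *U*2/(1 + *U*2) is type-1 beta with mean *α*1/(*α*1 + *α*2) whatever *b* is.

```python
import random
from math import isclose

# scalar (p = 1) illustration: W1 ~ Gamma(a1, b), W2 ~ Gamma(a2, b) independent;
# the ratio W1/(W1 + W2) is type-1 beta(a1, a2), free of the common scale b.
random.seed(7)
a1, a2 = 3.0, 4.0

def mean_ratio(b, n=100_000):
    s = 0.0
    for _ in range(n):
        w1 = random.gammavariate(a1, b)   # gammavariate(shape, scale)
        w2 = random.gammavariate(a2, b)
        s += w1 / (w1 + w2)
    return s / n

m1, m2 = mean_ratio(0.5), mean_ratio(5.0)
assert isclose(m1, a1 / (a1 + a2), rel_tol=1e-2)
assert isclose(m2, a1 / (a1 + a2), rel_tol=1e-2)
```

Both sample means agree with 3/7 to within Monte Carlo error, independently of the scale used.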

**Example 8.2.4.** Let the *p* × *p* real matrix *X* have a real matrix-variate type-2 beta density with the parameters *α* = (*p* + 1)/2 and *β* = (*p* + 1)/2. Then, the joint density of its eigenvalues *λ*1 *> λ*2 *>* ··· *> λp >* 0, or that of *D* = diag(*λ*1*,...,λp*), denoted by *g*1*(D)*, is

$$g\_1(D) = \frac{\Gamma\_p(p+1)}{[\Gamma\_p(\frac{p+1}{2})]^2}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big[\prod\_{j=1}^p (1+\lambda\_j)^{-(p+1)}\Big] \prod\_{i<j}(\lambda\_i-\lambda\_j). \tag{i}$$

Verify that *g*1*(D)* is a density for *p* = 2.

**Solution 8.2.4.** Consider the total integral for *p* = 2. The constant part is

$$\frac{\Gamma\_p(p+1)}{[\Gamma\_p(\frac{p+1}{2})]^2} \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} = \frac{\Gamma\_2(3)}{[\Gamma\_2(\frac{3}{2})]^2} \frac{\pi^2}{\Gamma\_2(1)} = \frac{\sqrt{\pi}\Gamma(3)\Gamma(\frac{5}{2})}{\pi[\Gamma(\frac{3}{2})\Gamma(1)]^2} \frac{\pi^2}{\sqrt{\pi}\,\Gamma(1)\Gamma(\frac{1}{2})}$$

$$= \frac{\sqrt{\pi}(2)(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}}{\pi(\frac{1}{4})\pi} \frac{\pi^2}{\sqrt{\pi}\sqrt{\pi}} = 6,\tag{i}$$

and the integral part is obtained as follows:

$$\begin{split} \int\_{\lambda\_{1}=0}^{\infty} \int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{(\lambda\_{1}-\lambda\_{2})}{(1+\lambda\_{1})^{3}(1+\lambda\_{2})^{3}} \mathrm{d}\lambda\_{1} \wedge \mathrm{d}\lambda\_{2} \\ = \int\_{\lambda\_{1}=0}^{\infty} \frac{\lambda\_{1}}{(1+\lambda\_{1})^{3}} \left[ \int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{1}{(1+\lambda\_{2})^{3}} \mathrm{d}\lambda\_{2} \right] \mathrm{d}\lambda\_{1} \\ - \int\_{\lambda\_{1}=0}^{\infty} \frac{1}{(1+\lambda\_{1})^{3}} \left[ \int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{\lambda\_{2}}{(1+\lambda\_{2})^{3}} \mathrm{d}\lambda\_{2} \right] \mathrm{d}\lambda\_{1} . \end{split} \tag{ii}$$

The first integral over *λ*<sup>2</sup> in *(ii)* is

$$\int\_{\lambda\_2=0}^{\lambda\_1} \frac{1}{(1+\lambda\_2)^3} d\lambda\_2 = \frac{1}{2} \left[ 1 - \frac{1}{(1+\lambda\_1)^2} \right];$$

then, integrating with respect to *λ*<sup>1</sup> yields

$$\begin{split} \frac{1}{2} \int\_{\lambda\_1=0}^{\infty} \frac{\lambda\_1}{(1+\lambda\_1)^3} \left[ 1 - \frac{1}{(1+\lambda\_1)^2} \right] d\lambda\_1 &= \frac{1}{2} \Big\{ \frac{\Gamma(2)\Gamma(1)}{\Gamma(3)} - \frac{\Gamma(2)\Gamma(3)}{\Gamma(5)} \Big\} \\ &= \frac{1}{2} \Big\{ \frac{1}{2} - \frac{1}{12} \Big\} = \frac{5}{24}. \end{split} \tag{iii}$$

Now, after integrating by parts, the second integral over *λ*<sup>2</sup> in *(ii)* is the following:

$$\int\_{\lambda\_2=0}^{\lambda\_1} \frac{\lambda\_2}{(1+\lambda\_2)^3} \mathrm{d}\lambda\_2 = \frac{1}{2} \left[ -\frac{\lambda\_1}{(1+\lambda\_1)^2} + 1 - \frac{1}{1+\lambda\_1} \right];$$

then, integrating with respect to *λ*<sup>1</sup> gives

$$\begin{split} -\frac{1}{2} \int\_{\lambda\_1=0}^{\infty} \frac{1}{(1+\lambda\_1)^{3}} \left[ -\frac{\lambda\_1}{(1+\lambda\_1)^{2}} + 1 - \frac{1}{1+\lambda\_1} \right] \mathrm{d}\lambda\_1 \\ = \frac{1}{2} \Big[ \frac{\Gamma(2)\Gamma(3)}{\Gamma(5)} - \frac{1}{2} + \frac{1}{3} \Big] = -\frac{1}{24}. \end{split} \tag{iv}$$

Combining *(iii)* and *(iv)*, the sum is

$$
\frac{5}{24} - \frac{1}{24} = \frac{1}{6}.\tag{v}
$$

Finally, the product of *(i)* and *(v)* is 1, which verifies that *g*1*(D)* is indeed a density when *p* = 2.
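The beta-type integrals used in this solution can be checked numerically; a sketch (Python, standard library only; `gamma2` and `B` are our helper names for Γ₂(α) and the type-2 beta integral Γ(*a*)Γ(*b*)/Γ(*a* + *b*)):

```python
from math import gamma, pi, isclose

def gamma2(a):
    # real matrix-variate gamma for p = 2: Gamma_2(a) = pi^(1/2) Gamma(a) Gamma(a - 1/2)
    return pi**0.5 * gamma(a) * gamma(a - 0.5)

def B(a, b):
    # type-2 beta integral: int_0^inf x^{a-1} (1+x)^{-(a+b)} dx = Gamma(a)Gamma(b)/Gamma(a+b)
    return gamma(a) * gamma(b) / gamma(a + b)

constant = (gamma2(3) / gamma2(1.5)**2) * pi**2 / gamma2(1)   # step (i) = 6
part_iii = 0.5 * (B(2, 1) - B(2, 3))                          # step (iii) = 5/24
part_iv = 0.5 * (B(2, 3) - B(1, 2) + B(1, 3))                 # step (iv) = -1/24

assert isclose(constant, 6.0)
assert isclose(part_iii, 5 / 24)
assert isclose(part_iv, -1 / 24)
assert isclose(constant * (part_iii + part_iv), 1.0)
```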

#### **8.2a.1. Eigenvalues of complex matrix-variate gamma and Wishart matrices**

A parallel result can be obtained in the complex domain. Let *W̃*1 and *W̃*2 be independently distributed *p* × *p* complex matrix-variate gamma random variables with parameters (*α*1*, B̃*) and (*α*2*, B̃*), *B̃* = *B̃*<sup>∗</sup> *> O*, ℜ(*αj*) *> p* − 1, *j* = 1, 2. Consider the determinantal equation

$$\det(\tilde{W}\_1 - \lambda \tilde{W}\_2) = 0 \;\Rightarrow\; \det(\tilde{W}\_2^{-\frac{1}{2}} \tilde{W}\_1 \tilde{W}\_2^{-\frac{1}{2}} - \lambda I) = 0. \tag{8.2a.3}$$

It follows from Theorem 8.1a.1 that *Ũ*2 = *W̃*2<sup>−1/2</sup>*W̃*1*W̃*2<sup>−1/2</sup> has a complex matrix-variate type-2 beta distribution with the parameters (*α*1*, α*2), whose associated density is

$$\tilde{f}\_{\mu}(\tilde{U}\_{2}) = \frac{\tilde{\Gamma}\_{p}(\alpha\_{1} + \alpha\_{2})}{\tilde{\Gamma}\_{p}(\alpha\_{1})\tilde{\Gamma}\_{p}(\alpha\_{2})} |\det(\tilde{U}\_{2})|^{\alpha\_{1} - p} |\det(I + \tilde{U}\_{2})|^{-(\alpha\_{1} + \alpha\_{2})} \tag{8.2a.4}$$

for *Ũ*2 = *Ũ*2<sup>∗</sup> *> O*, ℜ(*αj*) *> p* − 1, *j* = 1, 2, and zero elsewhere. Observe that the distribution of *Ũ*2 is free of the scale parameter matrix *B̃ > O* and that *W̃*1 and *W̃*2 are Hermitian positive definite, so that their eigenvalues *λ*1 *>* ··· *> λp >* 0, assumed to be distinct, are real and positive. Writing *Ũ*2 in terms of its eigenvalues and making use of (8.2a.1), we have the following result:

**Theorem 8.2a.2.** *Let Ũ*2 = *W̃*2<sup>−1/2</sup>*W̃*1*W̃*2<sup>−1/2</sup> *and its distinct eigenvalues λ*1 *>* ··· *> λp >* 0 *be as defined in the determinantal equation (8.2a.3). Then, after integrating out the differential element corresponding to the unique unitary matrix Q̃, Q̃Q̃*<sup>∗</sup> = *I, Q̃*<sup>∗</sup>*Q̃* = *I, such that Ũ*2 = *Q̃DQ̃*<sup>∗</sup>*, with D* = diag*(λ*1*,...,λp), the joint density of λ*1*,...,λp, denoted by g̃*1*(D), is obtained as*

$$\begin{split} \tilde{g}\_1(D)\,\mathrm{d}D &= \frac{\tilde{\Gamma}\_p(\alpha\_1 + \alpha\_2)}{\tilde{\Gamma}\_p(\alpha\_1)\tilde{\Gamma}\_p(\alpha\_2)} \Big[ \prod\_{j=1}^p \lambda\_j^{\alpha\_1 - p} \Big] \Big[ \prod\_{j=1}^p (1 + \lambda\_j) \Big]^{-(\alpha\_1 + \alpha\_2)} \\ &\quad\times \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} \Big[ \prod\_{i<j}(\lambda\_i-\lambda\_j)^2 \Big]\mathrm{d}D \end{split} \tag{8.2a.5}$$

*where*

$$\tilde{\Gamma}\_p(\alpha) = \pi^{\frac{p(p-1)}{2}} \Gamma(\alpha) \Gamma(\alpha - 1) \cdots \Gamma(\alpha - p + 1), \; \Re(\alpha) > p - 1. \tag{8.2a.6}$$

**Example 8.2a.3.** Let the *p* × *p* matrix *X̃* have a complex matrix-variate type-2 beta density with the parameters (*α* = *p*, *β* = *p*). Let its eigenvalues be *λ*1 *>* ··· *> λp >* 0 and their joint density be denoted by *g̃*1*(D)*, *D* = diag(*λ*1*,...,λp*). Then,

$$\tilde{g}\_1(D) = \frac{\tilde{\Gamma}\_p(2p)}{[\tilde{\Gamma}\_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} \Big[ \prod\_{j=1}^p \frac{1}{(1+\lambda\_j)^{2p}} \Big] \prod\_{i<j}(\lambda\_i-\lambda\_j)^2. \tag{i}$$

Verify that *g*˜1*(D)* is a density for *p* = 2.

**Solution 8.2a.3.** Since the total integral must be unity, let us integrate out the *λj* 's. The constant part is the following:

$$\begin{split} \frac{\tilde{\Gamma}\_p(2p)}{[\tilde{\Gamma}\_p(p)]^2} \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} &= \frac{\tilde{\Gamma}\_2(4)}{[\tilde{\Gamma}\_2(2)]^2} \frac{\pi^2}{\tilde{\Gamma}\_2(2)}\\ &= \frac{\pi \,\Gamma(4) \,\Gamma(3)}{\pi^2} \frac{\pi^2}{\pi} = 12. \end{split} \tag{i}$$

Now, consider the integrals over *λ*1 and *λ*2, noting that (*λ*1 − *λ*2)² = *λ*1² + *λ*2² − 2*λ*1*λ*2:

$$\begin{split} \int\_{\lambda\_{1}=0}^{\infty} \int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{[\lambda\_{1}^{2}+\lambda\_{2}^{2}-2\lambda\_{1}\lambda\_{2}]}{(1+\lambda\_{1})^{4}(1+\lambda\_{2})^{4}} \mathrm{d}\lambda\_{1}\wedge\mathrm{d}\lambda\_{2} \\ = \int\_{\lambda\_{1}=0}^{\infty} \frac{\lambda\_{1}^{2}}{(1+\lambda\_{1})^{4}} \left[\int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{1}{(1+\lambda\_{2})^{4}} \mathrm{d}\lambda\_{2}\right] \mathrm{d}\lambda\_{1} \\ + \int\_{\lambda\_{1}=0}^{\infty} \frac{1}{(1+\lambda\_{1})^{4}} \left[\int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{\lambda\_{2}^{2}}{(1+\lambda\_{2})^{4}} \mathrm{d}\lambda\_{2}\right] \mathrm{d}\lambda\_{1} \\ - 2\int\_{\lambda\_{1}=0}^{\infty} \frac{\lambda\_{1}}{(1+\lambda\_{1})^{4}} \left[\int\_{\lambda\_{2}=0}^{\lambda\_{1}} \frac{\lambda\_{2}}{(1+\lambda\_{2})^{4}} \mathrm{d}\lambda\_{2}\right] \mathrm{d}\lambda\_{1} . \end{split} \tag{ii}$$

As they appear in *(ii)*, the integrals over *λ*<sup>2</sup> are

$$\int\_{\lambda\_2=0}^{\lambda\_1} \frac{1}{(1+\lambda\_2)^4} \mathrm{d}\lambda\_2 = \frac{1}{3} \left[ 1 - \frac{1}{(1+\lambda\_1)^3} \right],\tag{iii}$$


$$\int\_{\lambda\_2=0}^{\lambda\_1} \frac{\lambda\_2^2}{(1+\lambda\_2)^4} \mathrm{d}\lambda\_2 = \frac{1}{3} \left[ -\frac{\lambda\_1^2}{(1+\lambda\_1)^3} - \frac{\lambda\_1}{(1+\lambda\_1)^2} + 1 - \frac{1}{1+\lambda\_1} \right], \tag{iv}$$

$$\int\_{\lambda\_2=0}^{\lambda\_1} \frac{\lambda\_2}{(1+\lambda\_2)^4} \mathrm{d}\lambda\_2 = \frac{1}{3} \left[ -\frac{\lambda\_1}{(1+\lambda\_1)^3} + \frac{1}{2} - \frac{1}{2} \frac{1}{(1+\lambda\_1)^2} \right];\tag{v}$$

then, integrating with respect to *λ*<sup>1</sup> yields

$$\begin{split} \int\_{\lambda\_1=0}^{\infty} \frac{\lambda\_1^2}{(1+\lambda\_1)^4} \Big[ \frac{1}{3} \Big( 1 - \frac{1}{(1+\lambda\_1)^3} \Big) \Big] \mathrm{d}\lambda\_1 \\ = \frac{1}{3} \Big[ \frac{\Gamma(3)\Gamma(1)}{\Gamma(4)} - \frac{\Gamma(3)\Gamma(4)}{\Gamma(7)} \Big], \end{split} \tag{vi}$$

$$\begin{split} &\int\_{\lambda\_{1}=0}^{\infty} \frac{1}{(1+\lambda\_{1})^{4}} \Big( \frac{1}{3} \Big) \Big[ -\frac{\lambda\_{1}^{2}}{(1+\lambda\_{1})^{3}} - \frac{\lambda\_{1}}{(1+\lambda\_{1})^{2}} + 1 - \frac{1}{1+\lambda\_{1}} \Big] \mathrm{d}\lambda\_1 \\ &= -\frac{1}{3} \Big[ \frac{\Gamma(3)\Gamma(4)}{\Gamma(7)} + \frac{\Gamma(2)\Gamma(4)}{\Gamma(6)} - \frac{\Gamma(1)\Gamma(3)}{\Gamma(4)} + \frac{\Gamma(1)\Gamma(4)}{\Gamma(5)} \Big], \end{split} \tag{vii}$$

$$\begin{split} & -2\int\_{\lambda\_1=0}^{\infty} \frac{\lambda\_1}{(1+\lambda\_1)^4} \left(\frac{1}{3}\right) \left[ -\frac{\lambda\_1}{(1+\lambda\_1)^3} + \frac{1}{2} - \frac{1}{2} \frac{1}{(1+\lambda\_1)^2} \right] \mathrm{d}\lambda\_1 \\ & = \frac{1}{3} \Big[ 2\frac{\Gamma(3)\Gamma(4)}{\Gamma(7)} - \frac{\Gamma(2)\Gamma(2)}{\Gamma(4)} + \frac{\Gamma(2)\Gamma(4)}{\Gamma(6)} \Big]. \end{split} \tag{viii}$$

Summing (*vi*),(*vii*) and *(viii)*, we have

$$\frac{1}{3} \left[ 2 \frac{\Gamma(3)\Gamma(1)}{\Gamma(4)} - \frac{\Gamma(1)\Gamma(4)}{\Gamma(5)} - \frac{\Gamma(2)\Gamma(2)}{\Gamma(4)} \right] = \frac{1}{3} \left[ \frac{4}{3!} - \frac{1}{4} - \frac{1}{3!} \right] = \frac{1}{12}.\qquad(\text{ix})$$

As the product of *(ix)* and *(i)* is 1, the result is established. Note that since 2*p* is a positive integer, the method of integration by parts works for a general *p* when the first parameter *α* of the type-2 beta density is equal to *p*. However, ∏*i<j*(*λi* − *λj*)² will be difficult to handle for a general *p*.
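The bookkeeping in (i) and (ix) can be recomputed numerically; a sketch (Python, standard library only; `cgamma2` and `B` are our helper names for Γ̃₂(α) and the type-2 beta integral):

```python
from math import gamma, pi, isclose

def cgamma2(a):
    # complex matrix-variate gamma for p = 2: Gamma~_2(a) = pi Gamma(a) Gamma(a - 1)
    return pi * gamma(a) * gamma(a - 1)

def B(a, b):
    # int_0^inf x^{a-1} (1+x)^{-(a+b)} dx = Gamma(a)Gamma(b)/Gamma(a+b)
    return gamma(a) * gamma(b) / gamma(a + b)

constant = (cgamma2(4) / cgamma2(2)**2) * pi**2 / cgamma2(2)   # step (i) = 12

# step (ix): the beta-type terms surviving the cancellations in (vi)-(viii)
integral = (2 * B(3, 1) - B(1, 4) - B(2, 2)) / 3

assert isclose(constant, 12.0)
assert isclose(integral, 1 / 12)
assert isclose(constant * integral, 1.0)
```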

**Example 8.2a.4.** Give an explicit representation of (8.2a.5) for *p* = 3*, α*<sup>1</sup> = 4 and *α*<sup>2</sup> = 3.

**Solution 8.2a.4.** For *p* = 3*, α*<sup>1</sup> −*p* = 4−3 = 1*, α*<sup>1</sup> +*α*<sup>2</sup> = 4+3 = 7*, p(p* −1*)* = 6.

The constant part is the following:

$$\begin{split} \frac{\tilde{\Gamma}\_p(\alpha\_1 + \alpha\_2)}{\tilde{\Gamma}\_p(\alpha\_1)\tilde{\Gamma}\_p(\alpha\_2)} \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} &= \frac{\tilde{\Gamma}\_3(7)}{\tilde{\Gamma}\_3(4)\tilde{\Gamma}\_3(3)} \frac{\pi^{3(2)}}{\tilde{\Gamma}\_3(3)} \\ &= \frac{\Gamma(7)\Gamma(6)\Gamma(5)}{\pi^3[\Gamma(4)\Gamma(3)\Gamma(2)][\Gamma(3)\Gamma(2)\Gamma(1)]} \\ &\times \frac{\pi^6}{\pi^3\Gamma(3)\Gamma(2)\Gamma(1)} = 43200, \end{split} \tag{1}$$

the functional part being the product of

$$\left(\prod\_{j=1}^{p} \lambda\_j^{\alpha\_1 - p}\right) \left(\prod\_{j=1}^{p} (1 + \lambda\_j)^{-(\alpha\_1 + \alpha\_2)}\right) = (\lambda\_1 \lambda\_2 \lambda\_3) [(1 + \lambda\_1)(1 + \lambda\_2)(1 + \lambda\_3)]^{-7} \tag{ii}$$

and

$$\prod\_{i<j}(\lambda\_i-\lambda\_j)^2 = [(\lambda\_1-\lambda\_2)(\lambda\_1-\lambda\_3)(\lambda\_2-\lambda\_3)]^2. \tag{iii}$$

Multiplying (*i*), (*ii*) and *(iii)* yields the answer.
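The constant 43200 in (i) can be verified numerically; a sketch (Python, standard library only; `cgamma` is our helper implementing the complex matrix-variate gamma Γ̃ₚ(α) defined in (8.2a.6)):

```python
from math import gamma, pi, isclose

def cgamma(p, a):
    # complex matrix-variate gamma: Gamma~_p(a) = pi^{p(p-1)/2} Gamma(a) Gamma(a-1) ... Gamma(a-p+1)
    out = pi ** (p * (p - 1) / 2)
    for j in range(p):
        out *= gamma(a - j)
    return out

constant = (cgamma(3, 7) / (cgamma(3, 4) * cgamma(3, 3))) * pi**6 / cgamma(3, 3)
assert isclose(constant, 43200.0)
```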

#### **8.2.2. An alternative procedure in the real case**

This section describes an alternative procedure that is presented in Anderson (2003). The real case will first be discussed. Let *W*1 and *W*2 be independently distributed real *p* × *p* matrix-variate gamma random variables with the parameters (*α*1*, B*), (*α*2*, B*), *B > O*, ℜ(*αj*) *>* (*p* − 1)/2, *j* = 1, 2. We are considering the determinantal equation

$$\begin{aligned} |W\_1 - \lambda W\_2| &= 0 \Rightarrow |W\_1 - \mu(W\_1 + W\_2)| = 0\\ &\Rightarrow |W\_1 - \frac{\mu}{1 - \mu} W\_2| = 0\\ &\Rightarrow |(W\_1 + W\_2)^{-\frac{1}{2}} W\_1 (W\_1 + W\_2)^{-\frac{1}{2}} - \mu I| = 0 \end{aligned} \tag{8.2.8}$$

where *λ* = *μ*/(1 − *μ*). Thus, *μ* is an eigenvalue of *U*1 = (*W*1 + *W*2)<sup>−1/2</sup>*W*1(*W*1 + *W*2)<sup>−1/2</sup>. It follows from Theorem 8.1.1 that the joint density of *W*1 and *W*2, denoted by *f (W*1*, W*2*)*, can be written as

$$f(W\_1, W\_2) = g(U\_1)g\_1(U\_3) \tag{8.2.9}$$

where *U*1 = (*W*1 + *W*2)<sup>−1/2</sup>*W*1(*W*1 + *W*2)<sup>−1/2</sup> and *U*3 = *W*1 + *W*2 are independently distributed. Further,

$$g(U\_1) = \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} |U\_1|^{\alpha\_1 - \frac{p+1}{2}} |I - U\_1|^{\alpha\_2 - \frac{p+1}{2}}, \ \ \ \ 0 < U\_1 < I,\tag{8.2.10}$$

for ℜ(*αj*) *>* (*p* − 1)/2, *j* = 1, 2, and *g(U*1*)* = 0 elsewhere, is a real matrix-variate type-1 beta density, and

$$g\_1(U\_3) = \frac{|B|^{\alpha\_1 + \alpha\_2}}{\Gamma\_p(\alpha\_1 + \alpha\_2)} |U\_3|^{\alpha\_1 + \alpha\_2 - \frac{p+1}{2}} \mathrm{e}^{-\mathrm{tr}(B \, U\_3)} \tag{8.2.11}$$

for *B > O*, ℜ(*α*1 + *α*2) *>* (*p* − 1)/2, and *g*1*(U*3*)* = 0 elsewhere, is a real matrix-variate gamma density with the parameters (*α*1 + *α*2*, B*), *B > O*. Now, consider the transformation *U*1 = *PDP*′, *PP*′ = *I*, *P*′*P* = *I*, where *D* = diag(*μ*1*,...,μp*), *μ*1 *>* ··· *> μp >* 0 being the distinct eigenvalues of the real positive definite matrix *U*1, and the orthonormal matrix *P* is unique. Given the density of *U*1 specified in (8.2.10), the joint density of *μ*1*,...,μp*, denoted by *g*4*(D)*, which is obtained after integrating out the differential element corresponding to *P*, is

$$\begin{split} g\_4(D)\,\mathrm{d}D &= \frac{\Gamma\_p(\alpha\_1 + \alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} \Big[\prod\_{j=1}^p \mu\_j^{\alpha\_1 - \frac{p+1}{2}}\Big] \Big[\prod\_{j=1}^p (1 - \mu\_j)^{\alpha\_2 - \frac{p+1}{2}}\Big] \\ &\quad\times \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big[\prod\_{i<j}(\mu\_i-\mu\_j)\Big]\mathrm{d}D, \quad 1 > \mu\_1 > \cdots > \mu\_p > 0. \end{split} \tag{8.2.12}$$

Hence, the following result:

**Theorem 8.2.3.** *The joint density of the eigenvalues μ*<sup>1</sup> *>* ··· *> μp >* 0 *of the determinantal equation in (8.2.8) is given by the expression appearing in (8.2.12), which is equal to the density specified in (8.2.7).*

*Proof***:** It has already been established in Theorem 8.1.1 that *U*1 = (*W*1 + *W*2)<sup>−1/2</sup>*W*1(*W*1 + *W*2)<sup>−1/2</sup> has the real matrix-variate type-1 beta density given in (8.2.10). Now, make the transformation *U*1 = *PDP*′ where *D* = diag(*μ*1*,...,μp*) and the orthonormal matrix *P* is unique. Then, the first part is established from Theorem 8.2.2. It follows from (8.2.8) that *λ* = *μ*/(1 − *μ*) or *μ* = *λ*/(1 + *λ*), with d*μ* = (1 + *λ*)<sup>−2</sup>d*λ* and 1 − *μ* = 1/(1 + *λ*). Observe that ∏*i<j*(*μi* − *μj*) = ∏*i<j*(*λi* − *λj*)/[(1 + *λi*)(1 + *λj*)] and that, in this product's denominator, 1 + *λi* appears *p* − 1 times for *i* = 1*,...,p*. The exponent of 1/(1 + *λi*) is thus *α*1 − (*p* + 1)/2 + *α*2 − (*p* + 1)/2 + 2 + (*p* − 1) = *α*1 + *α*2. On substituting these values in (8.2.12), a perfect agreement with (8.2.7) is established, which completes the proof.
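The change of variables used in this proof can be checked directly in the scalar case *p* = 1 (our own sanity check, not from the text): a type-1 beta density in *μ* maps to a type-2 beta density in *λ* = *μ*/(1 − *μ*).

```python
from math import gamma, isclose

# p = 1 special case: if mu has density c mu^{a1-1}(1-mu)^{a2-1} on (0,1), then
# lambda = mu/(1-mu) has density c lambda^{a1-1}(1+lambda)^{-(a1+a2)}, since
# mu = lambda/(1+lambda), 1 - mu = 1/(1+lambda), and d mu = (1+lambda)^{-2} d lambda.
a1, a2 = 2.5, 3.0
c = gamma(a1 + a2) / (gamma(a1) * gamma(a2))

def type1(mu):          # type-1 beta density on (0, 1)
    return c * mu**(a1 - 1) * (1 - mu)**(a2 - 1)

def type2(lam):         # type-2 beta density on (0, infinity)
    return c * lam**(a1 - 1) * (1 + lam)**(-(a1 + a2))

for lam in (0.1, 0.5, 1.0, 3.0, 10.0):
    mu = lam / (1 + lam)
    # density of mu times the Jacobian d mu / d lambda equals the density of lambda
    assert isclose(type1(mu) * (1 + lam)**(-2), type2(lam))
```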

**Example 8.2.5.** Provide an explicit representation of (8.2.12) for *p* = 3*, α*<sup>1</sup> = 4 and *α*<sup>2</sup> = 3.

**Solution 8.2.5.** Note that (*p* + 1)/2 = (3 + 1)/2 = 2, *α*1 − (*p* + 1)/2 = 4 − 2 = 2 and *α*2 − (*p* + 1)/2 = 3 − 2 = 1. The constant part is

$$\begin{split} \frac{\Gamma\_p(\alpha\_1+\alpha\_2)}{\Gamma\_p(\alpha\_1)\Gamma\_p(\alpha\_2)} \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} &= \frac{\Gamma\_3(7)}{\Gamma\_3(4)\Gamma\_3(3)} \frac{\pi^{\frac{9}{2}}}{\Gamma\_3(\frac{3}{2})} \\ &= \frac{(6!)(5!)(\frac{11}{2})(\frac{9}{2})(\frac{7}{2})(\frac{5}{2})(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}}{\pi^{\frac{3}{2}}[(3!)(\frac{5}{2})(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}\,(2!)][(2!)(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}\,(1!)]} \\ &\quad\times \frac{\pi^{\frac{9}{2}}}{\pi^{\frac{3}{2}}(\frac{1}{2})\sqrt{\pi}\sqrt{\pi}} = 831600, \end{split} \tag{i}$$

the functional part being the product of

$$\left(\prod\_{j=1}^p \mu\_j^{\alpha\_1 - \frac{p+1}{2}}\right) \left(\prod\_{j=1}^p (1-\mu\_j)^{\alpha\_2 - \frac{p+1}{2}}\right) = (\mu\_1 \mu\_2 \mu\_3)^2 [(1-\mu\_1)(1-\mu\_2)(1-\mu\_3)]^1 \quad \text{(ii)}$$

and

$$\prod\_{i<j}(\mu\_i-\mu\_j) = (\mu\_1-\mu\_2)(\mu\_1-\mu\_3)(\mu\_2-\mu\_3). \tag{iii}$$

The product of (*i*), (*ii*) and *(iii)* yields the answer.
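The constant 831600 in (i) can be verified numerically; a sketch (Python, standard library only; `rgamma` is our helper implementing the real matrix-variate gamma Γₚ(α) = π^{p(p−1)/4}Γ(α)Γ(α − 1/2)···Γ(α − (p − 1)/2)):

```python
from math import gamma, pi, isclose

def rgamma(p, a):
    # real matrix-variate gamma: Gamma_p(a) = pi^{p(p-1)/4} prod_j Gamma(a - j/2), j = 0..p-1
    out = pi ** (p * (p - 1) / 4)
    for j in range(p):
        out *= gamma(a - j / 2)
    return out

constant = (rgamma(3, 7) / (rgamma(3, 4) * rgamma(3, 3))) * pi**4.5 / rgamma(3, 1.5)
assert isclose(constant, 831600.0)
```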

#### **8.2.3. The joint density of the eigenvectors in the real case**

In order to establish the joint density of the eigenvectors, we will proceed as follows, our starting equation being |*W*<sup>1</sup> − *λW*2| = 0. Let *λj* be a root of this equation and let *Yj* be the corresponding vector. Then,

$$W\_1 Y\_j = \lambda\_j W\_2 Y\_j \Rightarrow (W\_1 + \lambda\_j W\_1) Y\_j = \lambda\_j (W\_1 + W\_2) Y\_j \tag{8.2.13}$$

$$\Rightarrow W\_{\mathbf{l}} Y\_{\mathbf{j}} = \frac{\lambda\_j}{1 + \lambda\_j} (W\_{\mathbf{l}} + W\_2) Y\_{\mathbf{j}} = \mu\_j (W\_{\mathbf{l}} + W\_2) Y\_{\mathbf{j}} \tag{i}$$

$$\Rightarrow (W\_1 + W\_2)^{-1} W\_1 Y\_j = \mu\_j Y\_j.\tag{ii}$$

This shows that *Yj* is the eigenvector corresponding to the eigenvalue *μj* of (*W*1 + *W*2)<sup>−1</sup>*W*1 or, equivalently, of (*W*1 + *W*2)<sup>−1/2</sup>*W*1(*W*1 + *W*2)<sup>−1/2</sup> = *U*1, which is a real matrix-variate type-1 beta random variable. Since the *μj*'s are distinct, *μ*1 *>* ··· *> μp >* 0, and the matrix is symmetric, the eigenvectors *Y*1*,...,Yp* are mutually orthogonal. Consider the equation

$$W\_1 Y\_j = \mu\_j (W\_1 + W\_2) Y\_j. \tag{iii}$$

For *i* ≠ *j*, we also have

$$W\_1 Y\_i = \mu\_i (W\_1 + W\_2) Y\_i. \tag{iv}$$

Premultiplying *(iii)* by *Y*′*i* and *(iv)* by *Y*′*j*, and observing that *W*′1 = *W*1, it follows that (*Y*′*i W*1*Yj*)′ = *Y*′*j W*1*Yi*. Since both are real 1 × 1 matrices and one is the transpose of the other, they are equal. Then, on subtracting the resulting right-hand sides, we obtain

$$0 = \mu\_j Y\_i'(W\_1 + W\_2)Y\_j - \mu\_i Y\_j'(W\_1 + W\_2)Y\_i = (\mu\_j - \mu\_i)Y\_i'(W\_1 + W\_2)Y\_j.$$

Since *Y*′*i*(*W*1 + *W*2)*Yj* = *Y*′*j*(*W*1 + *W*2)*Yi* by the previous argument and *μi* ≠ *μj*, we must have

$$Y\_i^{\prime}(W\_1 + W\_2)Y\_j = 0 \text{ for all } i \neq j. \tag{v}$$

Let us normalize *Yj* as follows:

$$Y\_j^{\prime}(W\_1 + W\_2)Y\_j = 1, \; j = 1, 2, \ldots, p. \tag{vi}$$

Then, combining *(v)* and *(vi)*, we have

$$Y'(W\_1 + W\_2)Y = I, \ Y = (Y\_1, \dots, Y\_p), \tag{\text{vii}}$$

which is the *p* × *p* matrix of the normalized eigenvectors. Thus,

$$\begin{aligned} W\_1 + W\_2 &= (Y')^{-1} Y^{-1} = Z'Z, \; Z = Y^{-1} \\ \Rightarrow \mathbf{d}Y &= |Z|^{-2p} \mathbf{d}Z \text{ or } \mathbf{d}Z = |Y|^{-2p} \mathbf{d}Y, \end{aligned}$$

which follows from an application of Theorem 1.6.6. We are seeking the joint density of *Y*1*,...,Yp*, or the density of *Y*. The density of *W*1 + *W*2 = *U*3, denoted by *g*1*(U*3*)*, is available from (8.2.11) as

$$\begin{split} g_1(U_3)\,\mathrm{d}U_3 &= \frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}|U_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\mathrm{e}^{-\mathrm{tr}(B\,U_3)}\,\mathrm{d}U_3 \\ &= \frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}|Z'Z|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\mathrm{e}^{-\mathrm{tr}(B\,Z'Z)}\,\mathrm{d}(Z'Z). \end{split} \tag{8.2.14}$$

Letting $U_3=Z'Z$, the connection between the differential elements $\mathrm{d}Z$ and $\mathrm{d}U_3$ is provided by Theorem 4.2.3 for the case $q=p$. Then,

$$\mathrm{d}Z = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} |U\_3|^{\frac{p}{2} - \frac{p+1}{2}} \mathrm{d}U\_3 \Rightarrow \mathrm{d}U\_3 = \frac{\Gamma\_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}} |Z'Z|^{\frac{1}{2}} \mathrm{d}Z.$$

Hence, the density of $Z$, denoted by $g_z(Z)$, is the following:

$$g_z(Z)\,\mathrm{d}Z = \frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}|Z'Z|^{\alpha_1+\alpha_2-\frac{p}{2}}\mathrm{e}^{-\mathrm{tr}(B\,Z'Z)}\,\mathrm{d}Z,$$

so that the density of $Y=Z^{-1}$, denoted by $g_5(Y)$, is given by

$$\begin{split} g_5(Y)\,\mathrm{d}Y &= \frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}|Y|^{-2p}|YY'|^{-(\alpha_1+\alpha_2)+\frac{p}{2}}\mathrm{e}^{-\mathrm{tr}(B(YY')^{-1})}\,\mathrm{d}Y \\ &= \begin{cases} \frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}|YY'|^{-(\alpha_1+\alpha_2+\frac{p}{2})}\mathrm{e}^{-\mathrm{tr}(B(YY')^{-1})}\,\mathrm{d}Y \\ 0, \text{ elsewhere.} \end{cases} \end{split} \tag{8.2.15}$$

Then, we have the following result:

**Theorem 8.2.4.** *Let $W_1$ and $W_2$ be independently distributed $p\times p$ real matrix-variate gamma random variables with the parameters $(\alpha_1, B)$ and $(\alpha_2, B)$, $B>O$, $\Re(\alpha_j)>\frac{p-1}{2}$, $j=1,2$. Consider the equation*

$$|W_1 - \lambda W_2| = 0 \Rightarrow W_1 Y_j = \lambda_j W_2 Y_j, \ j = 1, \ldots, p,$$

*where $Y_j$ is a vector corresponding to the root $\lambda_j$ of the determinantal equation. Let $\lambda_1>\cdots>\lambda_p>0$ be its distinct roots, which are also the eigenvalues of the matrix $W_2^{-\frac{1}{2}}W_1W_2^{-\frac{1}{2}}$. The eigenvalues $\lambda_1,\ldots,\lambda_p$ and the linearly independent orthogonal eigenvectors $Y_1,\ldots,Y_p$ are independently distributed. The joint density of the eigenvalues $\lambda_1,\ldots,\lambda_p$ is available from Theorem 8.2.2 and the joint density of the eigenvectors is given in (8.2.15).*

**Example 8.2.6.** Illustrate the steps showing that the solutions of the determinantal equation $|W_1-\lambda W_2|=0$ are also the eigenvalues of $W_2^{-\frac{1}{2}}W_1W_2^{-\frac{1}{2}}$ for the following matrices:

$$W\_1 = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \ W\_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

**Solution 8.2.6.** First, let us verify that $W_1$ and $W_2$ are positive definite. Clearly, $W_1=W_1'$ and $W_2=W_2'$ are symmetric, and their leading minors are positive:

$$(2) = 2 > 0, \ \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = 3 > 0 \Rightarrow W_2 > O; \quad (3) = 3 > 0, \ \begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} = 8 > 0 \Rightarrow W_1 > O.$$

Consider the determinantal equation $|W_1-\lambda W_2|=0$, that is,

$$
\left| \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} - \lambda \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \right| = (3 - 2\lambda)^2 - (1 - \lambda)^2 = 0,\tag{i}
$$

whose roots are $\lambda_1=2$ and $\lambda_2=\frac{4}{3}$. Let us determine $W_2^{-\frac{1}{2}}$. To this end, let us evaluate the eigenvalues and eigenvectors of $W_2$. The eigenvalues of $W_2$ are the values of $\nu$ satisfying $|W_2-\nu I|=0 \Rightarrow (2-\nu)^2-1^2=0$, so that $\nu_1=3$ and $\nu_2=1$ are the eigenvalues of $W_2$. An eigenvector corresponding to $\nu_1=3$ is given by

$$
\begin{bmatrix} 2 - \nu\_1 & 1 \\ 1 & 2 - \nu\_1 \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \Rightarrow x\_1 = 1, \ x\_2 = 1,
$$

is one solution, the normalized eigenvector corresponding to $\nu_1=3$ being $Y_1$, whose transpose is $Y_1'=\frac{1}{\sqrt{2}}[1, 1]$. Similarly, an eigenvector associated with $\nu_2=1$ is $x_1=1,\ x_2=-1$, which, once normalized, becomes $Y_2$, whose transpose is $Y_2'=\frac{1}{\sqrt{2}}[1, -1]$. Let $\Lambda=\mathrm{diag}(3, 1)$ be the diagonal matrix of the eigenvalues of $W_2$. Then,

$$W_2[Y_1, Y_2] = [Y_1, Y_2]\Lambda \Rightarrow W_2 = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Observe that $W_2$, $W_2^{-1}$, $W_2^{\frac{1}{2}}$ and $W_2^{-\frac{1}{2}}$ share the same eigenvectors $Y_1$ and $Y_2$. Hence,

$$\begin{aligned} W\_2^{\frac{1}{2}} &= \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \Rightarrow \\\ W\_2^{-\frac{1}{2}} &= \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},\end{aligned}$$

and

$$\begin{split} T = W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}} &= \frac{1}{4} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \\ & \times \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \\ &= \frac{1}{12} \begin{bmatrix} 20 & -4 \\ -4 & 20 \end{bmatrix} .\end{split}$$

The eigenvalues of $T$ are $\frac{1}{12}$ times the solutions of $(20-\delta)^2-4^2=0$, namely $\delta_1=24$ and $\delta_2=16$. Thus, the eigenvalues of $T$ are $\frac{24}{12}=2$ and $\frac{16}{12}=\frac{4}{3}$, which are indeed the solutions of the determinantal equation $(i)$. This verifies the result.
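The verification above can also be reproduced numerically. The following sketch (our own NumPy check, not part of the original text) computes the roots of $|W_1-\lambda W_2|=0$ as the eigenvalues of $W_2^{-1}W_1$ and compares them with the eigenvalues of $W_2^{-\frac{1}{2}}W_1W_2^{-\frac{1}{2}}$:

```python
# Numerical check of Example 8.2.6; variable names mirror the text.
import numpy as np

W1 = np.array([[3.0, 1.0], [1.0, 3.0]])
W2 = np.array([[2.0, 1.0], [1.0, 2.0]])

# Roots of |W1 - lambda*W2| = 0, i.e., eigenvalues of W2^{-1} W1
roots = np.sort(np.linalg.eigvals(np.linalg.inv(W2) @ W1).real)

# Symmetric square root W2^{-1/2} via the spectral decomposition of W2
nu, Y = np.linalg.eigh(W2)                  # eigenvalues 1, 3 with eigenvectors
W2_inv_half = Y @ np.diag(nu ** -0.5) @ Y.T
T = W2_inv_half @ W1 @ W2_inv_half          # equals (1/12)[[20, -4], [-4, 20]]
eigs_T = np.sort(np.linalg.eigvalsh(T))

print(roots)    # [1.3333... 2.0]
print(eigs_T)   # the same values, 4/3 and 2
```

Both computations return $\frac43$ and $2$, in agreement with Solution 8.2.6.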

#### **8.2a.2. An alternative procedure in the complex case**

Let $\tilde{W}_1$ and $\tilde{W}_2$ be independently distributed $p\times p$ complex matrix-variate gamma random variables with the parameters $(\alpha_1, \tilde{B})$ and $(\alpha_2, \tilde{B})$, $\tilde{B}=\tilde{B}^*>O$, $\Re(\alpha_j)>p-1$, $j=1,2$. We consider the roots $\lambda_j$ and the corresponding eigenvectors $\tilde{Y}_j$ of the determinantal equation

$$\begin{split} \det(\tilde{W}\_1 - \lambda \tilde{W}\_2) = 0 &\Rightarrow \tilde{W}\_1 \tilde{Y}\_j = \lambda\_j \tilde{W}\_2 \tilde{Y}\_j, \ j = 1, \ldots, p \\ &\Rightarrow \det(\tilde{W}\_2^{-\frac{1}{2}} \tilde{W}\_1 \tilde{W}\_2^{-\frac{1}{2}} - \lambda I) = 0. \end{split} \tag{8.2a.7}$$

Let the eigenvalues $\lambda_1,\ldots,\lambda_p$ of $\tilde{W}_2^{-\frac{1}{2}}\tilde{W}_1\tilde{W}_2^{-\frac{1}{2}}>O$ be distinct and such that $\lambda_1>\cdots>\lambda_p>0$, noting that the eigenvalues of Hermitian matrices are real. We are interested in the joint distributions of the eigenvalues $\lambda_1,\ldots,\lambda_p$ and the eigenvectors $\tilde{Y}_1,\ldots,\tilde{Y}_p$. Alternatively, we will consider the equation

$$\det(\tilde{W}\_1 - \mu(\tilde{W}\_1 + \tilde{W}\_2)) = 0 \implies \det\left(\tilde{W}\_1 - \frac{\mu}{1 - \mu}\tilde{W}\_2\right) = 0, \ \lambda = \frac{\mu}{1 - \mu}.\tag{8.2a.8}$$

Proceeding as in the real case, one can observe that the joint density of $\tilde{W}_1$ and $\tilde{W}_2$, denoted by $\tilde{f}(\tilde{W}_1, \tilde{W}_2)$, can be factorized into the product of the density of $\tilde{U}_1=(\tilde{W}_1+\tilde{W}_2)^{-\frac{1}{2}}\tilde{W}_1(\tilde{W}_1+\tilde{W}_2)^{-\frac{1}{2}}$, denoted by $\tilde{g}(\tilde{U}_1)$, which is a complex matrix-variate type-1 beta random variable with the parameters $(\alpha_1, \alpha_2)$, and the density of $\tilde{U}_3=\tilde{W}_1+\tilde{W}_2$, denoted by $\tilde{g}_1(\tilde{U}_3)$, which is a complex matrix-variate gamma density with the parameters $(\alpha_1+\alpha_2, \tilde{B})$, $\tilde{B}>O$. That is,

$$
\tilde{f}(\tilde{W}\_1, \tilde{W}\_2) \operatorname{d}\tilde{W}\_1 \wedge \operatorname{d}\tilde{W}\_2 = \tilde{g}(\tilde{U}\_1) \tilde{g}\_1(\tilde{U}\_3) \operatorname{d}\tilde{U}\_1 \wedge \operatorname{d}\tilde{U}\_3 \tag{8.2a.9}
$$

where

$$\tilde{g}(\tilde{U}_1) = \frac{\tilde{\Gamma}_p(\alpha_1+\alpha_2)}{\tilde{\Gamma}_p(\alpha_1)\tilde{\Gamma}_p(\alpha_2)}|\det(\tilde{U}_1)|^{\alpha_1-p}|\det(I-\tilde{U}_1)|^{\alpha_2-p} \tag{8.2a.10}$$

for $O<\tilde{U}_1<I$, $\Re(\alpha_j)>p-1$, $j=1,2$, and $\tilde{g}=0$ elsewhere, and

$$\tilde{g}_1(\tilde{U}_3) = \frac{|\det(\tilde{B})|^{\alpha_1+\alpha_2}}{\tilde{\Gamma}_p(\alpha_1+\alpha_2)}|\det(\tilde{U}_3)|^{\alpha_1+\alpha_2-p}\mathrm{e}^{-\mathrm{tr}(\tilde{B}\,\tilde{U}_3)} \tag{8.2a.11}$$

for $\tilde{U}_3=\tilde{U}_3^*>O$, $\Re(\alpha_1+\alpha_2)>p-1$, and $\tilde{B}=\tilde{B}^*>O$, and zero elsewhere.

Note that, on making the transformation $\tilde{U}_1=QDQ^*$ with $QQ^*=I$, $Q^*Q=I$, and $D=\mathrm{diag}(\mu_1,\ldots,\mu_p)$, the joint density of $\mu_1,\ldots,\mu_p$, as obtained from (8.2a.10), is given by


$$
\begin{split}
\tilde{g}_2(D)\,\mathrm{d}D &= \frac{\tilde{\Gamma}_p(\alpha_1+\alpha_2)}{\tilde{\Gamma}_p(\alpha_1)\tilde{\Gamma}_p(\alpha_2)}\Big[\prod_{j=1}^p \mu_j^{\alpha_1-p}\Big]\Big[\prod_{j=1}^p (1-\mu_j)^{\alpha_2-p}\Big] \\
&\quad\times \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)}\Big[\prod_{i<j}(\mu_i-\mu_j)^2\Big]\,\mathrm{d}D;
\end{split}
\tag{8.2a.12}
$$

also refer to Note 8.2a.1. Then, we have the following result on observing that the joint density of $\lambda_1,\ldots,\lambda_p$ is available from (8.2a.12) by making the substitution $\lambda=\frac{\mu}{1-\mu}$ or $\mu=\frac{\lambda}{1+\lambda}$. Note that $\mathrm{d}\mu_j=\frac{1}{(1+\lambda_j)^2}\,\mathrm{d}\lambda_j$, $1-\mu_j=\frac{1}{1+\lambda_j}$, and

$$\prod_{i<j}(\mu_i-\mu_j)^2 = \prod_{i<j}\frac{(\lambda_i-\lambda_j)^2}{(1+\lambda_i)^2(1+\lambda_j)^2},$$

whose denominator contains the factor $(1+\lambda_j)^2$ exactly $p-1$ times for each $j=1,\ldots,p$. Thus, the final exponent of $\frac{1}{1+\lambda_j}$ is $(\alpha_1-p)+(\alpha_2-p)+2+2(p-1)=\alpha_1+\alpha_2$. Hence, the following result:

**Theorem 8.2a.3.** *In the complex case, the joint density of the eigenvalues μ*<sup>1</sup> *>* ··· *> μp >* 0 *of the determinantal equation in (8.2a.8) is given by the expression appearing in (8.2a.12), which is equal to the density specified in (8.2a.5).*

We now consider the joint density of the eigenvectors $\tilde{Y}_1,\ldots,\tilde{Y}_p$, which will be available from (8.2a.11), thus establishing that the eigenvalues $\lambda_1,\ldots,\lambda_p$ and the eigenvectors $\tilde{Y}_1,\ldots,\tilde{Y}_p$ are independently distributed. For determining the joint density of the eigenvectors, we start with the equation

$$
\tilde{W}_1\tilde{Y}_j = \mu_j(\tilde{W}_1+\tilde{W}_2)\tilde{Y}_j, \ j=1,\ldots,p, \tag{i}
$$

observing that $\lambda_j$ and $\mu_j$ share the same eigenvector $\tilde{Y}_j$. That is,

$$
\tilde{W}_1\tilde{Y}_i = \mu_i(\tilde{W}_1+\tilde{W}_2)\tilde{Y}_i, \ i=1,\ldots,p. \tag{ii}
$$

We continue as in the real case, showing that $\tilde{Y}_i^*(\tilde{W}_1+\tilde{W}_2)\tilde{Y}_j=0$ for all $i\neq j$. Then, we normalize $\tilde{Y}_j$ so that $\tilde{Y}_j^*(\tilde{W}_1+\tilde{W}_2)\tilde{Y}_j=1$, $j=1,\ldots,p$. Letting $\tilde{Y}=(\tilde{Y}_1,\ldots,\tilde{Y}_p)$ be the $p\times p$ matrix of the normalized eigenvectors, we have $\tilde{Y}^*(\tilde{W}_1+\tilde{W}_2)\tilde{Y}=I \Rightarrow \tilde{U}_3=\tilde{W}_1+\tilde{W}_2=(\tilde{Y}^*)^{-1}(\tilde{Y})^{-1}$. Letting $(\tilde{Y})^{-1}=\tilde{Z}$ so that $\tilde{Z}^*\tilde{Z}=(\tilde{Y}\tilde{Y}^*)^{-1}$, and applying Theorem 4.2a.3, $\mathrm{d}(\tilde{Z}^*\tilde{Z})=\frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}\,\mathrm{d}\tilde{Z}$. Hence, given the density of $\tilde{W}_1+\tilde{W}_2$ specified in (8.2a.11), the density of $\tilde{Z}$, denoted by $\tilde{g}_3(\tilde{Z})$, is obtained as

$$\tilde{g}_3(\tilde{Z})\,\mathrm{d}\tilde{Z} = \frac{|\det(\tilde{B})|^{\alpha_1+\alpha_2}}{\tilde{\Gamma}_p(\alpha_1+\alpha_2)}\frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}|\det(\tilde{Z}^*\tilde{Z})|^{\alpha_1+\alpha_2-p}\mathrm{e}^{-\mathrm{tr}(\tilde{B}\tilde{Z}^*\tilde{Z})}\,\mathrm{d}\tilde{Z}. \tag{8.2a.13}$$

Now, noting that $\tilde{Z}=\tilde{Y}^{-1} \Rightarrow \mathrm{d}\tilde{Z}=|\det(\tilde{Y}^*\tilde{Y})|^{-p}\,\mathrm{d}\tilde{Y}$ from an application of Theorem 1.6a.6, and substituting in (8.2a.13), we obtain the following density of $\tilde{Y}$, denoted by $\tilde{g}_4(\tilde{Y})$:

$$\tilde{g}_4(\tilde{Y})\,\mathrm{d}\tilde{Y} = \frac{|\det(\tilde{B})|^{\alpha_1+\alpha_2}}{\tilde{\Gamma}_p(\alpha_1+\alpha_2)}\frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}|\det(\tilde{Y}\tilde{Y}^*)|^{-\alpha_1-\alpha_2}\mathrm{e}^{-\mathrm{tr}(\tilde{B}(\tilde{Y}^*)^{-1}\tilde{Y}^{-1})}\,\mathrm{d}\tilde{Y} \tag{8.2a.14}$$

for $\tilde{B}=\tilde{B}^*>O$, $\Re(\alpha_1+\alpha_2)>p-1$, and zero elsewhere.

**Example 8.2a.5.** Show that the roots of the determinantal equation $\det(\tilde{W}_1-\lambda\tilde{W}_2)=0$ are the same as the eigenvalues of $\tilde{W}_2^{-\frac{1}{2}}\tilde{W}_1\tilde{W}_2^{-\frac{1}{2}}$ for the following Hermitian positive definite matrices:

$$
\tilde{W}\_1 = \begin{bmatrix} 3 & 1-i \\ 1+i & 3 \end{bmatrix}, \quad \tilde{W}\_2 = \begin{bmatrix} 3 & \sqrt{2}(1+i) \\ \sqrt{2}(1-i) & 3 \end{bmatrix}.
$$

**Solution 8.2a.5.** Let us evaluate the eigenvalues and eigenvectors of $\tilde{W}_2$. Consider the equation $\det(\tilde{W}_2-\mu I)=0 \Rightarrow (3-\mu)^2-2^2=0$, so that $\mu_1=5$ and $\mu_2=1$ are the eigenvalues of $\tilde{W}_2$. An eigenvector corresponding to $\mu_1=5$ must satisfy the equation

$$
\begin{bmatrix} 3 - 5 & \sqrt{2}(1 + i) \\ \sqrt{2}(1 - i) & 3 - 5 \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow \begin{array}{c} -2x\_1 + \sqrt{2}(1 + i)x\_2 = 0 \\ \sqrt{2}(1 - i)x\_1 - 2x\_2 = 0 \end{array} .
$$

Since this is a singular system of linear equations, it suffices to solve either one of them. For $x_2=1$, we have $x_1=\frac{1}{\sqrt{2}}(1+i)$. Thus, one eigenvector is

$$
\tilde{X}\_1 = \begin{bmatrix} \frac{1}{\sqrt{2}}(1+i) \\ 1 \end{bmatrix} \Rightarrow \tilde{X}\_1^\* \tilde{X}\_1 = 2 \Rightarrow \tilde{Y}\_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}}(1+i) \\ 1 \end{bmatrix}
$$

where $\tilde{Y}_1$ is the normalized eigenvector obtained from $\tilde{X}_1$. Similarly, corresponding to the eigenvalue $\mu_2=1$, we have the normalized eigenvector

$$
\tilde{Y}\_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -\frac{1}{\sqrt{2}}(1+i) \\ 1 \end{bmatrix} \text{ and } \tilde{W}\_2[\tilde{Y}\_1, \tilde{Y}\_2] = [\tilde{Y}\_1, \tilde{Y}\_2] \begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix},
$$

so that

$$
\tilde{W}_2 = \frac{1}{2}\begin{bmatrix} \frac{1}{\sqrt{2}}(1+i) & -\frac{1}{\sqrt{2}}(1+i) \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}}(1-i) & 1 \\ -\frac{1}{\sqrt{2}}(1-i) & 1 \end{bmatrix},
$$


observing that the above is of the form $\tilde{W}_2=\tilde{Y}D\tilde{Y}^*$ with $\tilde{Y}=[\tilde{Y}_1, \tilde{Y}_2]$ and $D=\mathrm{diag}(5, 1)$, the diagonal matrix of the eigenvalues of $\tilde{W}_2$. Since $\tilde{W}_2$, $\tilde{W}_2^{\frac{1}{2}}$, $\tilde{W}_2^{-1}$ and $\tilde{W}_2^{-\frac{1}{2}}$ share the same eigenvectors, we have

$$
\begin{split}
\tilde{W}_2^{\frac{1}{2}} &= \frac{1}{2}\begin{bmatrix} \frac{1}{\sqrt{2}}(1+i) & -\frac{1}{\sqrt{2}}(1+i) \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \sqrt{5} & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}}(1-i) & 1 \\ -\frac{1}{\sqrt{2}}(1-i) & 1 \end{bmatrix} \Rightarrow \\
\tilde{W}_2^{-\frac{1}{2}} &= \frac{1}{2}\begin{bmatrix} \frac{1}{\sqrt{2}}(1+i) & -\frac{1}{\sqrt{2}}(1+i) \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}}(1-i) & 1 \\ -\frac{1}{\sqrt{2}}(1-i) & 1 \end{bmatrix} \\
&= \frac{1}{2}\begin{bmatrix} \frac{1}{\sqrt{5}}+1 & (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1+i) \\ (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1-i) & \frac{1}{\sqrt{5}}+1 \end{bmatrix}.
\end{split}
\tag{i}
$$

It is easily verified that

$$\begin{split} \tilde{W}\_{2}^{-\frac{1}{2}} \tilde{W}\_{2}^{-\frac{1}{2}} &= \frac{1}{4} \begin{bmatrix} \frac{1}{\sqrt{5}} + 1 & (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 + i) \\ (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 - i) & \frac{1}{\sqrt{5}} + 1 \end{bmatrix} \\ & \times \begin{bmatrix} \frac{1}{\sqrt{5}} + 1 & (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 + i) \\ (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 - i) & \frac{1}{\sqrt{5}} + 1 \end{bmatrix} \\ &= \frac{1}{5} \begin{bmatrix} 3 & -\sqrt{2}(1 + i) \\ -\sqrt{2}(1 - i) & 3 \end{bmatrix} = \tilde{W}\_{2}^{-1} . \end{split}$$

Letting $Q=\tilde{W}_2^{-\frac{1}{2}}\tilde{W}_1\tilde{W}_2^{-\frac{1}{2}}$, we have

$$\begin{split} Q &= \frac{1}{4}\begin{bmatrix} \frac{1}{\sqrt{5}}+1 & (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1+i) \\ (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1-i) & \frac{1}{\sqrt{5}}+1 \end{bmatrix}\begin{bmatrix} 3 & 1-i \\ 1+i & 3 \end{bmatrix} \\ &\quad\times \begin{bmatrix} \frac{1}{\sqrt{5}}+1 & (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1+i) \\ (\frac{1}{\sqrt{5}}-1)\frac{1}{\sqrt{2}}(1-i) & \frac{1}{\sqrt{5}}+1 \end{bmatrix} \\ &= \frac{1}{4}\begin{bmatrix} \frac{6^2}{5} & -\frac{12}{5}\sqrt{2}(1+i)+\frac{4}{\sqrt{5}}(1-i) \\ -\frac{12}{5}\sqrt{2}(1-i)+\frac{4}{\sqrt{5}}(1+i) & \frac{6^2}{5} \end{bmatrix}. \end{split} \tag{ii}$$

The eigenvalues of 4*Q* can be determined by solving the equation

$$
\begin{aligned}
\left(\frac{6^2}{5}-\nu\right)^2 - \left[-\frac{12}{5}\sqrt{2}(1+i)+\frac{4}{\sqrt{5}}(1-i)\right]\left[-\frac{12}{5}\sqrt{2}(1-i)+\frac{4}{\sqrt{5}}(1+i)\right] &= 0 \Rightarrow \\
\left(\frac{6^2}{5}-\nu\right)^2 - \frac{4^2}{5^2}(46) &= 0 \Rightarrow \\
\nu &= \frac{4}{5}(9\pm\sqrt{46}).
\end{aligned}
\tag{iii}
$$

Thus, the eigenvalues of *Q*, denoted by *δ*, are

$$
\delta = \frac{1}{5}(9\pm\sqrt{46}). \tag{iv}
$$

Now, let us consider the determinantal equation $\det(\tilde{W}_1-\lambda\tilde{W}_2)=0$, that is,

$$
\begin{vmatrix} 3(1-\lambda) & (1-i)-\lambda\sqrt{2}(1+i) \\ (1+i)-\lambda\sqrt{2}(1-i) & 3(1-\lambda) \end{vmatrix} = 0,
$$

which yields

$$3^2(1-\lambda)^2 - [(1+i) - \lambda\sqrt{2}(1-i)][(1-i) - \lambda\sqrt{2}(1+i)] = 0 \Rightarrow$$

$$3^2(1-\lambda)^2 - [(1-\sqrt{2}\lambda)^2 + (1+\sqrt{2}\lambda)^2] = 0 \Rightarrow$$

$$\lambda = \frac{1}{5}(9 \pm \sqrt{46}). \tag{v}$$

The eigenvalues obtained in *(iv)* and *(v)* being identical, the result is established.
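The same check can be carried out numerically in the complex domain. The sketch below (our own NumPy verification aid, not part of the original text) confirms that the roots of $\det(\tilde{W}_1-\lambda\tilde{W}_2)=0$ and the eigenvalues of $\tilde{W}_2^{-\frac{1}{2}}\tilde{W}_1\tilde{W}_2^{-\frac{1}{2}}$ both equal $\frac{1}{5}(9\pm\sqrt{46})$:

```python
# Numerical confirmation of Example 8.2a.5 with complex arrays.
import numpy as np

i = 1j
W1 = np.array([[3, 1 - i], [1 + i, 3]])
W2 = np.array([[3, np.sqrt(2) * (1 + i)], [np.sqrt(2) * (1 - i), 3]])

# Roots of det(W1 - lambda*W2) = 0, i.e., eigenvalues of W2^{-1} W1;
# they are real since the pencil is Hermitian positive definite.
roots = np.sort(np.linalg.eigvals(np.linalg.inv(W2) @ W1).real)

# Hermitian square root W2^{-1/2} from the spectral decomposition of W2
mu, Y = np.linalg.eigh(W2)                  # eigenvalues 1 and 5
W2_inv_half = Y @ np.diag(mu ** -0.5) @ Y.conj().T
Q = W2_inv_half @ W1 @ W2_inv_half
eigs_Q = np.sort(np.linalg.eigvalsh(Q))     # real, since Q is Hermitian

expected = np.sort([(9 - np.sqrt(46)) / 5, (9 + np.sqrt(46)) / 5])
print(np.allclose(roots, expected), np.allclose(eigs_Q, expected))  # True True
```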

#### **8.3. The Singular Real Case**

If a $p\times p$ real matrix-variate gamma distribution with parameters $(\alpha, B)$, $B>O$, is singular and positive semi-definite, its $p\times p$-variate density does not exist. When $\alpha=\frac{m}{2}$, $m\ge p$, and $B=\frac{1}{2}\Sigma^{-1}$, $\Sigma>O$, the gamma density is called a Wishart density with $m$ degrees of freedom and parameter matrix $\Sigma>O$. If the rank of a gamma or Wishart matrix is $r<p$, in which case it is positive semi-definite, the resulting distribution is said to be singular. It can be shown that, in this instance, we have in fact a nonsingular $r\times r$-variate gamma or Wishart distribution. In order to establish this, the matrix theory results presented next are required.

Let $A=A'\ge O$ (non-negative definite) be a $p\times p$ real matrix of rank $r<p$, and let the elementary matrices $E_1, E_2,\ldots,E_k$ be such that, operating on $A$, one has

$$E\_k \cdots E\_2 E\_1 A E\_1' E\_2' \cdots E\_k' = \begin{bmatrix} I\_r & O\_1 \\ O\_2 & O\_3 \end{bmatrix}$$

where $O_1$, $O_2$ and $O_3$ are null matrices, with $O_3$ being of order $(p-r)\times(p-r)$. Then,

$$A = E\_1^{-1} \cdots E\_k^{-1} \begin{bmatrix} I\_r & O\_1 \\ O\_2 & O\_3 \end{bmatrix} E\_k'^{-1} \cdots E\_1'^{-1} = \mathcal{Q} \begin{bmatrix} I\_r & O\_1 \\ O\_2 & O\_3 \end{bmatrix} \mathcal{Q}'^{-1}$$

where *Q* is a product of inverses of elementary matrices and hence, nonsingular. Letting

$$\begin{aligned} \mathcal{Q} &= \begin{bmatrix} \mathcal{Q}\_{11} & \mathcal{Q}\_{12} \\ \mathcal{Q}\_{21} & \mathcal{Q}\_{22} \end{bmatrix}, \quad \mathcal{Q}\_{11} \text{ being } r \times r \text{ and } \mathcal{Q}\_{22}, (p-r) \times (p-r), \\\mathcal{Q} \begin{bmatrix} I\_r & O\_1 \\ O\_2 & O\_3 \end{bmatrix} &= \begin{bmatrix} \mathcal{Q}\_{11} & \mathcal{Q}\_{12} \\ \mathcal{Q}\_{21} & \mathcal{Q}\_{22} \end{bmatrix} \begin{bmatrix} I\_r & O \\ O & O \end{bmatrix} = \begin{bmatrix} \mathcal{Q}\_{11} & O \\ \mathcal{Q}\_{21} & O \end{bmatrix}. \end{aligned}$$

Note that

$$\mathcal{Q}\begin{bmatrix} I\_r & O \\ O & O \end{bmatrix} \mathcal{Q}' = \begin{bmatrix} \mathcal{Q}\_{11}\mathcal{Q}'\_{11} & \mathcal{Q}\_{11}\mathcal{Q}'\_{21} \\ \mathcal{Q}\_{21}\mathcal{Q}'\_{11} & \mathcal{Q}\_{21}\mathcal{Q}'\_{21} \end{bmatrix} = \begin{bmatrix} \mathcal{Q}\_{11} \\ \mathcal{Q}\_{21} \end{bmatrix} [\mathcal{Q}'\_{11}, \mathcal{Q}'\_{21}] = A\_1 A'\_1$$

where $A_1=\begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix}$, which is a full-rank $p\times r$ matrix, $r<p$, so that the $r$ columns of $A_1$ are all linearly independent. This result can also be established by appealing to the fact that when $A\ge O$, its eigenvalues are non-negative and, $A$ being symmetric, there exists an orthonormal matrix $P$, $PP'=I$, $P'P=I$, such that

$$A = P\begin{bmatrix} D & O \\ O & O \end{bmatrix}P' = \lambda_1 P_1 P_1' + \cdots + \lambda_r P_r P_r' + 0\,P_{r+1}P_{r+1}' + \cdots + 0\,P_p P_p' = P_{(1)} D P_{(1)}',$$

$$D = \text{diag}(\lambda\_1, \dots, \lambda\_r), \ P\_{(1)} = [P\_1, \dots, P\_r], \ P = [P\_1, \dots, P\_r, \dots, P\_p],$$

where $P_1,\ldots,P_p$ are the columns of the orthonormal matrix $P$, $\lambda_1,\ldots,\lambda_r, 0,\ldots,0$ are the eigenvalues of $A$ with $\lambda_j>0$, $j=1,\ldots,r$, and $P_{(1)}$ consists of the first $r$ columns of $P$. The first $r$ eigenvalues must be positive since $A$ is a non-negative definite matrix of rank $r$. Now, we can write $A=P_{(1)}DP_{(1)}'=A_1A_1'$ with $A_1=P_{(1)}D^{\frac{1}{2}}$. Observe that $A_1$ is $p\times r$ and of rank $r<p$. Thus, we have the following result:

**Theorem 8.3.1.** *Let $A=A'$ be a real $p\times p$ positive semi-definite matrix, $A\ge O$, of rank $r<p$. Then, $A$ can be represented in the form $A=A_1A_1'$ where $A_1$ is a $p\times r$, $r<p$, matrix of rank $r$ or, equivalently, the $r$ columns of the $p\times r$ matrix $A_1$ are all linearly independent.*
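A minimal numerical sketch of this factorization (the variable names `A`, `A1`, `p`, `r` and the random construction are ours, for illustration only): build a rank-$r$ positive semi-definite $A$, extract $P_{(1)}$ and $D$ from its spectral decomposition, and verify $A=A_1A_1'$ with $A_1=P_{(1)}D^{\frac{1}{2}}$.

```python
# Sketch of Theorem 8.3.1: A = A1 A1' with A1 of full column rank r.
import numpy as np

rng = np.random.default_rng(1)
p, r = 5, 2
B = rng.standard_normal((p, r))
A = B @ B.T                                  # p x p, symmetric, rank r

lam, P = np.linalg.eigh(A)                   # eigenvalues in increasing order
lam, P = lam[::-1], P[:, ::-1]               # largest first
d_half = np.sqrt(np.clip(lam[:r], 0, None))  # square roots of the r positive eigenvalues
A1 = P[:, :r] * d_half                       # A1 = P_(1) D^{1/2}, a p x r matrix

print(np.linalg.matrix_rank(A1))             # r = 2
print(np.allclose(A1 @ A1.T, A))             # True: A = A1 A1'
```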

In the case of Wishart matrices, we can interpret Theorem 8.3.1 as follows: Let the $p\times1$ vector random variable $X_j$ have a nonsingular Gaussian distribution whose mean value is the null vector and whose covariance matrix $\Sigma$ is positive definite. Let the $X_j$'s, $j=1,\ldots,n$, be independently distributed, that is, $X_j\overset{iid}{\sim}N_p(O, \Sigma)$, $\Sigma>O$, $j=1,\ldots,n$. Letting the $p\times n$ sample matrix be $\mathbf{X}=[X_1,\ldots,X_n]$, the joint density of $X_1,\ldots,X_n$ or, equivalently, that of $\mathbf{X}$, denoted by $f(\mathbf{X})$, is the following:

$$\begin{split} f(\mathbf{X})\mathbf{dX} &= \frac{1}{(2\pi)^{\frac{np}{2}} |\boldsymbol{\Sigma}|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\sum\_{j=1}^{n} X\_{j}^{\prime} \boldsymbol{\Sigma}^{-1} \boldsymbol{X}\_{j}} \mathbf{dX} \\ &= \frac{1}{(2\pi)^{\frac{np}{2}} |\boldsymbol{\Sigma}|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\text{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{X}\mathbf{X}^{\prime})} \mathbf{dX} . \end{split} \tag{8.3.1}$$

Letting $W=\mathbf{XX}'$, the $p\times p$ matrix $W$ will be positive definite provided $n\ge p$; otherwise, that is, when $n<p$, $W$ will be singular. Let us first consider the case $n\ge p$. This will also provide a derivation of the real Wishart density, which was earlier obtained as a special case of the real matrix-variate gamma density. Observe that we can express $\mathrm{d}\mathbf{X}$ in terms of $\mathrm{d}W$ by applying Theorem 4.2.3, namely,

$$\mathrm{d}\mathbf{X} = \frac{\pi^{\frac{np}{2}}}{\Gamma_p(\frac{n}{2})}|W|^{\frac{n}{2}-\frac{p+1}{2}}\mathrm{d}W.$$

Therefore, if the density of *W* is denoted by *f*1*(W )*, then *f*1*(W )* is available from (8.3.1) by expressing d**X** in terms of d*W*. That is,

$$f\_{\mathbf{l}}(W) = \frac{1}{2^{\frac{np}{2}} \Gamma\_p(\frac{n}{2}) |\boldsymbol{\Sigma}|^{\frac{n}{2}}} |W|^{\frac{n}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\boldsymbol{\Sigma}^{-1}W)} \tag{8.3.2}$$

for $n\ge p$, $W>O$, $\Sigma>O$, and $f_1(W)=0$ elsewhere. This is the density of a nonsingular Wishart distribution with $n$ degrees of freedom, $n\ge p$, and parameter matrix $\Sigma>O$, which is denoted $W\sim W_p(n, \Sigma)$, $\Sigma>O$, $n\ge p$. It has previously been shown that when $X_j\overset{iid}{\sim}N_p(\mu, \Sigma)$, $j=1,\ldots,n$, where $\mu\neq O$ is the common $p\times1$ mean value vector and $\Sigma$ is the positive definite covariance matrix,

$$W = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' \sim W\_p(n - 1, \,\, \Sigma), \,\, \Sigma > O \text{ for } n - 1 \ge p,$$

where $\bar{\mathbf{X}}=[\bar{X},\ldots,\bar{X}]$, $\bar{X}=\frac{1}{n}(X_1+\cdots+X_n)$. Thus, we have the following result:

**Theorem 8.3.2.** *Let $X_j\overset{iid}{\sim}N_p(\mu, \Sigma)$, $\Sigma>O$, $j=1,\ldots,n$. Let $\mathbf{X}=[X_1,\ldots,X_n]$ and $W=\mathbf{XX}'$. Then, when $\mu=O$, the $p\times p$ positive definite matrix $W\sim W_p(n, \Sigma)$, $\Sigma>O$, for $n\ge p$. If $\mu\neq O$, $W=(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'\sim W_p(n-1, \Sigma)$, $\Sigma>O$, for $n-1\ge p$, where $\bar{\mathbf{X}}=[\bar{X},\ldots,\bar{X}]$, $\bar{X}=\frac{1}{n}(X_1+\cdots+X_n)$.*
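As a check on (8.3.2), the sketch below compares a direct implementation of the density, on the log scale, with SciPy's `scipy.stats.wishart`; `scipy.special.multigammaln` supplies $\log\Gamma_p(\frac{n}{2})$. The helper name `wishart_logpdf` and the chosen $p$, $n$, $\Sigma$ are ours, not the book's.

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

def wishart_logpdf(W, n, Sigma):
    """Logarithm of f_1(W) in (8.3.2): n >= p, W > O, Sigma > O."""
    p = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_S = np.linalg.slogdet(Sigma)
    return ((n - p - 1) / 2) * logdet_W \
        - 0.5 * np.trace(np.linalg.solve(Sigma, W)) \
        - (n * p / 2) * np.log(2) - multigammaln(n / 2, p) \
        - (n / 2) * logdet_S

rng = np.random.default_rng(3)
p, n = 3, 7
Sigma = np.eye(p) + 0.3                 # a convenient positive definite Sigma
X = rng.standard_normal((p, n))
W = X @ X.T                             # positive definite since n >= p

print(np.isclose(wishart_logpdf(W, n, Sigma),
                 wishart.logpdf(W, df=n, scale=Sigma)))  # True
```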

Now, consider the case $n<p$. Let us denote $n$ by $r<p$ in order to avoid any confusion with $n$ as specified in the nonsingular case. Letting $\mathbf{X}$ be as previously defined, $\mathbf{X}$ is a real matrix of order $p\times r$, $n=r<p$. Let $T_1=\mathbf{XX}'$ and $T_2=\mathbf{X}'\mathbf{X}$, where $\mathbf{X}'\mathbf{X}$ is an $r\times r$ positive definite matrix since $\mathbf{X}$ and $\mathbf{X}'$ are full-rank matrices of rank $r<p$. Thus, all the eigenvalues of $T_2=\mathbf{X}'\mathbf{X}$ are positive, and the eigenvalues of $T_1$ are either positive or equal to zero since $T_1$ is a real positive semi-definite matrix. Letting $\lambda$ be a nonzero eigenvalue of $T_2$, consider the following determinant, denoted by $\delta$, which is expanded in two ways by making use of certain properties of determinants of partitioned matrices provided in Sect. 1.3:

$$\begin{split} \delta &= \begin{vmatrix} \sqrt{\lambda}I\_p & \mathbf{X} \\ \mathbf{X}' & \sqrt{\lambda}I\_r \end{vmatrix} = |\sqrt{\lambda}I\_p| \, |\sqrt{\lambda}I\_r - \mathbf{X}' (\sqrt{\lambda}I\_p)^{-1} \mathbf{X}| \\ &= (\sqrt{\lambda})^{p-r} |\lambda I\_r - \mathbf{X}' \mathbf{X}|; \\ \delta &= 0 \Rightarrow |\lambda I\_r - \mathbf{X}' \mathbf{X}| = 0, \end{split} \tag{8.3.3}$$

which shows that *λ* is an eigenvalue of **X**- **X**. Now expand *δ* as follows:

$$\begin{aligned} \delta &= |\sqrt{\lambda}I\_r| \, |\sqrt{\lambda}I\_p - \mathbf{X} (\sqrt{\lambda}I\_r)^{-1} \mathbf{X}'|\\ &= (\sqrt{\lambda})^{-p+r} |\lambda I\_p - \mathbf{X} \mathbf{X}'|;\\ \delta &= 0 \Rightarrow |\lambda I\_p - \mathbf{X} \mathbf{X}'| = 0, \end{aligned} \tag{8.3.4}$$

so that all the $r$ nonzero eigenvalues of $T_2=\mathbf{X}'\mathbf{X}$ are also eigenvalues of $T_1=\mathbf{XX}'$, the remaining eigenvalues of $T_1$ being zeros. As well, one has $|I_r-\mathbf{X}'\mathbf{X}|=|I_p-\mathbf{XX}'|$. These results are next stated as a theorem.

**Theorem 8.3.3.** *Let $\mathbf{X}$ be a $p\times r$ matrix of full rank $r<p$. Let $T_1=\mathbf{XX}'$ be the resulting real $p\times p$ positive semi-definite matrix and $T_2=\mathbf{X}'\mathbf{X}$ the $r\times r$ real positive definite matrix. Then, (a) the $r$ positive eigenvalues of $T_2$ are identical to those of $T_1$, the remaining eigenvalues of $T_1$ being equal to zero; (b) $|I_r-\mathbf{X}'\mathbf{X}|=|I_p-\mathbf{XX}'|$.*
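Theorem 8.3.3 is easy to confirm numerically. In the sketch below (our own NumPy illustration; the scaling factor 0.3 merely keeps the entries of $\mathbf{X}$ small so that both determinants in part (b) stay comfortably away from zero):

```python
# Numerical illustration of Theorem 8.3.3; X, T1, T2 as in the text.
import numpy as np

rng = np.random.default_rng(2)
p, r = 6, 3
X = rng.standard_normal((p, r)) * 0.3      # p x r, full rank r < p
T1 = X @ X.T                               # p x p, positive semi-definite
T2 = X.T @ X                               # r x r, positive definite

e1 = np.sort(np.linalg.eigvalsh(T1))       # p eigenvalues; p - r of them ~ 0
e2 = np.sort(np.linalg.eigvalsh(T2))       # r positive eigenvalues

print(np.allclose(e1[-r:], e2))            # True: nonzero eigenvalues coincide
print(np.allclose(e1[:p - r], 0))          # True: the rest vanish
print(np.isclose(np.linalg.det(np.eye(r) - T2),
                 np.linalg.det(np.eye(p) - T1)))  # True: |I_r - X'X| = |I_p - XX'|
```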

Additional results relating the $p$-variate real Gaussian distribution to the real Wishart distribution are needed in connection with the singular case. Let the $p\times1$ vector $X_j$ have a $p$-variate real Gaussian distribution with null mean value and positive definite covariance matrix, that is, $X_j\overset{iid}{\sim}N_p(O, \Sigma)$, $\Sigma>O$, $j=1,\ldots,r$, $r<p$. Let $\mathbf{X}=[X_1,\ldots,X_r]$ be a $p\times r$ matrix, which, in this instance, is also the sample matrix. We are seeking the distribution of $T_1=\mathbf{XX}'$ when $r<p$, where $T_1$ corresponds to a singular Wishart matrix. Letting $T$ be an $r\times r$ lower triangular matrix with positive diagonal elements and $G$ an $r\times p$, $r<p$, semi-orthonormal matrix, that is, $GG'=I_r$, we have the representation $\mathbf{X}'=TG$, so that

$$\mathbf{dX}' = \left\{ \prod\_{j=1}^{r} t\_{jj}^{p-j} \right\} \mathbf{d}T \ h(G), \ p \ge r,\tag{8.3.5}$$

where *h(G)* is a differential element associated with *G*. Then, on applying Theorem 4.2.2, we have

$$\int\_{V\_{r,p}} h(G) = 2^r \frac{\pi^{\frac{pr}{2}}}{\Gamma\_r(\frac{p}{2})}, \ p \ge r,\tag{8.3.6}$$

where $V_{r,p}$ is the Stiefel manifold, that is, the space of semi-orthonormal $r\times p$, $r<p$, matrices. Observe that the density of $\mathbf{X}'$, denoted by $f_{\mathbf{X}'}(\mathbf{X}')$, is the following:

$$f\_{\mathbf{X'}}(\mathbf{X'})\,\mathrm{d}\mathbf{X'} = \frac{\mathrm{e}^{-\frac{1}{2}\sum\_{j=1}^{r}X\_{j}^{\prime}\Sigma^{-1}X\_{j}}}{(2\pi)^{\frac{pr}{2}}|\Sigma|^{\frac{r}{2}}}\,\mathrm{d}\mathbf{X'} = \frac{\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\mathbf{X'}\Sigma^{-1}\mathbf{X})}}{(2\pi)^{\frac{pr}{2}}|\Sigma|^{\frac{r}{2}}}\,\mathrm{d}\mathbf{X'}.$$

Let $T_2 = \mathbf{X}'\Sigma^{-1}\mathbf{X}$ or simply $T_2 = \mathbf{X}'\mathbf{X}$ when $\Sigma = I$; note that $\Sigma$ will vanish upon letting $\mathbf{Y} = \Sigma^{-\frac{1}{2}}\mathbf{X} \Rightarrow \mathrm{d}\mathbf{Y} = |\Sigma|^{-\frac{r}{2}}\mathrm{d}\mathbf{X}$. Now, on expressing $\mathrm{d}\mathbf{X}$ in terms of $\mathrm{d}T_2$ by making use of Theorem 4.2.3, the following result is obtained:

**Theorem 8.3.4.** *Let* $X_j \overset{iid}{\sim} N_p(\mu, \Sigma),\ \Sigma > O$, $j = 1, \ldots, r,\ r < p$. *Let* $\mathbf{X} = [X_1, \ldots, X_r]$, $\bar{X} = \frac{1}{r}(X_1 + \cdots + X_r)$ *and* $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$. *Letting* $T_2 = \mathbf{X}'\Sigma^{-1}\mathbf{X}$, *or* $T_2 = \mathbf{X}'\mathbf{X}$ *when* $\Sigma = I$, $T_2$ *has the following density, denoted by* $f_t(T_2)$, *when* $\mu = O$:

$$f\_t(T\_2) = \frac{1}{2^{\frac{pr}{2}}\Gamma\_r(\frac{p}{2})} |T\_2|^{\frac{p}{2} - \frac{r+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(T\_2)}, \ T\_2 > O, \ r \le p,\tag{8.3.7}$$

*and zero elsewhere. Note that the* $r \times r$ *matrix* $T_2 = \mathbf{X}'\mathbf{X}$*, or* $T_2 = \mathbf{X}'\Sigma^{-1}\mathbf{X}$ *when* $\Sigma \ne I$*, has a Wishart distribution with* $p$ *degrees of freedom and parameter matrix* $I$*, that is,* $T_2 \sim W_r(p, I),\ r \le p$. *When* $\mu \ne O$, $T_2 = (\mathbf{X} - \bar{\mathbf{X}})'\Sigma^{-1}(\mathbf{X} - \bar{\mathbf{X}})$ *has a Wishart distribution with* $p-1$ *degrees of freedom or, equivalently,* $T_2 \sim W_r(p-1, I),\ r \le p-1$*.*
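The claim $T_2 = \mathbf{X}'\mathbf{X} \sim W_r(p, I)$ for $\mu = O$, $\Sigma = I$ can be probed by simulation: since a $W_r(p, I)$ matrix has expected value $p\,I_r$, the average of many independent realizations of $\mathbf{X}'\mathbf{X}$ should be close to $p\,I_r$. A minimal Monte Carlo sketch, assuming NumPy, with arbitrarily chosen dimensions and replication count:

```python
import numpy as np

rng = np.random.default_rng(1)
p, r, N = 4, 2, 20_000

# N independent p x r sample matrices with iid N(0, 1) entries
X = rng.standard_normal((N, p, r))
# average of T2 = X'X over the N replications
mean_T2 = np.einsum('kpi,kpj->ij', X, X) / N

# T2 ~ W_r(p, I) implies E[T2] = p * I_r
assert np.allclose(mean_T2, p * np.eye(r), atol=0.15)
```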

#### **8.3.1. Singular Wishart and matrix-variate gamma distributions, real case**

We now consider the case of a singular matrix-variate gamma distribution. Let the $p \times 1$ vector $X_j$ have a $p$-variate real Gaussian distribution whose mean value is the null vector and covariance matrix is positive definite, that is, $X_j \overset{iid}{\sim} N_p(O, \Sigma),\ \Sigma > O$, $j = 1, \ldots, r,\ r < p$. For convenience, let $\Sigma = I_p$. Let $\mathbf{X} = [X_1, \ldots, X_r]$ be the $p \times r$ sample matrix. Then, for $r \ge p$, $\mathbf{X}\mathbf{X}'$ is distributed as a Wishart matrix with $r \ge p$ degrees of freedom, that is, $\mathbf{X}\mathbf{X}' \sim W_p(r, I),\ r \ge p$. This result still holds for any positive definite matrix $\Sigma$; it then suffices to replace $I$ by $\Sigma$. What about the distribution of $\mathbf{X}\mathbf{X}'$ if $r < p$, which corresponds to the singular case? In this instance, the real matrix $\mathbf{X}\mathbf{X}' \ge O$ (positive semi-definite) and the density of $\mathbf{X}$, denoted by $f_1(\mathbf{X})$, is the following:

$$f\_1(\mathbf{X}) \, \mathbf{dX} = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\mathbf{X}\mathbf{X}')}}{(2\pi)^{\frac{rp}{2}}} \text{d}\mathbf{X}.\tag{8.3.8}$$

Let $W_2$ be a $p \times p$ nonsingular Wishart matrix with $n \ge p$ degrees of freedom, that is, $W_2 \sim W_p(n, I)$. Let $\mathbf{X}$ and $W_2$ be independently distributed. Then, the joint density of $\mathbf{X}$ and $W_2$, denoted by $f_2(\mathbf{X}, W_2)$, is given by

$$f\_2(\mathbf{X}, \ W\_2) = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\mathbf{X}\mathbf{X}' + W\_2)}}{(2\pi)^{\frac{pr}{2}}2^{\frac{np}{2}}\Gamma\_p(\frac{n}{2})}|W\_2|^{\frac{n}{2}-\frac{p+1}{2}}$$

for $W_2 > O$, $\mathbf{X}\mathbf{X}' \ge O$, $n \ge p$, $r < p$. Letting $U = \mathbf{X}\mathbf{X}' + W_2 > O$ and denoting the joint density of $U$ and $\mathbf{X}$ by $f_3(\mathbf{X}, U)$, we have

$$f\_3(\mathbf{X}, \ U) = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(U)}}{(2\pi)^{\frac{pr}{2}}2^{\frac{np}{2}}\Gamma\_p(\frac{n}{2})} |U - \mathbf{X}\mathbf{X}'|^{\frac{n}{2} - \frac{p+1}{2}}, \ n \ge p, \ r < p,$$

where

$$|U - \mathbf{X}\mathbf{X}'| = |U|\,|I - U^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'U^{-\frac{1}{2}}|.$$

Letting $V = U^{-\frac{1}{2}}\mathbf{X}$ for fixed $U$, $\mathrm{d}\mathbf{X} = |U|^{\frac{r}{2}}\mathrm{d}V$, and the joint density of $U$ and $V$, denoted by $f_4(U, V)$, is then

$$f\_4(U, V) = \frac{|U|^{\frac{n+r}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(U)}}{(2\pi)^{\frac{pr}{2}} 2^{\frac{np}{2}} \Gamma\_p(\frac{n}{2})} |I - VV'|^{\frac{n}{2} - \frac{p+1}{2}}.\tag{8.3.9}$$

Note that $U$ and $V$ are independently distributed. By integrating out $U$ with the help of a real matrix-variate gamma integral, we obtain the density of $V$, denoted by $f_5(V)$, as

$$f\_{5}(V) = c \left| I - VV' \right|^{\frac{n}{2} - \frac{p+1}{2}} = c \left| I - V'V \right|^{\frac{n}{2} - \frac{p+1}{2}},\tag{8.3.10}$$

in view of Theorem 8.3.3(b), where *V* is *p* × *r, r < p*, *c* being the normalizing constant. Thus, we have the following result:

**Theorem 8.3.5.** *Let* $\mathbf{X} = [X_1, \ldots, X_r]$ *and* $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r,\ r < p$. *Let the* $p \times p$ *real positive definite matrix* $W_2$ *be Wishart distributed with* $n$ *degrees of freedom, that is,* $W_2 \sim W_p(n, I)$. *Let* $U = \mathbf{X}\mathbf{X}' + W_2 > O$ *and let* $V = U^{-\frac{1}{2}}\mathbf{X}$*, where* $V$ *is a* $p \times r$, $r < p$, *matrix of full rank* $r$. *Observe that* $VV' \ge O$ *(positive semi-definite). Then, the densities of the* $p \times r$ *matrix* $V$ *and the* $r \times r$ *matrix* $S = V'V$*, respectively denoted by* $f_5(V)$ *and* $f_6(S)$*, are as follows,* $U$ *and* $V$ *being independently distributed:*

$$f\_{5}(V) = \frac{\Gamma\_r(\frac{p}{2})}{\pi^{\frac{pr}{2}}} \frac{\Gamma\_r(\frac{n+r}{2})}{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n-p+r}{2})} |I - V'V|^{\frac{n}{2} - \frac{p+1}{2}},\tag{8.3.11}$$

*and*

$$f\_{6}(S) = \frac{\Gamma\_{r}(\frac{n+r}{2})}{\Gamma\_{r}(\frac{p}{2})\Gamma\_{r}(\frac{n-p+r}{2})} |S|^{\frac{p}{2} - \frac{r+1}{2}} |I - S|^{\frac{n-p+r}{2} - \frac{r+1}{2}}, \; n \ge p, \; r < p, \; S > O. \tag{8.3.12}$$

*Proof:* In light of Theorem 8.3.3(b), $|I_p - VV'| = |I_r - V'V|$. Observe that $VV'$ is $p \times p$ and positive semi-definite whereas $V'V$ is $r \times r$ and positive definite. As well, $\frac{n}{2} - \frac{p+1}{2} = \frac{n-p+r}{2} - \frac{r+1}{2}$. Letting $S = V'V$ and expressing $\mathrm{d}V'$ in terms of $\mathrm{d}S$ by applying Theorem 4.2.3, we have, for $r < p$,

$$\mathrm{d}V' = \frac{\pi^{\frac{rp}{2}}}{\Gamma\_r(\frac{p}{2})} |S|^{\frac{p}{2} - \frac{r+1}{2}} \mathrm{d}S. \tag{i}$$

Now, integrating out *S* by making use of a type-1 beta integral, we have

$$\int\_{S>O} |S|^{\frac{p}{2}-\frac{r+1}{2}} |I-S|^{\frac{n-p+r}{2}-\frac{r+1}{2}} \mathrm{d}S = \frac{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n-p+r}{2})}{\Gamma\_r(\frac{n+r}{2})} \tag{ii}$$

for $n \ge p,\ r < p$. The normalizing constants in (8.3.11) and (8.3.12) follow from *(i)* and *(ii)*. This completes the proof.

We now consider the singular version of the determinantal equation in (8.2.8). Let $W_1$ and $W_2$ be independently distributed $p \times p$ real matrices where $W_1 = \mathbf{X}\mathbf{X}'$, $\mathbf{X} = [X_1, \ldots, X_r]$ with $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r,\ r < p$, and the positive definite matrix $W_2 \sim W_p(n, I)$, $n \ge p$. The equation

$$\begin{aligned} |W\_1 - \mu(W\_1 + W\_2)| = 0 &\Rightarrow |\mathbf{X}\mathbf{X}' - \mu(\mathbf{X}\mathbf{X}' + W\_2)| = 0\\ &\Rightarrow |U^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'U^{-\frac{1}{2}} - \mu I\_p| = 0, \ U = \mathbf{X}\mathbf{X}' + W\_2 > O\\ &\Rightarrow |VV' - \mu I\_p| = 0, \end{aligned}$$

which, in turn, implies that $\mu$ is an eigenvalue of $VV' \ge O$, all of whose eigenvalues are positive or zero. However, it follows from (8.3.3) and (8.3.4) that

$$|VV' - \mu I\_p| = 0 \Rightarrow |V'V - \mu I\_r| = 0.$$

Hence, the following result:

**Theorem 8.3.6.** *Let* $\mathbf{X}$ *be a* $p \times r$ *matrix whose columns are iid* $N_p(O, I)$ *with* $r < p$. *Let* $W_1 = \mathbf{X}\mathbf{X}' \ge O$*, which is a* $p \times p$ *positive semi-definite matrix, let* $W_2 > O$ *be a* $p \times p$ *Wishart distributed matrix having* $n$ *degrees of freedom, that is,* $W_2 \sim W_p(n, I),\ n \ge p$*, and let* $W_1$ *and* $W_2$ *be independently distributed. Then,* $U = W_1 + W_2 > O$ *and* $V = U^{-\frac{1}{2}}\mathbf{X}$ *are independently distributed. Moreover,*

$$|W\_1 - \mu(W\_1 + W\_2)| = 0 \Rightarrow |V'V - \mu I\_r| = 0$$

*where the roots* $\mu_j > 0$, $j = 1, \ldots, r$, *are the eigenvalues of* $V'V > O$*; these are also the nonzero eigenvalues of* $VV'$*, the remaining* $p - r$ *eigenvalues of* $VV'$ *being equal to zero.*
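Theorem 8.3.6 is easy to confirm numerically: the roots of $|W_1 - \mu(W_1 + W_2)| = 0$ are the eigenvalues of $VV' = U^{-\frac{1}{2}}W_1U^{-\frac{1}{2}}$, whose $r$ nonzero values match the eigenvalues of $V'V$. A minimal sketch, assuming NumPy, with arbitrarily chosen dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
p, r, n = 5, 3, 8
X = rng.standard_normal((p, r))
Z = rng.standard_normal((p, n))
W1 = X @ X.T                  # p x p, rank r < p (singular)
W2 = Z @ Z.T                  # p x p, nonsingular since n >= p
U = W1 + W2

# symmetric inverse square root of U via its spectral decomposition
w, Q = np.linalg.eigh(U)
V = (Q @ np.diag(w ** -0.5) @ Q.T) @ X   # V = U^{-1/2} X, p x r

mu_big = np.sort(np.linalg.eigvalsh(V @ V.T))[::-1]    # p eigenvalues
mu_small = np.sort(np.linalg.eigvalsh(V.T @ V))[::-1]  # r eigenvalues

# roots of |W1 - mu (W1 + W2)| = 0 are the eigenvalues of V V'
roots = np.sort(np.linalg.eigvals(np.linalg.solve(U, W1)).real)[::-1]
assert np.allclose(roots, mu_big)
# the r nonzero roots are the eigenvalues of V'V; the rest vanish
assert np.allclose(mu_big[:r], mu_small)
assert np.allclose(mu_big[r:], 0.0)
```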

Let $S = PDP'$, $D = \text{diag}(\mu_1, \ldots, \mu_r)$, $PP' = I_r$, $P'P = I_r$. Then, on applying Theorem 8.2.2,

$$\mathrm{d}S = \frac{\pi^{\frac{r^2}{2}}}{\Gamma\_r(\frac{r}{2})} \Big[ \prod\_{i<j} (\mu\_i - \mu\_j) \Big] \mathrm{d}D$$

after integrating out over the differential element associated with the orthonormal matrix $P$. Substituting in $f_6(S)$ of Theorem 8.3.5 yields the following result:

**Theorem 8.3.7.** *Let* $\mu_1, \ldots, \mu_r$ *be the nonzero roots of the determinantal equation* $|W_1 - \mu(W_1 + W_2)| = 0$ *where* $W_1 = \mathbf{X}\mathbf{X}'$, $\mathbf{X} = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r,\ r < p$, $W_2 \sim W_p(n, I),\ n \ge p$, *and* $W_1$ *and* $W_2$ *are independently distributed. Letting* $\mu_1 > \mu_2 > \cdots > \mu_r > 0,\ r < p$, *the joint density of the nonzero roots* $\mu_1, \ldots, \mu_r$, *denoted by* $f_\mu(\mu_1, \ldots, \mu_r)$, *is given by*

$$\begin{split} f\_{\mu}(\mu\_{1},\ldots,\mu\_{r}) &= \frac{\pi^{\frac{r^2}{2}}}{\Gamma\_{r}(\frac{r}{2})} \frac{\Gamma\_{r}(\frac{n+r}{2})}{\Gamma\_{r}(\frac{p}{2})\Gamma\_{r}(\frac{n-p+r}{2})} \\ &\times \left[ \prod\_{j=1}^{r} \mu\_{j}^{\frac{p}{2}-\frac{r+1}{2}} \right] \left[ \prod\_{j=1}^{r} (1-\mu\_{j})^{\frac{n-p+r}{2}-\frac{r+1}{2}} \right] \left[ \prod\_{i<j} (\mu\_{i}-\mu\_{j}) \right], \ 1 > \mu\_1 > \cdots > \mu\_r > 0. \tag{8.3.13} \end{split}$$

It can readily be observed from (8.3.9) that $U = \mathbf{X}\mathbf{X}' + W_2$ and $V = U^{-\frac{1}{2}}\mathbf{X}$ are indeed independently distributed.

#### **8.3.2. A direct evaluation as an eigenvalue problem**

Consider the singular version of the original determinantal equation

$$|W\_1 - \lambda W\_2| = 0 \Rightarrow |W\_2^{-\frac{1}{2}} W\_1 W\_2^{-\frac{1}{2}} - \lambda I| = 0\tag{8.3.14}$$

where $W_1$ is singular, $W_2$ is nonsingular, and $W_1$ and $W_2$ are independently distributed. Thus, the roots $\lambda_j$'s of the equation $|W_1 - \lambda W_2| = 0$ coincide with the eigenvalues of the matrix $U = W_2^{-\frac{1}{2}} W_1 W_2^{-\frac{1}{2}}$. Let $W_2$ be a real nonsingular matrix-variate gamma or Wishart distributed matrix. Let $W_1 = \mathbf{X}\mathbf{X}'$, $\mathbf{X} = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, \Sigma),\ \Sigma > O$, $j = 1, \ldots, r,\ r < p$, or, equivalently, the $p \times r$, $r < p$, matrix $\mathbf{X}$ is a simple random sample from this $p$-variate real Gaussian population. We will take $\Sigma = I$ without any loss of generality. In this case, $\mathbf{X}$ is a $p \times r$ full rank matrix with $r < p$. Let $W_2 \sim W_p(n, I),\ n \ge p$, that is, $W_2$ is a nonsingular Wishart matrix with $n$ degrees of freedom and parameter matrix $I$, and $W_1 \ge O$ (positive semi-definite). Then, the joint density of $\mathbf{X}$ and $W_2$, denoted by $f_7(\mathbf{X}, W_2)$, is the following:

$$f\_7(\mathbf{X}, \ W\_2) = \frac{\mathbf{e}^{-\frac{1}{2}\text{tr}(\mathbf{X}\mathbf{X}')} |W\_2|^{\frac{n}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2}\text{tr}(W\_2)}}{(2\pi)^{\frac{pr}{2}} 2^{\frac{np}{2}} \Gamma\_p(\frac{n}{2})}. \tag{i}$$

Consider the exponent

$$-\frac{1}{2}\text{tr}(\mathbf{X}\mathbf{X}' + W\_2) = -\frac{1}{2}\text{tr}[W\_2(I + W\_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W\_2^{-\frac{1}{2}})] = -\frac{1}{2}\text{tr}(W\_2(I + VV'))$$

where $V = W_2^{-\frac{1}{2}}\mathbf{X} \Rightarrow \mathrm{d}\mathbf{X} = |W_2|^{\frac{r}{2}}\mathrm{d}V$ for fixed $W_2$. The joint density of $W_2$ and $V$, denoted by $f_8(V, W_2)$, is then

$$f\_8(V, W\_2) = \frac{|W\_2|^{\frac{n}{2} + \frac{r}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(W\_2(I + VV'))}}{(2\pi)^{\frac{pr}{2}} 2^{\frac{np}{2}} \Gamma\_p(\frac{n}{2})} . \tag{ii}$$

Observe that $VV' = W_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W_2^{-\frac{1}{2}} = U$ of (8.3.14). Integrating out $W_2$ in *(ii)* by using a real matrix-variate gamma integral, we obtain the marginal density of $V$, denoted by $f_9(V)$, as

$$f\_9(V)\mathrm{d}V = \frac{\Gamma\_p(\frac{n+r}{2})}{\Gamma\_p(\frac{n}{2})\pi^{\frac{pr}{2}}}|I + VV'|^{-(\frac{n+r}{2})}\mathrm{d}V \tag{iii}$$

where $VV' \ge O$ (positive semi-definite). Note that

$$|I\_p + VV'| = |I\_r + V'V|, \; VV' \ge O, \; V'V > O \text{ (positive definite)},$$

which follows from Theorem 8.3.3(b). This last result can also be established by expanding the following determinant in two ways as was done in (8.3.3) and (8.3.4):

$$
\begin{vmatrix} I\_p & V \\ -V' & I\_r \end{vmatrix} = |I\_p| \, |I\_r + V'V| = |I\_r| \, |I\_p + VV'| \, .
$$

$$
\Rightarrow |I\_r + V'V| = |I\_p + VV'|.
$$
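The two expansions of the partitioned determinant can be replayed numerically; a minimal check, assuming NumPy, with arbitrarily chosen dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
p, r = 6, 2
V = rng.standard_normal((p, r))

lhs = np.linalg.det(np.eye(p) + V @ V.T)   # |I_p + VV'|
rhs = np.linalg.det(np.eye(r) + V.T @ V)   # |I_r + V'V|
assert np.isclose(lhs, rhs)

# both equal the determinant of the partitioned matrix [[I_p, V], [-V', I_r]]
M = np.block([[np.eye(p), V], [-V.T, np.eye(r)]])
assert np.isclose(np.linalg.det(M), lhs)
```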

Hence, the density of $V'$ must be of the following form, where $c_1$ is the normalizing constant:

$$f\_{10}(V')\mathbf{d}V' = c\_1|I + V'V|^{-(\frac{n+r}{2})}\mathbf{d}V', \; n \ge p, \; r < p. \tag{\text{iv}}$$

On applying Theorem 4.2.3, *(iv)* can be transformed into a function of $S_1 = V'V > O$, $S_1$ being of order $r \times r$. Then,

$$\mathrm{d}V' = \frac{\pi^{\frac{pr}{2}}}{\Gamma\_r(\frac{p}{2})} |S\_1|^{\frac{p}{2} - \frac{r+1}{2}} \mathrm{d}S\_1.$$

Substituting the above expression for $\mathrm{d}V'$ in $f_{10}(V')$ and then integrating over the $r \times r$ matrix $S_1 > O$, we have

$$\int\_{V'} f\_{10}(V') \mathrm{d}V' = c\_1 \frac{\pi^{\frac{pr}{2}}}{\Gamma\_r(\frac{p}{2})} \frac{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n+r-p}{2})}{\Gamma\_r(\frac{n+r}{2})} = 1.$$

Accordingly, the density of $V'$ is the following:

$$f\_{10}(V')\mathrm{d}V' = \frac{\Gamma\_r(\frac{p}{2})}{\pi^{\frac{pr}{2}}} \frac{\Gamma\_r(\frac{n+r}{2})}{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n+r-p}{2})} |I + V'V|^{-(\frac{n+r}{2})} \mathrm{d}V'.\tag{8.3.15}$$

Note that $\Gamma_r(\frac{p}{2})$ cancels out. Then, by re-expressing $\mathrm{d}V'$ in terms of $\mathrm{d}S_1$ in (8.3.15), the density of $S_1 = V'V$ is obtained as

$$f\_{11}(S\_1) = \frac{\Gamma\_r(\frac{n+r}{2})}{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n+r-p}{2})} |S\_1|^{\frac{p}{2}-\frac{r+1}{2}} |I + S\_1|^{-(\frac{n+r}{2})} \tag{8.3.16}$$

for $S_1 = V'V > O,\ n \ge p,\ r < p$, and zero elsewhere, which is a real $r \times r$ matrix-variate type-2 beta density with the parameters $(\frac{p}{2}, \frac{n+r-p}{2})$. Thus, the following result:

**Theorem 8.3.8.** *Let* $W_1 = \mathbf{X}\mathbf{X}'$, $\mathbf{X} = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r,\ r < p$, $W_2 \sim W_p(n, I),\ n \ge p$, *and let* $W_1$ *and* $W_2$ *be independently distributed. Let* $V = W_2^{-\frac{1}{2}}\mathbf{X}$, $VV' \ge O$ *and* $S_1 = V'V > O$. *Then,* $|I_p + VV'| = |I_r + V'V| = |I_r + S_1|$. *The density of* $V'$ *is given in (8.3.15) and that of* $S_1$*, which is specified in (8.3.16), is a real matrix-variate type-2 beta density with the parameters* $(\frac{p}{2}, \frac{n+r-p}{2})$. *Moreover, the positive semi-definite matrix* $S_2 = W_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W_2^{-\frac{1}{2}}$ *is distributed, almost surely, as* $S_1$*, which has a nonsingular real matrix-variate type-2 beta distribution with the parameters* $(\frac{p}{2}, \frac{n+r-p}{2})$*.*

Observe that this theorem also holds when $X_j \overset{iid}{\sim} N_p(O, \Sigma),\ \Sigma > O$ and $W_2 \sim W_p(n, \Sigma),\ \Sigma > O$, the distribution of $S_2$ then still being free of $\Sigma$. Converting (8.3.16) in terms of the eigenvalues of $S_1$, which are also the nonzero eigenvalues of $S_2 = W_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W_2^{-\frac{1}{2}}$ of (8.3.14), we have the following density, denoted by $f_{12}(\lambda_1, \ldots, \lambda_r)\mathrm{d}D$, $D = \text{diag}(\lambda_1, \ldots, \lambda_r)$, assuming that the eigenvalues are distinct and such that $\lambda_1 > \lambda_2 > \cdots > \lambda_r > 0$:

$$\begin{split} f\_{12}(\lambda\_1, \dots, \lambda\_r) \mathrm{d}D &= \frac{\Gamma\_r(\frac{n+r}{2})}{\Gamma\_r(\frac{p}{2})\Gamma\_r(\frac{n+r-p}{2})} \frac{\pi^{\frac{r^2}{2}}}{\Gamma\_r(\frac{r}{2})} \\ &\times \left[ \prod\_{j=1}^r \lambda\_j^{\frac{p}{2} - \frac{r+1}{2}} \right] \left[ \prod\_{j=1}^r (1+\lambda\_j)^{-(\frac{n+r}{2})} \right] \left[ \prod\_{i<j} (\lambda\_i - \lambda\_j) \right] \mathrm{d}D. \tag{8.3.17} \end{split}$$

**Theorem 8.3.9.** *Let* $W_2$ *and* $\mathbf{X}$ *be as defined in (8.3.14). Then, the joint density of the nonzero eigenvalues* $\lambda_1, \ldots, \lambda_r$ *of* $S_2 = W_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W_2^{-\frac{1}{2}}$*, which are assumed to be distinct and such that* $\lambda_1 > \cdots > \lambda_r > 0$*, is given in (8.3.17).*

In (8.3.13), we obtained the joint density of the nonzero roots $\mu_1 > \cdots > \mu_r$ of the determinantal equation

$$\begin{split} |\mathbf{X}\mathbf{X}' - \mu(\mathbf{X}\mathbf{X}' + W\_2)| &= 0 \Rightarrow |\mathbf{X}\mathbf{X}' - \lambda W\_2| = 0, \ \lambda &= \frac{\mu}{1 - \mu}, \ \mu = \frac{\lambda}{1 + \lambda}, \\ &\Rightarrow |W\_2^{-\frac{1}{2}}\mathbf{X}\mathbf{X}'W\_2^{-\frac{1}{2}} - \lambda I| = 0. \end{split} \tag{8.3.18}$$

Hence, making the substitution $\mu_j = \frac{\lambda_j}{1+\lambda_j}$ in (8.3.13) should yield the density appearing in (8.3.17). This will be stated as a theorem.

**Theorem 8.3.10.** *When* $\mu_j = \frac{\lambda_j}{1+\lambda_j}$ *or* $\lambda_j = \frac{\mu_j}{1-\mu_j}$*, the distributions of the* $\mu_j$'s *and the* $\lambda_j$'s*, as respectively specified in (8.3.13) and (8.3.17), coincide.*
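The one-to-one correspondence $\mu_j = \lambda_j/(1+\lambda_j)$ between the two sets of roots can be verified directly on simulated matrices; a sketch assuming NumPy, with arbitrarily chosen dimensions:

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, n = 4, 2, 7
X = rng.standard_normal((p, r))
Z = rng.standard_normal((p, n))
W1, W2 = X @ X.T, Z @ Z.T     # W1 singular (rank r), W2 nonsingular

# nonzero roots of |W1 - lambda W2| = 0
lam = np.sort(np.linalg.eigvals(np.linalg.solve(W2, W1)).real)[::-1][:r]
# nonzero roots of |W1 - mu (W1 + W2)| = 0
mu = np.sort(np.linalg.eigvals(np.linalg.solve(W1 + W2, W1)).real)[::-1][:r]

assert np.allclose(mu, lam / (1 + lam))
assert np.allclose(lam, mu / (1 - mu))
```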

#### **8.3a. The Singular Complex Case**

The matrix manipulations that were utilized in the real case also apply in the complex domain. The following result parallels Theorem 8.3.1:

**Theorem 8.3a.1.** *Let the* $p \times p$ *matrix* $A = A^* \ge O$ *be a Hermitian positive semi-definite matrix of rank* $r < p$*, where* $A^*$ *designates the conjugate transpose of* $A$. *Then,* $A$ *can be represented as* $A = A_1A_1^*$ *where* $A_1$ *is* $p \times r$, $r < p$, *of rank* $r$*, that is, all the columns of* $A_1$ *are linearly independent.*

A derivation of the Wishart matrix in the complex case can also be worked out from a complex Gaussian distribution. In earlier chapters, we have derived the Wishart density as a particular case of the matrix-variate gamma density, whether in the real or complex domain. Let the $p \times 1$ complex vectors $\tilde{X}_j,\ j = 1, \ldots, n$, be independently distributed as $p$-variate complex Gaussian random variables with the null vector as their mean value and a common Hermitian positive definite covariance matrix, that is, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, \Sigma),\ \Sigma = \Sigma^* > O$, for $j = 1, \ldots, n$. Let the $p \times n$ matrix $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$ be the simple random sample matrix from this complex Gaussian population. Then, the density of $\tilde{\mathbf{X}}$, denoted by $\tilde{f}(\tilde{\mathbf{X}})$, is given by

$$\tilde{f}(\tilde{\mathbf{X}}) = \frac{\mathbf{e}^{-\sum\_{j=1}^{n} \tilde{X}\_j^\* \Sigma^{-1} \tilde{X}\_j}}{\pi^{np} |\det(\boldsymbol{\Sigma})|^n} = \frac{\mathbf{e}^{-\text{tr}(\boldsymbol{\Sigma}^{-1} \tilde{\mathbf{X}} \tilde{\mathbf{X}}^\*)}}{\pi^{np} |\det(\boldsymbol{\Sigma})|^n},\tag{8.3a.1}$$

for $n \ge p$. Let $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^* = \tilde{W}$, a $p \times p$ Hermitian positive definite matrix. For $n \ge p$, it follows from Theorem 4.2a.3 that

$$\mathbf{d}\tilde{\mathbf{X}} = \frac{\pi^{np}}{\tilde{\Gamma}\_p(n)} |\det(\tilde{W})|^{n-p} \mathbf{d}\tilde{W} \tag{8.3a.2}$$

where $\mathrm{d}\tilde{\mathbf{X}} = \mathrm{d}Y_1 \wedge \mathrm{d}Y_2$, $\tilde{\mathbf{X}} = Y_1 + iY_2$, $i = \sqrt{-1}$, $Y_1, Y_2$ being real $p \times n$ matrices. Given (8.3a.1) and (8.3a.2), the density of $\tilde{W}$, denoted by $\tilde{f}_1(\tilde{W})$, is obtained as

$$\tilde{f}\_{\mathbf{l}}(\tilde{W}) = \frac{|\det(\tilde{W})|^{n-p} \mathbf{e}^{-\text{tr}(\boldsymbol{\Sigma}^{-1}\tilde{W})}}{\tilde{\Gamma}\_p(n)|\det(\boldsymbol{\Sigma})|^n}, \ \tilde{W} = \tilde{W}^\* > O, \ n \ge p,\tag{8.3a.3}$$

and zero elsewhere; this is *the complex Wishart density with* $n$ *degrees of freedom and parameter matrix* $\Sigma > O$, which is written as $\tilde{W} \sim \tilde{W}_p(n, \Sigma),\ \Sigma > O,\ n \ge p$. If $\tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \Sigma),\ \Sigma > O$, then letting $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$ and

$$
\tilde{W} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^\*,\tag{8.3a.4}
$$

we have $\tilde{W} \sim \tilde{W}_p(n-1, \Sigma),\ n-1 \ge p,\ \Sigma > O$; that is, $\tilde{W}$ has a Wishart distribution with $n-1$ instead of $n$ degrees of freedom, $\tilde{\mu} \ne O$ being eliminated by subtracting the sample mean. The remainder of this section is devoted to the distribution of $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$ for $n < p$, that is, in the singular case. Proceeding as in the real case, we have

$$\det(I\_p - \tilde{\mathbf{X}}\tilde{\mathbf{X}}^\*) = \det(I\_r - \tilde{\mathbf{X}}^\*\tilde{\mathbf{X}}) \tag{8.3a.5}$$

where $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^* \ge O$ is $p \times p$, whereas the $r \times r$, $r < p$, matrix $\tilde{\mathbf{X}}^*\tilde{\mathbf{X}} > O$. Then, we have the following result:

**Theorem 8.3a.2.** *Let* $\tilde{T}_1 = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$ *and* $\tilde{T}_2 = \tilde{\mathbf{X}}^*\tilde{\mathbf{X}}$. *Then, the eigenvalues of* $\tilde{T}_2$ *are all real and positive, and the nonzero eigenvalues of* $\tilde{T}_1$ *are identical to those of* $\tilde{T}_2$*, the remaining ones being equal to zero.*
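This theorem can be checked numerically with complex matrices; a sketch assuming NumPy, where a standard complex Gaussian entry is taken as a real plus $i$ times an imaginary part, both standard normal (an arbitrary modeling choice for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
p, r = 5, 3
Xt = rng.standard_normal((p, r)) + 1j * rng.standard_normal((p, r))

T1 = Xt @ Xt.conj().T   # p x p Hermitian positive semi-definite, rank r
T2 = Xt.conj().T @ Xt   # r x r Hermitian positive definite

ev1 = np.sort(np.linalg.eigvalsh(T1))[::-1]  # eigvalsh returns real values
ev2 = np.sort(np.linalg.eigvalsh(T2))[::-1]

assert np.all(ev2 > 0)            # eigenvalues are real and positive
assert np.allclose(ev1[:r], ev2)  # nonzero eigenvalues coincide
assert np.allclose(ev1[r:], 0.0)  # the remaining p - r eigenvalues vanish
```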

The complex counterpart of Theorem 8.3.4 that follows can be derived using steps parallel to those utilized in the real case.

**Theorem 8.3a.3.** *Let the* $p \times 1$ *complex vectors* $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(\tilde{\mu}, \Sigma),\ \Sigma > O$, $j = 1, \ldots, r$. *Let* $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ *be the simple random sample matrix from this complex* $p$*-variate Gaussian population. Let* $\tilde{T}_2 = \tilde{\mathbf{X}}^*\Sigma^{-1}\tilde{\mathbf{X}}$ *or* $\tilde{T}_2 = \tilde{\mathbf{X}}^*\tilde{\mathbf{X}}$ *if* $\Sigma = I_p$. *Then, the density of* $\tilde{T}_2$*, denoted by* $\tilde{f}_u(\tilde{T}_2)$*, is the following:*

$$\tilde{f}\_{u}(\tilde{T}\_2) = \frac{1}{\tilde{\Gamma}\_r(p)} |\text{det}(\tilde{T}\_2)|^{p-r} \mathbf{e}^{-\text{tr}(\tilde{T}\_2)}, \ \tilde{T}\_2 > O, \ r \le p,\tag{8.3a.6}$$

*so that* $\tilde{T}_2 \sim \tilde{W}_r(p, I)$*, that is,* $\tilde{T}_2$ *has a complex Wishart distribution with* $p$ *degrees of freedom. If* $\tilde{\mu} \ne O$*, let* $\tilde{T}_2 = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^*\Sigma^{-1}(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})$*, or* $\tilde{T}_2 = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^*(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})$ *when* $\Sigma = I$*, with* $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$ *wherein* $\bar{\tilde{X}} = \frac{1}{r}(\tilde{X}_1 + \cdots + \tilde{X}_r)$. *Then,* $\tilde{T}_2 \sim \tilde{W}_r(p-1, I),\ r \le p-1$*.*

#### **8.3a.1. Singular gamma or singular Gaussian distribution, complex case**

Let $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ where the $p \times 1$ vectors $\tilde{X}_1, \ldots, \tilde{X}_r$ are independently distributed as complex Gaussian vectors whose mean value is the null vector and covariance matrix is $I$, that is, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r,\ r < p$. Then, the density of $\tilde{\mathbf{X}}$, denoted by $\tilde{f}_1(\tilde{\mathbf{X}})$, is

$$\tilde{f}\_{1}(\tilde{\mathbf{X}}) = \frac{\mathbf{e}^{-\text{tr}(\tilde{\mathbf{X}}\tilde{\mathbf{X}}^\*)}}{\pi^{pr}}.\tag{8.3a.7}$$

Let $r < p$ so that $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^* \ge O$ (Hermitian positive semi-definite). Let the $p \times p$ Hermitian positive definite matrix $\tilde{W}_2$ have a Wishart density with $n \ge p$ degrees of freedom and parameter matrix $I$, that is, $\tilde{W}_2 \sim \tilde{W}_p(n, I),\ n \ge p$. Further assume that $\tilde{\mathbf{X}}$ and $\tilde{W}_2$ are independently distributed. Then, the joint density of $\tilde{\mathbf{X}}$ and $\tilde{W}_2$, denoted by $\tilde{f}_2(\tilde{\mathbf{X}}, \tilde{W}_2)$, is the following:

$$\tilde{f}\_2(\tilde{\mathbf{X}}, \tilde{W}\_2) = \frac{\mathbf{e}^{-\text{tr}(\tilde{\mathbf{X}}\tilde{\mathbf{X}}^\* + \tilde{W}\_2)} |\text{det}(\tilde{W}\_2)|^{n-p}}{\pi^{pr}\tilde{\Gamma}\_p(n)}, \ n \ge p, \ r < p,\tag{8.3a.8}$$

where $\tilde{\mathbf{X}}$ is $p \times r$, $r < p$. Letting the $p \times p$ matrix $\tilde{U} = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^* + \tilde{W}_2 > O$, the joint density of $\tilde{U}$ and $\tilde{\mathbf{X}}$, denoted by $\tilde{f}_3(\tilde{\mathbf{X}}, \tilde{U})$, is given by

$$\tilde{f}\_{3}(\tilde{\mathbf{X}}, \tilde{U}) = \frac{\mathbf{e}^{-\text{tr}(\tilde{U})} |\text{det}(\tilde{U} - \tilde{\mathbf{X}}\tilde{\mathbf{X}}^{\*})|^{n-p}}{\pi^{pr} \, \tilde{\Gamma}\_{p}(n)}, \; n \ge p, \; r < p.$$

Letting $\tilde{U} = \tilde{U}^* > O$, one has

$$\begin{aligned} |\det(\tilde{U} - \tilde{\mathbf{X}}\tilde{\mathbf{X}}^\*)| &= |\det(\tilde{U})| \, |\det(I - \tilde{U}^{-\frac{1}{2}}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^\*\tilde{U}^{-\frac{1}{2}})| \\ &= |\det(\tilde{U})| \, |\det(I - \tilde{V}\tilde{V}^\*)|, \ \tilde{V} = \tilde{U}^{-\frac{1}{2}}\tilde{\mathbf{X}}, \end{aligned}$$

where $\tilde{V}$ is a $p \times r$ matrix of rank $r < p$. Since $\mathrm{d}\tilde{\mathbf{X}} = |\det(\tilde{U})|^r \mathrm{d}\tilde{V}$ for fixed $\tilde{U}$, the joint density of $\tilde{U}$ and $\tilde{V}$ is as follows:

$$\tilde{f}\_4(\tilde{U}, \tilde{V}) = \frac{|\det(\tilde{U})|^{n+r-p} \mathbf{e}^{-\text{tr}(\tilde{U})}}{\pi^{pr} \tilde{\Gamma}\_p(n)} |\det(I\_p - \tilde{V}\tilde{V}^\*)|^{n-p}. \tag{8.3a.9}$$

As has been previously noted,

$$|\det(I\_p - \tilde{V}\tilde{V}^\*)| = |\det(I\_r - \tilde{V}^\*\tilde{V})|$$

where $\tilde{V}^*\tilde{V}$ is an $r \times r$ Hermitian positive definite matrix. Since $\tilde{f}_4(\tilde{U}, \tilde{V})$ can be factorized, $\tilde{U}$ and $\tilde{V}$ are independently distributed, and the marginal density of $\tilde{V}$, denoted by $\tilde{f}_5(\tilde{V})$, is of the following form, after integrating out $\tilde{U}$ with the help of a complex matrix-variate gamma integral:

$$\tilde{f}\_{5}(\tilde{V})\mathrm{d}\tilde{V} = \tilde{c} \, |\det(I - \tilde{V}\tilde{V}^\*)|^{n-p}\mathrm{d}\tilde{V} = \tilde{c} \, |\det(I - \tilde{V}^\*\tilde{V})|^{n-p}\mathrm{d}\tilde{V}^\* \tag{8.3a.10}$$

where *c*˜ is the normalizing constant. Now, proceeding as in the real case, the following result is obtained:

**Theorem 8.3a.4.** *Let the* $p \times 1$ *complex vectors* $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, *and* $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ *be the* $p \times r$ *full rank sample matrix with* $r < p$. *Let* $\tilde{W}_2$ *be a* $p \times p$ *Hermitian positive definite matrix having a nonsingular Wishart density with* $n$ *degrees of freedom and parameter matrix* $I$*, that is,* $\tilde{W}_2 \sim \tilde{W}_p(n, I),\ n \ge p$*, and assume that* $\tilde{W}_2$ *and* $\tilde{\mathbf{X}}$ *are independently distributed. Let* $\tilde{U} = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^* + \tilde{W}_2$ *be Hermitian positive definite,* $\tilde{V} = \tilde{U}^{-\frac{1}{2}}\tilde{\mathbf{X}}$ *and* $\tilde{S} = \tilde{V}^*\tilde{V}$. *Then,* $\tilde{U}$ *and* $\tilde{V}$ *are independently distributed and the densities of* $\tilde{V}$ *and* $\tilde{S}$*, respectively denoted by* $\tilde{f}_5(\tilde{V})$ *and* $\tilde{f}_6(\tilde{S})$*, are*

$$\tilde{f}\_{5}(\tilde{V}) = \frac{\tilde{\Gamma}\_r(p)}{\pi^{pr}} \frac{\tilde{\Gamma}\_r(n+r)}{\tilde{\Gamma}\_r(p)\tilde{\Gamma}\_r(n-p+r)} |\det(I - \tilde{V}^\*\tilde{V})|^{n-p}, \; r < p,\tag{8.3a.11}$$

*and*

$$\tilde{f}\_{6}(\tilde{S}) = \frac{\tilde{\Gamma}\_r(n+r)}{\tilde{\Gamma}\_r(p)\tilde{\Gamma}\_r(n-p+r)} |\det(\tilde{S})|^{p-r} |\det(I-\tilde{S})|^{n-p}, \ n \ge p, \ r < p,\tag{8.3a.12}$$

*observing that n* − *p* = *(n* + *r* − *p)* − *r.*

Let $\tilde{W}_1 = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$ where the $p \times r$ matrix $\tilde{\mathbf{X}}$ is the previously defined sample matrix arising from a standard complex Gaussian population. Let $\tilde{W}_2 \sim \tilde{W}_p(n, I)$ and assume that $\tilde{W}_1$ and $\tilde{W}_2$ are independently distributed. Letting $\tilde{U} = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^* + \tilde{W}_2 > O$, consider the determinantal equation

$$\det(\tilde{W}\_1 - \mu(\tilde{W}\_1 + \tilde{W}\_2)) = 0.\tag{i}$$

Then, as in the real case, the following result can be obtained:

**Theorem 8.3a.5.** *Let* $\tilde{W}_1$, $\tilde{W}_2$, $\tilde{V}$ *and* $\tilde{U}$ *be as previously defined. Then,*

$$\det(\tilde{W}\_1 - \mu(\tilde{W}\_1 + \tilde{W}\_2)) = 0 \Rightarrow \det(\tilde{V}^\* \tilde{V} - \mu I\_r) = 0. \tag{ii}$$

This establishes that the roots $\mu_j$'s of the determinantal equation *(i)* coincide with the eigenvalues of $\tilde{V}^*\tilde{V}$, and since $\tilde{V}^*\tilde{V} > O$, the eigenvalues are real and positive. Let the eigenvalues be distinct, in which case $\mu_1 > \cdots > \mu_r > 0$. Then, steps parallel to those utilized in the real case will yield the following result:

**Theorem 8.3a.6.** *Let* $\mu_1, \ldots, \mu_r$ *be the nonzero roots of the equation*

$$\det(\tilde{W}\_1 - \mu(\tilde{W}\_1 + \tilde{W}\_2)) = 0$$

*where* $\tilde{W}_1$ *and* $\tilde{W}_2$ *are as previously defined. Let* $\mu_1 > \cdots > \mu_r > 0,\ r < p$, *and let* $D = \text{diag}(\mu_1, \ldots, \mu_r)$. *Then, the joint density of the eigenvalues* $\mu_1, \ldots, \mu_r$*, denoted by* $\tilde{f}_\mu(\mu_1, \ldots, \mu_r)$*, which is available from (8.3a.12) and the relationship between* $\mathrm{d}\tilde{S}$ *and* $\mathrm{d}D$*, is the following:*

$$\begin{split} \tilde{f}\_{\mu}(\mu\_{1},\ldots,\mu\_{r}) \mathrm{d}D &= \frac{\pi^{r(r-1)}}{\tilde{\Gamma}\_{r}(r)} \frac{\tilde{\Gamma}\_{r}(n+r)}{\tilde{\Gamma}\_{r}(p)\tilde{\Gamma}\_{r}(n-p+r)} \\ &\times \Big[\prod\_{j=1}^{r} \mu\_{j}^{p-r}\Big] \Big[\prod\_{j=1}^{r} (1-\mu\_{j})^{n-p}\Big] \Big[\prod\_{i<j} (\mu\_{i}-\mu\_{j})^{2}\Big] \mathrm{d}D. \tag{8.3a.13} \end{split}$$

#### **8.3a.2. A direct method of evaluation in the complex case**

The steps of the derivations being analogous to those utilized in the real case, the corresponding theorems will simply be stated for the complex case. Let $\tilde{W}_1 = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$ be a $p \times p$ singular Wishart matrix where the $p \times r$ matrix $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, and $r < p$. That is, $\tilde{\mathbf{X}}$ is a simple random sample matrix from this complex Gaussian population. Let $\tilde{W}_2 > O$ have a complex Wishart distribution with $n$ degrees of freedom and parameter matrix $I$, that is, $\tilde{W}_2 \sim \tilde{W}_p(n, I),\ n \ge p$, and assume that $\tilde{W}_1$ and $\tilde{W}_2$ are independently distributed. Consider the initial equation

$$\det(\tilde{W}\_1 - \lambda \tilde{W}\_2) = 0 \Rightarrow \det(\tilde{W}\_2^{-\frac{1}{2}} \tilde{W}\_1 \tilde{W}\_2^{-\frac{1}{2}} - \lambda I) = 0,\tag{8.3a.14}$$

whose roots are $\lambda_1, \ldots, \lambda_r, 0, \ldots, 0$, and the additional equation $\det(\tilde{W}_1 - \mu(\tilde{W}_1 + \tilde{W}_2)) = 0$, whose roots will be denoted by $\mu_1, \ldots, \mu_r, 0, \ldots, 0$. In the following theorems, the $\lambda_j$'s and $\mu_j$'s will refer to these two sets of roots.

**Theorem 8.3a.7.** *Let* $\tilde{W}_1 = \tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$, $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, *and* $r < p$. *Let* $\tilde{W}_2 \sim \tilde{W}_p(n, I),\ n \ge p$, *be a nonsingular complex Wishart matrix with* $n$ *degrees of freedom and parameter matrix* $I$. *Further assume that* $\tilde{W}_1$ *and* $\tilde{W}_2$ *are independently distributed. Let* $\tilde{V} = \tilde{W}_2^{-\frac{1}{2}}\tilde{\mathbf{X}}$, $\tilde{V}\tilde{V}^* \ge O$, $\tilde{V}^*\tilde{V} > O$, *and* $\tilde{S}_1 = \tilde{V}^*\tilde{V} > O$. *Then,* $\det(I_p + \tilde{V}\tilde{V}^*) = \det(I_r + \tilde{V}^*\tilde{V}) = \det(I_r + \tilde{S}_1)$*, and the densities of* $\tilde{V}^*$ *and* $\tilde{S}_1$ *are respectively given by*

$$\tilde{f}_{10}(\tilde{V}^{*})\,\mathrm{d}\tilde{V}^{*} = \frac{\tilde{\Gamma}_r(p)}{\pi^{p(p-1)}}\,\frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\tilde{\Gamma}_r(n+r-p)}\,|\det(I+\tilde{V}^{*}\tilde{V})|^{-(n+r)}\,\mathrm{d}\tilde{V}^{*}\tag{8.3a.15}$$

*and*

$$\tilde{f}\_{11}(\tilde{S}\_{1}) = \frac{\tilde{\Gamma}\_{r}(n+r)}{\tilde{\Gamma}\_{r}(p)\tilde{\Gamma}\_{r}(n+r-p)} |\det(\tilde{S}\_{1})|^{p-r} |\det(I+\tilde{S}\_{1})|^{-(n+r)},\tag{8.3a.16}$$

*which is a complex matrix-variate type-2 beta distribution with the parameters $(p,\,n-p+r)$. Additionally, the positive semi-definite matrix $\tilde{S}_2=\tilde{W}_2^{-\frac{1}{2}}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^{*}\tilde{W}_2^{-\frac{1}{2}}$ is distributed, almost surely, as $\tilde{S}_1$, which has a nonsingular complex matrix-variate type-2 beta distribution with the parameters $(p,\,n-p+r)$.*

**Theorem 8.3a.8.** *Let $\tilde{W}_2$ and $\tilde{\mathbf{X}}$ be as defined in Theorem 8.3a.7. Then, the joint density of the nonzero eigenvalues $\lambda_1,\ldots,\lambda_r$ of $\tilde{S}_2=\tilde{W}_2^{-\frac{1}{2}}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^{*}\tilde{W}_2^{-\frac{1}{2}}$, which are assumed to be distinct and such that $\lambda_1>\cdots>\lambda_r>0$, is given by*

$$\begin{split}\tilde{f}_{13}(\lambda_1,\ldots,\lambda_r)\,\mathrm{d}D &= \frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\tilde{\Gamma}_r(n-p+r)}\,\frac{\pi^{r(r-1)}}{\tilde{\Gamma}_r(r)}\\ &\quad\times\Big[\prod_{j=1}^{r}\lambda_j^{p-r}\Big]\Big[\prod_{j=1}^{r}(1+\lambda_j)^{-(n+r)}\Big]\Big[\prod_{i<j}(\lambda_i-\lambda_j)^{2}\Big]\mathrm{d}D,\end{split}\tag{8.3a.17}$$

*where D* = diag*(λ*1*,...,λr).*

**Theorem 8.3a.9.** *When $\mu_j=\frac{\lambda_j}{1+\lambda_j}$ or, equivalently, $\lambda_j=\frac{\mu_j}{1-\mu_j}$, the distributions of the $\mu_j$'s and $\lambda_j$'s, as respectively defined in (8.3a.13) and (8.3a.17), coincide.*
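The relation between the two sets of roots can be probed numerically. The following sketch (our own NumPy construction; the dimensions and variable names are illustrative, not from the text) samples $\tilde{W}_1$ and $\tilde{W}_2$ as above and checks that only $r$ roots are nonzero and that they satisfy $\mu_j=\lambda_j/(1+\lambda_j)$:

```python
import numpy as np

rng = np.random.default_rng(7)
p, r, n = 4, 2, 6  # r < p and n >= p, as in the theorem

# X~ is p x r with iid standard complex Gaussian columns;
# W~_1 = X X* is singular, W~_2 is a nonsingular complex Wishart matrix.
X = (rng.standard_normal((p, r)) + 1j * rng.standard_normal((p, r))) / np.sqrt(2)
W1 = X @ X.conj().T
Y = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
W2 = Y @ Y.conj().T

# Roots of det(W1 - lambda W2) = 0 are the eigenvalues of W2^{-1} W1;
# roots of det(W1 - mu (W1 + W2)) = 0 are those of (W1 + W2)^{-1} W1.
lam = np.sort(np.linalg.eigvals(np.linalg.solve(W2, W1)).real)[::-1]
mu = np.sort(np.linalg.eigvals(np.linalg.solve(W1 + W2, W1)).real)[::-1]

assert np.allclose(lam[r:], 0, atol=1e-10)       # only r nonzero roots
assert np.allclose(mu[:r], lam[:r] / (1 + lam[:r]))  # mu_j = lambda_j/(1+lambda_j)
```

Since $\mu=\lambda/(1+\lambda)$ is increasing in $\lambda$, sorting both sets in decreasing order aligns corresponding roots.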

#### **8.4. The Case of One Wishart or Gamma Matrix in the Real Domain**

If we only consider a single $p\times p$ gamma matrix $W$ with parameters $(\alpha,B),\ B>O,\ \Re(\alpha)>\frac{p-1}{2}$, whose density is

$$f(W) = \frac{|B|^{\alpha}}{\Gamma\_p(\alpha)} |W|^{\alpha - \frac{p+1}{2}} \mathbf{e}^{-\text{tr}(BW)}, \ W > O, \ B > O, \ \Re(\alpha) > \frac{p-1}{2}, \tag{8.4.1}$$

and zero elsewhere, then it can readily be determined that $Z=B^{\frac{1}{2}}WB^{\frac{1}{2}}$ has the density

$$f_1(Z) = \frac{1}{\Gamma_p(\alpha)}|Z|^{\alpha-\frac{p+1}{2}}\mathrm{e}^{-\mathrm{tr}(Z)},\ Z>O,\ \Re(\alpha)>\frac{p-1}{2},\tag{8.4.2}$$

and zero elsewhere. When $\alpha=\frac{m}{2}$ and $B=\frac{1}{2}I$, $W$ has a Wishart density with $m\ge p$ degrees of freedom and parameter matrix $I$; denoting this Wishart matrix by $Z$, its density is given by

$$f\_2(Z) = \frac{1}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} |Z|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(Z)}\tag{8.4.3}$$

for $m\ge p$, and zero elsewhere. Since $Z$ is symmetric, there exists an orthonormal matrix $P$, $PP'=I,\ P'P=I$, such that $Z=PDP'$ with $D=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)$, where $\lambda_j>0,\ j=1,\ldots,p$, are the (assumed distinct) positive eigenvalues of $Z$, $Z$ being positive definite. Consider the equation $ZQ_j=\lambda_jQ_j$ where the $p\times1$ vector $Q_j$ is an eigenvector corresponding to the eigenvalue $\lambda_j$. Since the eigenvalues are distinct, the eigenvectors are orthogonal to each other. Let $Q_j$ be the normalized eigenvector, $Q_j'Q_j=1,\ j=1,\ldots,p$, $Q_i'Q_j=0$ for all $i\ne j$. Letting $Q=[Q_1,\ldots,Q_p]$, this $p\times p$ matrix $Q$ is such that

$$ZQ = QD \Rightarrow Z = QDQ' \Rightarrow Q = P,\ \ P=[P_1,\ldots,P_p],$$

where $P_1,\ldots,P_p$ are the columns of the $p\times p$ matrix $P$. Yet, $P$ need not be unique since $P_j'P_j=1 \Rightarrow (-P_j)'(-P_j)=1$. In order to make it unique, let us require that the first nonzero element of each of the vectors $P_1,\ldots,P_p$ be positive. Considering the transformation $Z=PDP'$, it follows from Theorem 8.2.1 that, before integrating over the full orthogonal group,

$$\mathrm{d}Z = \Big\{\prod_{i<j}(\lambda_i-\lambda_j)\Big\}\,\mathrm{d}D\ h(P)\tag{8.4.4}$$

where *h(P )* is a differential element associated with the unique matrix of eigenvectors *P*, as is explained in Mathai (1997). The integral over *h(P )* gives

$$\int\_{O\_p} h(P) = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \tag{8.4.5}$$

where $O_p$ is the full orthogonal group of $p\times p$ orthonormal matrices. On observing that $|Z|=\prod_{j=1}^{p}\lambda_j$ and $\mathrm{tr}(Z)=\lambda_1+\cdots+\lambda_p$, it follows from (8.4.3) and (8.4.4) that the joint density of the eigenvalues $\lambda_1,\ldots,\lambda_p$ and the matrix of eigenvectors $P$ can be expressed as

$$f_3(D,P)\,\mathrm{d}D\wedge\mathrm{d}P = \frac{\big[\prod_{j=1}^{p}\lambda_j\big]^{\frac{m}{2}-\frac{p+1}{2}}\mathrm{e}^{-\frac{1}{2}\sum_{j=1}^{p}\lambda_j}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\mathrm{d}D\ h(P).\tag{8.4.6}$$

Thus, the marginal density of *λ*1*,...,λp* can be obtained by integrating out *P*. Denoting this marginal density by *f*4*(λ*1*,...,λp)*, we have

$$f_4(\lambda_1,\ldots,\lambda_p)\,\mathrm{d}D = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\,\frac{\big[\prod_{j=1}^{p}\lambda_j\big]^{\frac{m}{2}-\frac{p+1}{2}}\mathrm{e}^{-\frac{1}{2}\sum_{j=1}^{p}\lambda_j}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\mathrm{d}D,\ \ \lambda_1>\cdots>\lambda_p>0,\tag{8.4.7}$$

and zero elsewhere. The density of *P* is then the remainder of the joint density. Denoting it by *f*5*(P )*, we have the following:

$$f_5(P)\,\mathrm{d}P = \frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\,h(P),\ \ PP'=I,\tag{8.4.8}$$

where *P* = [*P*1*,...,Pp*], the first nonzero element of *Pj* being positive for *j* = 1*,...,p*, so as to make *P* unique. Hence, the following result:

**Theorem 8.4.1.** *Let the p* × *p real positive definite matrix Z have a Wishart density with m* ≥ *p degrees of freedom and parameter matrix I . Let λ*1*,...,λp be the distinct positive eigenvalues of Z in decreasing order and the p* × *p orthonormal matrix P be the matrix of normalized eigenvectors corresponding to the λj 's. Then,* {*λ*1*,...,λp*} *and P are independently distributed, with the densities of* {*λ*1*,...,λp*} *and P being respectively given in (8.4.7) and (8.4.8).*
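Theorem 8.4.1 can be probed by simulation. The sketch below (our own NumPy code; the helper name and sample sizes are illustrative assumptions) samples $Z\sim W_p(m,I)$ as $GG'$ with $G$ a $p\times m$ standard normal matrix, orders the eigenvalues as $\lambda_1>\cdots>\lambda_p$, applies the sign convention that makes $P$ unique, and checks $Z=PDP'$ together with $E[\mathrm{tr}(Z)]=mp$, a consequence of (8.4.7):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, N = 3, 8, 4000

def unique_eigen(Z, tol=1e-12):
    """Spectral decomposition Z = P D P' with lambda_1 > ... > lambda_p and
    the first nonzero entry of each column of P positive (sign convention)."""
    lam, P = np.linalg.eigh(Z)          # ascending eigenvalues
    lam, P = lam[::-1], P[:, ::-1]      # reorder in decreasing order
    for j in range(P.shape[1]):
        k = np.flatnonzero(np.abs(P[:, j]) > tol)[0]  # first nonzero entry
        if P[k, j] < 0:
            P[:, j] = -P[:, j]
    return lam, P

tot = 0.0
for _ in range(N):
    G = rng.standard_normal((p, m))
    Z = G @ G.T                         # Z ~ W_p(m, I)
    lam, P = unique_eigen(Z)
    tot += lam.sum()

assert np.allclose(P @ np.diag(lam) @ P.T, Z)    # Z = P D P'
assert lam[-1] > 0 and np.all(np.diff(lam) < 0)  # positive, a.s. distinct
print(tot / N)  # Monte Carlo average of tr(Z); close to m*p = 24
```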

#### **8.4a. The Case of One Wishart or Gamma Matrix, Complex Domain**

Let $\tilde{W}$ be a $p\times p$ complex gamma distributed matrix, $\tilde{W}=\tilde{W}^{*}>O$, whose density is

$$\tilde{f}(\tilde{W}) = \frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}_p(\alpha)}|\det(\tilde{W})|^{\alpha-p}\mathrm{e}^{-\mathrm{tr}(\tilde{B}\tilde{W})},\ \tilde{W}>O,\ \tilde{B}>O,\ \Re(\alpha)>p-1.\tag{8.4a.1}$$

Letting $\tilde{Z}=\tilde{B}^{\frac{1}{2}}\tilde{W}\tilde{B}^{\frac{1}{2}}$, $\tilde{Z}$ has the density

$$\tilde{f}_1(\tilde{Z}) = \frac{1}{\tilde{\Gamma}_p(\alpha)}|\det(\tilde{Z})|^{\alpha-p}\mathrm{e}^{-\mathrm{tr}(\tilde{Z})},\ \tilde{Z}>O,\ \Re(\alpha)>p-1.\tag{8.4a.2}$$

If *α* = *m, m* = *p, p* + 1*,...* in (8.4a.2), then we have the following Wishart density having *m* degrees of freedom in the complex domain:

$$\tilde{f}\_2(\tilde{Z}) = \frac{1}{\tilde{\Gamma}\_p(m)} |\det(\tilde{Z})|^{m-p} e^{-\text{tr}(\tilde{Z})}.\tag{8.4a.3}$$

Consider a unique unitary matrix $\tilde{P}$, $\tilde{P}\tilde{P}^{*}=I,\ \tilde{P}^{*}\tilde{P}=I$, such that $\tilde{P}^{*}\tilde{Z}\tilde{P}=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $\tilde{Z}$, which are real and positive since $\tilde{Z}$ is Hermitian positive definite. Letting the eigenvalues be distinct and such that $\lambda_1>\lambda_2>\cdots>\lambda_p>0$, observe that

$$\mathrm{d}\tilde{Z} = \Big\{\prod_{i<j}(\lambda_i-\lambda_j)^{2}\Big\}\,\mathrm{d}D\ \tilde{h}(\tilde{P})\tag{8.4a.4}$$

where $\tilde{h}(\tilde{P})$ is the differential element corresponding to the unique unitary matrix $\tilde{P}$. Then, as established in Mathai (1997),

$$\int\_{\tilde{O}\_p} \tilde{h}(\tilde{P}) = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} \tag{8.4a.5}$$

where $\tilde{O}_p$ is the full unitary group. Thus, the joint density of the eigenvalues $\lambda_1,\ldots,\lambda_p$ and their associated normalized eigenvectors, denoted by $\tilde{f}_3(D,\tilde{P})$, is

$$\tilde{f}_3(D,\tilde{P})\,\mathrm{d}D\wedge\mathrm{d}\tilde{P} = \frac{\big[\prod_{j=1}^{p}\lambda_j\big]^{m-p}\mathrm{e}^{-\sum_{j=1}^{p}\lambda_j}}{\tilde{\Gamma}_p(m)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^{2}\Big]\mathrm{d}D\ \tilde{h}(\tilde{P}).\tag{8.4a.6}$$

Then, integrating out $\tilde{P}$ with the help of (8.4a.5), the marginal density of the eigenvalues $\lambda_1>\cdots>\lambda_p>0$, denoted by $\tilde{f}_4(\lambda_1,\ldots,\lambda_p)$, is the following:

$$\tilde{f}_4(\lambda_1,\ldots,\lambda_p)\,\mathrm{d}D = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)}\,\frac{\big[\prod_{j=1}^{p}\lambda_j\big]^{m-p}\mathrm{e}^{-\sum_{j=1}^{p}\lambda_j}}{\tilde{\Gamma}_p(m)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^{2}\Big]\mathrm{d}D.\tag{8.4a.7}$$

Thus, the joint density of the normalized eigenvectors forming $\tilde{P}$, denoted by $\tilde{f}_5(\tilde{P})$, is given by

$$\tilde{f}_5(\tilde{P})\,\mathrm{d}\tilde{P} = \frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}\,\tilde{h}(\tilde{P}).\tag{8.4a.8}$$

These results are summarized in the following theorem.

**Theorem 8.4a.1.** *Let $\tilde{Z}$ have the density appearing in (8.4a.3). Then, the joint density of the distinct eigenvalues $\lambda_1>\cdots>\lambda_p>0$ of $\tilde{Z}$ is as given in (8.4a.7) and the joint density of the associated normalized eigenvectors comprising the unitary matrix $\tilde{P}$ is as specified in (8.4a.8).*
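As a quick numerical illustration of the complex case (our own NumPy sketch; the sizes are arbitrary assumptions): with $\tilde{Z}=GG^{*}$ and $G$ a $p\times m$ matrix of iid standard complex Gaussians, the eigenvalues of the Hermitian matrix $\tilde{Z}$ are real, positive, and almost surely distinct, as assumed in Theorem 8.4a.1.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 3, 5  # illustrative dimensions, m >= p

# Standard complex Gaussian entries (unit-variance convention is ours).
G = (rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))) / np.sqrt(2)
Z = G @ G.conj().T

assert np.allclose(Z, Z.conj().T)    # Z~ is Hermitian
lam = np.linalg.eigvalsh(Z)          # real eigenvalues, ascending
assert np.all(lam > 0)               # positive definite
assert np.all(np.diff(lam) > 0)      # a.s. distinct
```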

#### **References**

T.W. Anderson (2003): *An Introduction to Multivariate Statistical Analysis*, 3rd Edition, Wiley Interscience, New York.

A.M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.


#### **Chapter 9 Principal Component Analysis**

### **9.1. Introduction**

We will adopt the same notations as in the previous chapters. Lower-case letters $x,y,\ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X,Y,\ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x},\tilde{y},\tilde{X},\tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A,B,C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p\times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^{*}$ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\mathrm{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Letting the $p\times q$ matrix $X=(x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\mathrm{d}X=\wedge_{i=1}^{p}\wedge_{j=1}^{q}\mathrm{d}x_{ij}$. For the complex matrix $\tilde{X}=X_1+iX_2,\ i=\sqrt{-1}$, where $X_1$ and $X_2$ are real, $\mathrm{d}\tilde{X}=\mathrm{d}X_1\wedge\mathrm{d}X_2$.

The requisite theory for the study of *Principal Component Analysis* has already been introduced in Chap. 1, namely, the problem of optimizing a real quadratic form that is subject to a constraint. We shall formulate the problem with respect to a practical situation consisting of selecting the most "relevant" variables in a study. Suppose that a scientist would like to devise a "good health" index in terms of certain indicators. After selecting a random sample of individuals belonging to a population that is homogeneous with respect to a variety of factors, such as age group, racial background and environmental conditions, she managed to secure measurements on $p=15$ variables including, for instance, $x_1$: weight, $x_2$: systolic pressure, $x_3$: blood sugar level, and $x_4$: height. She now faces a quandary, as there is an excessive number of variables and some of them may not be relevant to her investigation. A methodology is thus required for discarding the unimportant ones. As a result, the number of variables will be reduced and the interpretation of the results, possibly facilitated. So, what might be the most pertinent variables in any such study? If all the observations of a particular variable $x_j$ are concentrated around a certain value $\mu_j$, then that variable is more or less predetermined. As an example, suppose that the height of the individuals comprising a study group is close to 1.8 meters and that, on the other hand, the weight measurements are comparatively spread out. On this account, while height is not a particularly consequential variable in connection with this study, weight is. Accordingly, we can utilize the criterion: *the larger the variance of a variable, the more relevant this variable is*. Let the $p\times1$ vector $X$, $X'=(x_1,\ldots,x_p)$, encompass all the variables on which measurements are available. Let the covariance matrix associated with $X$ be $\Sigma$, that is, $\mathrm{Cov}(X)=\Sigma$. Since linear functions also contain individual variables, we may consider linear functions such as $u=a_1x_1+\cdots+a_px_p=A'X=X'A$, a prime designating a transpose, where

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ \vdots \\ x\_p \end{bmatrix}, \ A = \begin{bmatrix} a\_1 \\ a\_2 \\ \vdots \\ a\_p \end{bmatrix} \text{ and } \ \Sigma = (\sigma\_{ij}) = \begin{bmatrix} \sigma\_{11} & \sigma\_{12} & \dots & \sigma\_{1p} \\ \sigma\_{21} & \sigma\_{22} & \dots & \sigma\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma\_{p1} & \sigma\_{p2} & \dots & \sigma\_{pp} \end{bmatrix}. \tag{9.1.1}$$

Then,

$$\mathrm{Var}(u) = \mathrm{Var}(A'X) = \mathrm{Var}(X'A) = A'\Sigma A.\tag{9.1.2}$$

#### **9.2. Principal Components**

As will be explained further, the central objective in connection with the derivation of principal components consists of maximizing $A'\Sigma A$. Such an exercise would indeed prove meaningless unless some constraint is imposed on $A$, considering that, for an arbitrary vector $A$, the minimum of $A'\Sigma A$ occurs at zero and the maximum, at $+\infty$, $\Sigma=E[X-E(X)][X-E(X)]'$ being either positive definite or positive semi-definite. Since $\Sigma$ is symmetric and non-negative definite, its eigenvalues, denoted by $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p\ge0$, are real. Moreover, $\Sigma$ being symmetric, there exists an orthonormal matrix $P$, $PP'=I,\ P'P=I$, such that

$$P'\Sigma P = \mathrm{diag}(\lambda_1,\ldots,\lambda_p) \equiv \Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}\tag{9.2.1}$$


and

$$
\Sigma = P \, \Lambda P' = \lambda\_1 P\_1 P\_1' + \dots + \lambda\_p P\_p P\_p' \, , \tag{9.2.2}
$$

where $P_1,\ldots,P_p$ constitute the columns of $P$, $P_i$ denoting a normalized eigenvector corresponding to $\lambda_i,\ i=1,\ldots,p$; this is expounded for instance in Mathai and Haubold (2017a). Note that all real symmetric matrices, including those having repeated eigenvalues, can be diagonalized. Since the optimization problem is pointless when $A$ is arbitrary, the search for an optimum shall be confined to vectors $A$ such that $A'A=1$, that is, vectors lying on the unit sphere in $\mathbb{R}^p$. Without any loss of generality, the coefficients of the linear function can be selected so that the Euclidean norm of the coefficient vector is unity, in which case a minimum and a maximum will both exist. Hence, the problem can be restated as follows:

$$\text{Maximize } A^\prime \Sigma A \text{ subject to } A^\prime A = 1. \tag{i}$$

We will resort to the method of Lagrangian multipliers to optimize $A'\Sigma A$ subject to the constraint $A'A=1$. Let

$$\phi\_{\rm l} = A^{'} \Sigma A - \lambda (A^{'} A - 1) \tag{ii}$$

where *λ* is a Lagrangian multiplier. Differentiating *φ*<sup>1</sup> with respect to *A* and equating the result to a null vector (vector/matrix derivatives are discussed in Chap. 1, as well as in Mathai 1997), we have the following:

$$\frac{\partial \phi\_1}{\partial A} = O \Rightarrow 2\Sigma A - 2\lambda A = O \Rightarrow \Sigma A = \lambda A. \tag{iii}$$

On premultiplying *(iii)* by $A'$, we have

$$A^\prime \Sigma A = \lambda A^\prime A = \lambda. \tag{9.2.3}$$

In order to obtain a non-null solution for $A$ in *(iii)*, the coefficient matrix $\Sigma-\lambda I$ has to be singular or, equivalently, its determinant has to be zero, that is, $|\Sigma-\lambda I|=0$, which implies that $\lambda$ is an eigenvalue of $\Sigma$, $A$ being the corresponding eigenvector. Thus, it follows from (9.2.3) that the maximum of the quadratic form $A'\Sigma A$, subject to $A'A=1$, is the largest eigenvalue of $\Sigma$:

$$\max_{A'A=1}\,[A'\Sigma A] = \lambda_1 = \text{ the largest eigenvalue of } \Sigma.$$

Similarly,

$$\min\_{A'A=1} \left[ A' \Sigma A \right] = \lambda\_p = \text{ the smallest eigenvalue of } \Sigma. \tag{9.2.4}$$
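The constrained optimization thus reduces to an eigenproblem, which is straightforward to carry out numerically. The sketch below (our own NumPy code; the covariance matrix is an arbitrary positive definite example, not from the text) diagonalizes $\Sigma$ and confirms that the extreme values of $A'\Sigma A$ over $A'A=1$ are the extreme eigenvalues:

```python
import numpy as np

# Illustrative positive definite covariance matrix (our choice).
Sigma = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])

lam, A = np.linalg.eigh(Sigma)      # ascending eigenvalues, orthonormal columns
lam, A = lam[::-1], A[:, ::-1]      # reorder so that lambda_1 >= ... >= lambda_p

# A' Sigma A is diagonal, so over unit vectors the quadratic form attains
# its maximum lambda_1 at A_1 and its minimum lambda_p at A_p.
assert np.allclose(A.T @ Sigma @ A, np.diag(lam))
assert np.all(lam > 0) and np.all(np.diff(lam) < 0)
```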

Since $\mathrm{Var}(A'X)=A'\Sigma A=\lambda$, the largest variance associated with a linear combination $A'X$ wherein the vector $A$ is normalized is equal to $\lambda_1$, the largest eigenvalue of $\Sigma$; letting $A_1$ be the normalized ($A_1'A_1=1$) eigenvector corresponding to $\lambda_1$, $u_1=A_1'X$ will be that linear combination of $X$ having the maximum variance. Thus, $u_1$ is called the first principal component: the linear function of $X$ having the maximum variance. Although normalized, the vector $A_1$ is not unique, as $(-A_1)'(-A_1)$ is also equal to one. In order to ensure unicity, we will require that the first nonzero element of $A_1$, and of each of the other $p-1$ normalized eigenvectors, be positive. Recall that since $\Sigma$ is a real symmetric matrix, the $\lambda_j$'s are real and so are the corresponding eigenvectors. Consider the second largest eigenvalue $\lambda_2$ and determine the associated normalized eigenvector $A_2$; then $u_2=A_2'X$ will be the second principal component. Since the matrix $P$ that diagonalizes $\Sigma$ into the diagonal matrix of its eigenvalues is orthonormal, the normalized eigenvectors $A_1,A_2,\ldots,A_p$ are necessarily orthogonal to each other, which means that the corresponding principal components $u_1=A_1'X,\ u_2=A_2'X,\ldots,u_p=A_p'X$ will be uncorrelated. Let us see whether uncorrelated normalized eigenvectors could be obtained by making use of the above procedure. When constructing $A_2$, we can impose an additional condition to the effect that $A'X$ should be uncorrelated with $A_1'X$, $A_1'\Sigma A_1$ being equal to $\lambda_1$, the largest eigenvalue of $\Sigma$. The covariance between $A'X$ and $A_1'X$ is

$$\mathrm{Cov}(A'X,\,A_1'X) = A'\,\mathrm{Cov}(X)A_1 = A'\Sigma A_1 = A_1'\Sigma A.\tag{9.2.5}$$

Hence, we may require that $A'\Sigma A_1=A_1'\Sigma A=0$. However,

$$0 = A'\Sigma A_1 = A'(\Sigma A_1) = A'\lambda_1 A_1 = \lambda_1 A'A_1 \Rightarrow A'A_1 = 0.\tag{iv}$$

Observe that $\lambda_1>0$, noting that $\Sigma$ would be a null matrix if its largest eigenvalue were equal to zero, in which case no optimization problem would remain to be solved. Consider

$$\phi\_2 = A' \Sigma A - 2\mu\_1 (A' \Sigma A\_1 - 0) - \mu\_2 (A'A - 1) \tag{\text{v}}$$

where *μ*<sup>1</sup> and *μ*<sup>2</sup> are the Lagrangian multipliers. Now, differentiating *φ*<sup>2</sup> with respect to *A* and equating the result to a null vector, we have the following:

$$\frac{\partial\phi_2}{\partial A} = O \Rightarrow 2\Sigma A - 2\mu_1(\Sigma A_1) - 2\mu_2 A = O.\tag{vi}$$

Premultiplying *(vi)* by $A_1'$ yields

$$A_1'\Sigma A - \mu_1 A_1'\Sigma A_1 - \mu_2 A_1'A = 0 \Rightarrow 0 - \mu_1\lambda_1 - 0 = 0 \Rightarrow \mu_1 = 0,\tag{vii}$$

which entails that the added condition of uncorrelatedness with $u_1=A_1'X$ is superfluous. When determining $A_j$, we could require that $A'X$ be uncorrelated with the principal components $u_1=A_1'X,\ldots,u_{j-1}=A_{j-1}'X$; however, as it turns out, these conditions become redundant when optimizing $A'\Sigma A$ subject to $A'A=1$. Thus, after determining the normalized eigenvector $A_j$ (whose first nonzero element is positive) corresponding to the $j$-th largest eigenvalue of $\Sigma$, we form the $j$-th principal component, $u_j=A_j'X$, which will necessarily be uncorrelated with the preceding principal components. The question that arises at this juncture is whether all of $u_1,\ldots,u_p$ are needed or a subset thereof would suffice. For instance, we could interrupt the computations when the variance of the principal component $u_j=A_j'X$, namely $\lambda_j$, falls below a predetermined threshold, in which case we can regard the remaining principal components, $u_{j+1},\ldots,u_p$, as unimportant and omit the associated calculations. In this way, a reduction in the number of variables is achieved as the original number of variables $p$ is reduced to $j<p$ principal components. However, this reduction could be viewed as a compromise since the new variables are linear functions of all the original ones and so may not be as interpretable in a real-life situation. Other drawbacks will be considered in the next section. Observe that since $\mathrm{Var}(u_j)=\lambda_j,\ j=1,\ldots,p$, the fraction

$$\nu_1 = \frac{\lambda_1}{\sum_{j=1}^{p}\lambda_j} = \text{ the proportion of the total variation accounted for by } u_1,\tag{9.2.6}$$

and letting *r < p,*

$$\nu\_r = \frac{\sum\_{j=1}^r \lambda\_j}{\sum\_{j=1}^p \lambda\_j} = \text{ the proportion of total variation accounted for by } u\_1, \dots, u\_r. \tag{9.2.7}$$

If $\nu_1=0.7$, the first principal component accounts for 70% of the total variation in the original variables. If $r=3$ and $\nu_3=0.99$, then the first three principal components together account for 99% of the total variation. This percentage of the total variation can also serve as a stopping rule for the determination of the principal components. For example, when $\nu_r$ as defined in (9.2.7) is greater than or equal to, say, 95%, we may forgo determining the principal components beyond $u_r$.
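Such a stopping rule is simple to implement. The following sketch (the function name and the threshold value are our own choices, not the book's) returns the smallest $r$ for which $\nu_r$ in (9.2.7) reaches a given threshold:

```python
import numpy as np

def components_to_keep(eigenvalues, threshold=0.95):
    """Smallest r such that nu_r = (lambda_1+...+lambda_r)/(lambda_1+...+lambda_p)
    is at least `threshold`, with the eigenvalues sorted in decreasing order."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    nu = np.cumsum(lam) / lam.sum()          # nu_1, nu_2, ..., nu_p
    return int(np.argmax(nu >= threshold)) + 1

# nu = 0.60, 0.85, 0.95, 0.99, 1.00, so three components suffice at 95%.
assert components_to_keep([6.0, 2.5, 1.0, 0.4, 0.1], 0.95) == 3
```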

**Example 9.2.1.** Even though reducing the number of variables is a main objective of Principal Component Analysis, for illustrative purposes, we will consider a case involving three variables, that is, *p* = 3. Compute the principal components associated with the following covariance matrix:

$$V = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}.$$

**Solution 9.2.1.** Let us verify that $V$ is positive definite. The leading minors of $V=V'$ being

$$|3| = 3 > 0, \quad \begin{vmatrix} 3 & -1 \\ -1 & 3 \end{vmatrix} = 3^2-(-1)^2 = 8 > 0, \quad \begin{vmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{vmatrix} = 3[3^2-(-1)^2]-(-1)[(-1)(3)-0] = 21 > 0,$$

$V>O$. Let us now compute the eigenvalues of $V$. Consider the equation $|V-\lambda I|=0 \Rightarrow (3-\lambda)[(3-\lambda)^2-1]-(3-\lambda)=0 \Rightarrow (3-\lambda)[(3-\lambda)^2-2]=0 \Rightarrow (3-\lambda)(3-\lambda-\sqrt{2})(3-\lambda+\sqrt{2})=0$. Hence the eigenvalues are $\lambda_1=3+\sqrt{2},\ \lambda_2=3,\ \lambda_3=3-\sqrt{2}$. Let us compute an eigenvector corresponding to $\lambda_1=3+\sqrt{2}$. Consider $(V-\lambda_1 I)X=O$, that is,

$$\begin{bmatrix} 3-\lambda_1 & -1 & 0 \\ -1 & 3-\lambda_1 & 1 \\ 0 & 1 & 3-\lambda_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.\tag{i}$$

There are three linear equations involving the $x_j$'s in *(i)*. Since the matrix $V-\lambda_1 I$ is singular, we need only consider any two of these three linear equations and solve them to obtain a solution. The first equation is $-\sqrt{2}\,x_1-x_2=0 \Rightarrow x_2=-\sqrt{2}\,x_1$, and the second one is $-x_1-\sqrt{2}\,x_2+x_3=0 \Rightarrow -x_1+2x_1+x_3=0 \Rightarrow x_3=-x_1$. Now, one solution is

$$X\_{\mathbf{l}} = \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix}, \text{ which once normalized is } A\_{\mathbf{l}} = \frac{1}{2} \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix}.$$

Observe that *X*<sup>1</sup> also satisfies the third equation in *(i)* and that

$$\begin{split} A\_1' VA\_1 &= \frac{1}{4} [1, -\sqrt{2}, -1] \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix} \\ &= \frac{1}{4} [3(1)^2 + 3(-\sqrt{2})^2 + 3(-1)^2 - 2(1)(-\sqrt{2}) + 2(-\sqrt{2})(-1)] \\ &= 3 + \sqrt{2} = \lambda\_1. \end{split}$$

Thus, the first principal component, denoted by *u*1, is

$$u_1 = A_1'X = \frac{1}{2}[x_1 - \sqrt{2}\,x_2 - x_3].\tag{ii}$$

Now, consider an eigenvector corresponding to $\lambda_2=3$. The system $(V-\lambda_2 I)X=O$ provides as its first equation $-x_2=0 \Rightarrow x_2=0$, its second equation giving $-x_1+x_3=0 \Rightarrow x_1=x_3$. Therefore, one solution for $X$ is

$$X_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \text{ whose normalized form is } A_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},$$

so that the second principal component is

$$u_2 = \frac{1}{\sqrt{2}}[x_1 + x_3].\tag{iii}$$

It can readily be checked that $A_2'VA_2=3=\lambda_2$. Let us now obtain an eigenvector corresponding to the eigenvalue $\lambda_3=3-\sqrt{2}$. Consider the linear system $(V-\lambda_3 I)X=O$. The first equation is $\sqrt{2}\,x_1-x_2=0 \Rightarrow x_2=\sqrt{2}\,x_1$, and the third one gives $x_2+\sqrt{2}\,x_3=0 \Rightarrow x_2=-\sqrt{2}\,x_3$. One solution is

$$X\_3 = \begin{bmatrix} 1 \\ \sqrt{2} \\ -1 \end{bmatrix}, \text{ its normalized form being } A\_3 = \frac{1}{2} \begin{bmatrix} 1 \\ \sqrt{2} \\ -1 \end{bmatrix}.$$

The third principal component is then

$$u_3 = \frac{1}{2}[x_1 + \sqrt{2}\,x_2 - x_3].\tag{iv}$$

It is easily verified that $A_3'VA_3 = \lambda_3 = 3-\sqrt{2}$. As well,

$$\begin{split} \mathrm{Var}(u_1) &= \tfrac{1}{4}[\mathrm{Var}(x_1) + 2\,\mathrm{Var}(x_2) + \mathrm{Var}(x_3) - 2\sqrt{2}\,\mathrm{Cov}(x_1,x_2)\\ &\quad - 2\,\mathrm{Cov}(x_1,x_3) + 2\sqrt{2}\,\mathrm{Cov}(x_2,x_3)]\\ &= \tfrac{1}{4}[3 + 2\times3 + 3 - 2\sqrt{2}\,(-1) - 2(0) + 2\sqrt{2}\,(1)]\\ &= \tfrac{1}{4}[3\times4 + 4\sqrt{2}] = 3+\sqrt{2} = \lambda_1. \end{split}$$

Similar calculations will confirm that $\mathrm{Var}(u_2)=3=\lambda_2$ and $\mathrm{Var}(u_3)=3-\sqrt{2}=\lambda_3$. Now, consider the covariance between $u_1$ and $u_2$:

$$\begin{split} \mathrm{Cov}(u_1,u_2) &= \tfrac{1}{2\sqrt{2}}[\mathrm{Var}(x_1) + \mathrm{Cov}(x_1,x_3) - \sqrt{2}\,\mathrm{Cov}(x_2,x_1) - \sqrt{2}\,\mathrm{Cov}(x_2,x_3)\\ &\quad - \mathrm{Cov}(x_3,x_1) - \mathrm{Var}(x_3)]\\ &= \tfrac{1}{2\sqrt{2}}[3 + 0 + \sqrt{2} - \sqrt{2} - 0 - 3] = 0. \end{split}$$

It can likewise be verified that $\mathrm{Cov}(u_1,u_3)=0$ and $\mathrm{Cov}(u_2,u_3)=0$. Note that $u_1,\ u_2$ and $u_3$ respectively account for about 49.0%, 33.3% and 17.6% of the total variation. As none of these proportions is negligibly small, all three principal components are deemed relevant and, in this instance, a reduction in the number of original variables is not indicated. Although we still end up with as many variables, the $u_j$'s, $j=1,2,3$, are uncorrelated, which was not the case for the original variables.
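The computations of Example 9.2.1 can be cross-checked numerically; the following NumPy sketch uses only values taken from the example above:

```python
import numpy as np

# Covariance matrix and principal axes from Example 9.2.1.
V = np.array([[3., -1., 0.], [-1., 3., 1.], [0., 1., 3.]])
lam = np.sort(np.linalg.eigvalsh(V))[::-1]
assert np.allclose(lam, [3 + np.sqrt(2), 3, 3 - np.sqrt(2)])

A1 = np.array([1., -np.sqrt(2), -1.]) / 2
A2 = np.array([1., 0., 1.]) / np.sqrt(2)
A3 = np.array([1., np.sqrt(2), -1.]) / 2
assert np.isclose(A1 @ V @ A1, 3 + np.sqrt(2))  # Var(u_1) = lambda_1
assert np.isclose(A3 @ V @ A3, 3 - np.sqrt(2))  # Var(u_3) = lambda_3
assert np.isclose(A1 @ V @ A2, 0)               # Cov(u_1, u_2) = 0

props = lam / lam.sum()   # proportions of total variation, roughly 49%, 33%, 18%
assert np.isclose(props.sum(), 1.0)
```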

#### **9.3. Issues to Be Mindful of when Constructing Principal Components**

Since variances and covariances are expressed in units of measurement, principal components also depend upon the scale on which the individual variables are measured. If we change the units of measurement, the principal components will differ. Suppose that $x_j$ is multiplied by a real scalar constant $d_j,\ j=1,\ldots,p$, where some of the $d_j$'s are not equal to 1. This is equivalent to changing the units of measurement of some of the variables. Let

$$X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_p \end{bmatrix}, \ Y = \begin{bmatrix} d\_1 x\_1 \\ \vdots \\ d\_p x\_p \end{bmatrix} = DX, \ D = \begin{bmatrix} d\_1 & 0 & \cdots & 0 \\ 0 & d\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d\_p \end{bmatrix},\tag{9.3.1}$$

and consider the linear functions $A'X$ and $A'Y$. Then,

$$\text{Var}(A^\prime X) = A^\prime \Sigma A,\ \text{Var}(A^\prime Y) = A^\prime \text{Var}(DX)A = A^\prime D \Sigma DA. \tag{9.3.2}$$

Since the eigenvalues of $\Sigma$ and $D\Sigma D$ differ, so will the corresponding principal components; thus, if the original variables are measured in various units, it is advisable to standardize them. Letting $R$ denote the correlation matrix, which is scale-invariant, observe that the covariance matrix can be written as $\Sigma=\Sigma_1 R\,\Sigma_1$ where $\Sigma_1=\mathrm{diag}(\sigma_1,\ldots,\sigma_p)$, $\sigma_j^2$ being the variance of $x_j,\ j=1,\ldots,p$. Thus, if the original variables are scaled by the inverses of their associated standard deviations, that is, $x_j/\sigma_j$ or, equivalently, via the transformation $\Sigma_1^{-1}X$, the resulting covariance matrix is $\mathrm{Cov}(\Sigma_1^{-1}X)=R$, the correlation matrix. Accordingly, constructing the principal components from $R$ instead of $\Sigma$ will mitigate the issue stemming from the scale of measurement.
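The decomposition $\Sigma=\Sigma_1R\,\Sigma_1$ and the scale-invariance of $R$ can be illustrated as follows (a NumPy sketch; the covariance matrix and scaling factors are arbitrary positive definite examples of ours):

```python
import numpy as np

# Sigma = Sigma_1 R Sigma_1, so R = Sigma_1^{-1} Sigma Sigma_1^{-1}.
Sigma = np.array([[4., 2., 0.], [2., 9., -1.], [0., -1., 1.]])
s = np.sqrt(np.diag(Sigma))            # standard deviations sigma_j
R = Sigma / np.outer(s, s)             # correlation matrix
assert np.allclose(np.diag(R), 1.0)
assert np.allclose(np.diag(s) @ R @ np.diag(s), Sigma)

# Rescaling the variables (covariance D Sigma D) changes the eigenvalues,
# hence the principal components, while the correlation matrix is unchanged.
D = np.diag([10., 1., 0.1])
Sigma2 = D @ Sigma @ D
s2 = np.sqrt(np.diag(Sigma2))
assert np.allclose(Sigma2 / np.outer(s2, s2), R)   # same R, different scale
```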

If $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $\Sigma$, then $\lambda_1^k,\ldots,\lambda_p^k$ will be the eigenvalues of $\Sigma^k$. Moreover, $\Sigma$ and $\Sigma^k$ will share the same eigenvectors. Note that the collection $(\lambda_1^k,\ldots,\lambda_p^k)$ will be well separated compared to the set $(\lambda_1,\ldots,\lambda_p)$ when the $\lambda_i$'s are distinct and greater than one. Hence, in some instances, it might be preferable to construct principal components by making use of $\Sigma^k$ for an appropriate value of $k$ instead of $\Sigma$. Observe that, in certain situations, it is not feasible to provide a physical interpretation of the principal components, which are linear functions of the original $x_1,\ldots,x_p$. Nonetheless, they can at times be informative by pointing to the average of certain variables (for example, $u_2$ in the previous numerical example) or by eliciting contrasts between two sets of variables (for example, $u_3$ in the previous numerical example, which opposes $x_3$ to $x_1$ and $x_2$).

To illustrate how the eigenvalues of a correlation matrix are determined, we revisit Example 9.2.1 wherein the variances of the $x_j$'s, that is, the diagonal elements of the covariance matrix, are all equal to 3. Thus, in this case, the correlation matrix is $R = \frac{1}{3}V$, and the eigenvalues of $R$ will be $\frac{1}{3}$ times the eigenvalues of $V$. However, the normalized eigenvectors will remain identical to those of $V$, and therefore the principal components will not change. We now consider an example wherein the diagonal elements of the covariance matrix are different.

**Example 9.3.1.** Let $X' = (x_1, x_2, x_3)$ where the $x_j$'s are real scalar random variables. Compute the principal components resulting from $R$, the correlation matrix of $X$, where

$$R = \begin{bmatrix} 1 & 0 & \frac{1}{\sqrt{6}} \\ 0 & 1 & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & 1 \end{bmatrix}.$$

**Solution 9.3.1.** Let us compute the eigenvalues of *R*. Consider the equation

$$|R - \lambda I| = 0 \Rightarrow (1 - \lambda) \left[ (1 - \lambda)^2 - \frac{1}{6} \right] - \frac{1}{\sqrt{6}} \left[ -\frac{1}{\sqrt{6}} (1 - \lambda) \right] = 0$$

$$\Rightarrow (1 - \lambda) \left[ (1 - \lambda)^2 - \frac{1}{3} \right] = 0.$$

Thus, the eigenvalues are $\lambda_1 = 1 + \frac{1}{\sqrt{3}},\ \lambda_2 = 1,\ \lambda_3 = 1 - \frac{1}{\sqrt{3}}$. Let us compute an eigenvector corresponding to $\lambda_1 = 1 + \frac{1}{\sqrt{3}}$. Consider the system $(R - \lambda_1 I)X = O$. The first equation is

$$-\frac{1}{\sqrt{3}}x\_1 + \frac{1}{\sqrt{6}}x\_3 = 0 \Rightarrow x\_1 = \frac{1}{\sqrt{2}}x\_3,$$

the second equation being

$$-\frac{1}{\sqrt{3}}x\_2 - \frac{1}{\sqrt{6}}x\_3 = 0 \Rightarrow x\_2 = -\frac{1}{\sqrt{2}}x\_3.$$

Let us take $x_3 = \sqrt{2}$. Then, an eigenvector corresponding to $\lambda_1$ is

$$X\_1 = \begin{bmatrix} 1 \\ -1 \\ \sqrt{2} \end{bmatrix}, \text{ and its normalized form is } A\_1 = \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ \sqrt{2} \end{bmatrix}.$$

Let us verify that the third equation is also satisfied by $X_1$ or $A_1$. This is the case since $\frac{1}{\sqrt{6}} + \frac{1}{\sqrt{6}} - \frac{\sqrt{2}}{\sqrt{3}} = 0$, and the first principal component is indeed $u_1 = \frac{1}{2}[x_1 - x_2 + \sqrt{2}\,x_3]$. As well, the variance of $u_1$ is equal to $\lambda_1$:

$$\begin{split} \text{Var}(u\_1) &= \frac{1}{4}[\text{Var}(x\_1) + \text{Var}(x\_2) + 2\text{Var}(x\_3) - 2\text{Cov}(x\_1, x\_2) \\ &\quad + 2\sqrt{2}\,\text{Cov}(x\_1, x\_3) - 2\sqrt{2}\,\text{Cov}(x\_2, x\_3)] \\ &= \frac{1}{4}\Big[1 + 1 + 2 + \frac{2\sqrt{2}}{\sqrt{6}} + \frac{2\sqrt{2}}{\sqrt{6}}\Big] = 1 + \frac{1}{\sqrt{3}} = \lambda\_1. \end{split}$$

Let us compute an eigenvector corresponding to the eigenvalue $\lambda_2 = 1$. Consider the linear system $(R - \lambda_2 I)X = O$. The first equation, $\frac{1}{\sqrt{6}}x_3 = 0$, gives $x_3 = 0$ and the second one also yields $x_3 = 0$; as for the third one, $\frac{x_1}{\sqrt{6}} - \frac{x_2}{\sqrt{6}} = 0 \Rightarrow x_1 = x_2$. Letting $x_1 = 1$, an eigenvector is

$$X\_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \text{ its normalized form being given by } A\_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$

Thus, the second principal component is $u_2 = \frac{1}{\sqrt{2}}[x_1 + x_2]$. In the case of $\lambda_3$, we consider the system $(R - \lambda_3 I)X = O$ whose first equation is

$$
\frac{1}{\sqrt{3}}x\_1 + \frac{1}{\sqrt{6}}x\_3 = 0 \Rightarrow x\_1 = -\frac{1}{\sqrt{2}}x\_3,
$$

the second one being

$$
\frac{x\_2}{\sqrt{3}} - \frac{x\_3}{\sqrt{6}} = 0 \Rightarrow x\_2 = \frac{1}{\sqrt{2}}x\_3.
$$

Letting $x_3 = \sqrt{2}$, an eigenvector corresponding to $\lambda_3$ is

$$X\_3 = \begin{bmatrix} -1 \\ 1 \\ \sqrt{2} \end{bmatrix}, \text{ its normalized form being } A\_3 = \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ \sqrt{2} \end{bmatrix}.$$

For the matrix $[A_1, A_2, A_3]$ to be uniquely determined, we may multiply $A_3$ by $-1$ so that its first nonzero element is positive. Hence, the third principal component is $u_3 = \frac{1}{2}[x_1 - x_2 - \sqrt{2}\,x_3]$. As was shown in the case of $u_1$, it can also be verified that $\text{Var}(u_2) = 1 = \lambda_2$, $\text{Var}(u_3) = \lambda_3 = 1 - \frac{1}{\sqrt{3}}$, $\text{Cov}(u_1, u_2) = 0$, $\text{Cov}(u_1, u_3) = 0$ and $\text{Cov}(u_2, u_3) = 0$. We note that

$$\frac{\lambda\_1}{\lambda\_1 + \lambda\_2 + \lambda\_3} = \frac{1 + 1/\sqrt{3}}{3} \approx 0.525,$$

$$\frac{\lambda\_1 + \lambda\_2}{\lambda\_1 + \lambda\_2 + \lambda\_3} = \frac{2 + 1/\sqrt{3}}{3} \approx 0.859.$$

Thus, almost 53% of the total variation is accounted for by the first principal component and nearly 86% of the total variation is due to the first two principal components.
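The eigenpairs obtained in this solution can be verified numerically; the following is a minimal plain-Python check, where the matrix $R$ and the pairs $(\lambda_j, A_j)$ are those computed above:

```python
import math

s = 1 / math.sqrt(6)
R = [[1.0, 0.0, s],
     [0.0, 1.0, -s],
     [s, -s, 1.0]]        # correlation matrix of Example 9.3.1

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Eigenvalues and normalized eigenvectors found above (A3 sign-adjusted).
lam = [1 + 1 / math.sqrt(3), 1.0, 1 - 1 / math.sqrt(3)]
A = [[0.5, -0.5, math.sqrt(2) / 2],
     [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0],
     [0.5, -0.5, -math.sqrt(2) / 2]]

# Check R A_j = lambda_j A_j for each eigenpair.
for lj, Aj in zip(lam, A):
    RAj = matvec(R, Aj)
    assert all(abs(RAj[i] - lj * Aj[i]) < 1e-12 for i in range(3))

# Proportions of total variation (tr(R) = 3): about 0.53 and 0.86.
print(lam[0] / 3, (lam[0] + lam[1]) / 3)
```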

#### **9.4. The Vector of Principal Components**

Observe that the determinant of a matrix is the product of its eigenvalues and that its trace is the sum of its eigenvalues, that is, $|\Sigma| = \lambda_1 \cdots \lambda_p$ and $\text{tr}(\Sigma) = \lambda_1 + \cdots + \lambda_p$. As previously pointed out, the determinant of a covariance matrix corresponds to Wilks' concept of *generalized variance*. Let us consider the vector of principal components. The principal components are $u_j = A_j'X$, with $\text{Var}(u_j) = A_j'\Sigma A_j = \lambda_j$, $j = 1,\ldots,p$, and $\text{Cov}(u_i, u_j) = 0$ for all $i \ne j$. Thus,

$$U = \begin{bmatrix} u\_1 \\ \vdots \\ u\_p \end{bmatrix} = \begin{bmatrix} A\_1'X \\ \vdots \\ A\_p'X \end{bmatrix} \Rightarrow \text{Cov}(U) = \Lambda = \begin{bmatrix} \lambda\_1 & 0 & \cdots & 0 \\ 0 & \lambda\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda\_p \end{bmatrix}.\tag{9.4.1}$$

The determinant of the covariance matrix is the product $\lambda_1 \cdots \lambda_p$ and its trace, the sum $\lambda_1 + \cdots + \lambda_p$. Hence the following result:

**Theorem 9.4.1.** *Let $X$ be a $p \times 1$ real vector whose associated covariance matrix is $\Sigma$. Let the principal components of $X$ be denoted by $u_j = A_j'X$ with $\text{Var}(u_j) = A_j'\Sigma A_j = \lambda_j$, the $j$-th largest eigenvalue of $\Sigma$, and $U' = (u_1,\ldots,u_p)$ with $\text{Cov}(U) \equiv \Sigma_u$. Then, $|\Sigma_u| = |\Sigma| =$ the product of the eigenvalues, $\lambda_1 \cdots \lambda_p$, and $\text{tr}(\Sigma_u) = \text{tr}(\Sigma) =$ the sum of the eigenvalues, $\lambda_1 + \cdots + \lambda_p$. Observe that the determinant as well as the eigenvalues and the trace are invariant with respect to orthonormal transformations or rotations of the coordinate axes.*

Let $A = [A_1, A_2, \ldots, A_p]$. Then

$$
\Sigma A = A\Lambda,\ A'A = I,\ A'\Sigma A = \Lambda.\tag{9.4.2}
$$

Note that $U' = (u_1,\ldots,u_p)$ where $u_1,\ldots,u_p$ are linearly independent. We have not assumed any distribution for the $p \times 1$ vector $X$ so far. If the $p \times 1$ vector $X$ has a $p$-variate nonsingular Gaussian distribution, that is, $X \sim N_p(O, \Sigma),\ \Sigma > O$, then

$$
u\_j \sim N\_1(0, \ \lambda\_j), \ \ U \sim N\_p(O, \ \Lambda). \tag{9.4.3}
$$

Since principal components involve variances and covariances, which are free of any location parameter, we may take the mean value vector to be a null vector without any loss of generality. Then, $E[u_j] = 0$ by assumption, $\text{Var}(u_j) = A_j'\Sigma A_j = \lambda_j$, $j = 1,\ldots,p$, and $\text{Cov}(U) = \Lambda = \text{diag}(\lambda_1,\ldots,\lambda_p)$. Accordingly, $\frac{\lambda_1}{\lambda_1 + \cdots + \lambda_p}$ is the proportion of the total variance accounted for by the largest eigenvalue $\lambda_1$, which is equal to the variance of the first principal component. Similarly, $\frac{\lambda_1 + \cdots + \lambda_r}{\lambda_1 + \cdots + \lambda_p}$ is the proportion of the total variance due to the first $r$ principal components, $r \le p$.

**Example 9.4.1.** Let *X* ∼ *N*3*(μ, Σ), Σ > O*, where

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \mu = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & -1 \\ 1 & -1 & 3 \end{bmatrix}.$$

Derive the densities of the principal components of *X*.

**Solution 9.4.1.** Let us determine the eigenvalues of *Σ*. Consider the equation

$$\begin{aligned} |\Sigma - \lambda I| &= 0 \Rightarrow \begin{vmatrix} 2 - \lambda & 0 & 1 \\ 0 & 2 - \lambda & -1 \\ 1 & -1 & 3 - \lambda \end{vmatrix} = 0 \\ &\Rightarrow (2 - \lambda)[(2 - \lambda)(3 - \lambda) - 1] - (2 - \lambda) = 0 \\ &\Rightarrow (2 - \lambda)[\lambda^2 - 5\lambda + 4] = 0 \Rightarrow \lambda\_1 = 4, \ \lambda\_2 = 2, \ \lambda\_3 = 1. \end{aligned}$$

Let us compute an eigenvector corresponding to $\lambda_1 = 4$. Consider the linear system $(\Sigma - \lambda_1 I)X = O$, whose first equation gives $-2x_1 + x_3 = 0$ or $x_1 = \frac{1}{2}x_3$, the second equation, $-2x_2 - x_3 = 0$, yielding $x_2 = -\frac{1}{2}x_3$. Letting $x_3 = 2$, one solution is

$$X\_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \text{ and its normalized form is } A\_1 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}.$$

Thus, the first principal component is $u_1 = A_1'X = \frac{1}{\sqrt{6}}[x_1 - x_2 + 2x_3]$ with $E[u_1] = \frac{1}{\sqrt{6}}[1 - 0 - 2] = -\frac{1}{\sqrt{6}}$ and $\text{Var}(u_1) = \frac{1}{6}[\text{Var}(x_1) + \text{Var}(x_2) + 4\text{Var}(x_3) - 2\text{Cov}(x_1, x_2) + 4\text{Cov}(x_1, x_3) - 4\text{Cov}(x_2, x_3)] = \frac{1}{6}[2 + 2 + 4 \times 3 - 2(0) + 4(1) - 4(-1)] = 4 = \lambda_1$.

Since *u*<sup>1</sup> is a linear function of normal variables, *u*<sup>1</sup> has the following Gaussian distribution:

$$
u\_1 \sim N\_1\left(-\frac{1}{\sqrt{6}}, \ 4\right).
$$

Let us compute an eigenvector corresponding to $\lambda_2 = 2$. Consider the system $(\Sigma - \lambda_2 I)X = O$, whose first and second equations give $x_3 = 0$, the third equation $x_1 - x_2 + x_3 = 0$ yielding $x_1 = x_2$. Hence, one solution is

$$X\_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \text{ its normalized form being } A\_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$

and the second principal component is $u_2 = A_2'X = \frac{1}{\sqrt{2}}[x_1 + x_2]$ with $E[u_2] = \frac{1}{\sqrt{2}}[1 + 0] = \frac{1}{\sqrt{2}}$. Let us verify that the variance of $u_2$ is $\lambda_2 = 2$:

$$\text{Var}(u\_2) = \frac{1}{2}[\text{Var}(x\_1) + \text{Var}(x\_2) + 2\text{Cov}(x\_1, x\_2)] = \frac{1}{2}[2 + 2 + 2(0)] = 2 = \lambda\_2.$$

Hence, *u*<sup>2</sup> has the following real univariate normal distribution:

$$
u\_2 \sim N\_1(\frac{1}{\sqrt{2}}, \, 2).$$

We finally construct an eigenvector associated with $\lambda_3 = 1$. Consider the linear system $(\Sigma - \lambda_3 I)X = O$. In this case, the first equation is $x_1 + x_3 = 0 \Rightarrow x_1 = -x_3$ and the second one is $x_2 - x_3 = 0$ or $x_2 = x_3$. Let $x_3 = 1$. One eigenvector is

$$X\_3 = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \text{ its normalized form being } A\_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.$$

In order to ensure the uniqueness of the matrix $[A_1, A_2, A_3]$, we may multiply $A_3$ by $-1$ so that its first nonzero element is positive. Then, the third principal component is $u_3 = \frac{1}{\sqrt{3}}[x_1 - x_2 - x_3]$ with $E[u_3] = \frac{1}{\sqrt{3}}[1 - 0 + 1] = \frac{2}{\sqrt{3}}$. The variance of $u_3$ is indeed equal to $\lambda_3$ since

$$\begin{split} \text{Var}(u\_3) &= \frac{1}{3}[\text{Var}(x\_1) + \text{Var}(x\_2) + \text{Var}(x\_3) - 2\text{Cov}(x\_1, x\_2) \\ &\quad - 2\text{Cov}(x\_1, x\_3) + 2\text{Cov}(x\_2, x\_3)] \\ &= \frac{1}{3}[2 + 2 + 3 - 2(0) - 2(1) + 2(-1)] = 1. \end{split}$$

Thus,

$$
u\_3 \sim N\_1\left(\frac{2}{\sqrt{3}}, \ 1\right).
$$

It can easily be verified that $\text{Cov}(u_1, u_2) = 0$, $\text{Cov}(u_1, u_3) = 0$ and $\text{Cov}(u_2, u_3) = 0$. Accordingly, letting

$$U = \begin{bmatrix} u\_1 \\ u\_2 \\ u\_3 \end{bmatrix} = \begin{bmatrix} A\_1'X \\ A\_2'X \\ A\_3'X \end{bmatrix},$$

$$\operatorname{Cov}(U) = \begin{bmatrix} A\_1' \Sigma A\_1 & A\_1' \Sigma A\_2 & A\_1' \Sigma A\_3 \\ A\_2' \Sigma A\_1 & A\_2' \Sigma A\_2 & A\_2' \Sigma A\_3 \\ A\_3' \Sigma A\_1 & A\_3' \Sigma A\_2 & A\_3' \Sigma A\_3 \end{bmatrix} = \begin{bmatrix} \lambda\_1 & 0 & 0 \\ 0 & \lambda\_2 & 0 \\ 0 & 0 & \lambda\_3 \end{bmatrix}.$$

As well, $\Sigma A_j = \lambda_j A_j$, $j = 1, 2, 3$, that is,

$$
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & -1 \\ 1 & -1 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

This completes the computations.
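As a numerical confirmation of Example 9.4.1, the following plain-Python sketch checks the eigen equations $\Sigma A_j = \lambda_j A_j$, the diagonal form of $A_i'\Sigma A_j$, and the trace and determinant identities of Theorem 9.4.1:

```python
import math

# Covariance matrix, eigenvalues and normalized eigenvectors of Example 9.4.1.
Sigma = [[2.0, 0.0, 1.0],
         [0.0, 2.0, -1.0],
         [1.0, -1.0, 3.0]]
lam = [4.0, 2.0, 1.0]
A = [[1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6)],
     [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0],
     [1 / math.sqrt(3), -1 / math.sqrt(3), -1 / math.sqrt(3)]]  # A3 sign-adjusted

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Sigma A_j = lambda_j A_j for each eigenpair.
for lj, Aj in zip(lam, A):
    SAj = matvec(Sigma, Aj)
    assert all(abs(SAj[i] - lj * Aj[i]) < 1e-12 for i in range(3))

# A_i' Sigma A_j = Cov(u_i, u_j): lambda_i on the diagonal, 0 off it.
for i in range(3):
    for j in range(3):
        v = sum(A[i][k] * matvec(Sigma, A[j])[k] for k in range(3))
        assert abs(v - (lam[i] if i == j else 0.0)) < 1e-12

# tr(Sigma) = 7 = 4 + 2 + 1 and |Sigma| = 8 = 4 * 2 * 1.
print(sum(lam), lam[0] * lam[1] * lam[2])
```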

#### **9.4.1. Principal components viewed from differing perspectives**

Let $X$ be a $p \times 1$ real vector whose covariance matrix is $\text{Cov}(X) = \Sigma > O$. Assume that $E(X) = O$. Then, $X'\Sigma^{-1}X = c > 0$ is commonly referred to as an ellipsoid of concentration, centered at the origin of the coordinate system, with $X'X$ being the square of the distance between the origin and a point $X$ on the surface of this ellipsoid. A principal axis of this ellipsoid is defined where this squared distance has a stationary point.

The stationary points are determined by optimizing $X'X$ subject to $X'\Sigma^{-1}X = c > 0$. Consider

$$w = X'X - \lambda(X'\Sigma^{-1}X - c), \quad \frac{\partial w}{\partial X} = O \Rightarrow 2X - 2\lambda\Sigma^{-1}X = O \Rightarrow \Sigma^{-1}X = \frac{1}{\lambda}X \Rightarrow \Sigma X = \lambda X, \tag{i}$$

where $\lambda$ is a Lagrangian multiplier. It is seen from $(i)$ that $\lambda$ is an eigenvalue of $\Sigma$ and $X$ is a corresponding eigenvector. Letting $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ be the eigenvalues and $A_1,\ldots,A_p$ be the corresponding eigenvectors, $A_1,\ldots,A_p$ give the principal axes of this ellipsoid of concentration. It follows from $(i)$ that

$$c = A\_j'\Sigma^{-1}A\_j = \frac{1}{\lambda\_j}A\_j'A\_j \Rightarrow A\_j'A\_j = \lambda\_j c.$$

Thus, the length of the $j$-th principal axis is $2\sqrt{A_j'A_j} = 2\sqrt{\lambda_j c}$.

As another approach, consider a plane passing through the origin. The equation of this plane is $\beta'X = 0$ where $\beta$ is a $p \times 1$ constant vector and $X$ is the $p \times 1$ vector of the coordinate system. Without any loss of generality, let $\beta'\beta = 1$. Let the $p \times 1$ vector $Y$ be a point in the Euclidean space. The distance between this point and the plane is then $\beta'Y$. Letting $Y$ be a random point such that $E(Y) = O$ and $\text{Cov}(Y) = \Sigma > O$, the expected squared distance from this point to the plane is $E[\beta'Y]^2 = E[\beta'YY'\beta] = \beta'E(YY')\beta = \beta'\Sigma\beta$. Accordingly, the plane of closest fit to the random point $Y$ is that plane whose coefficient vector $\beta$ minimizes $\beta'\Sigma\beta$ subject to $\beta'\beta = 1$. This, once again, leads to the eigenvalue problem encountered in principal component analysis.
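The minimization just described can be illustrated by brute force: sample many random unit vectors $\beta$ and keep the one with the smallest expected squared distance $\beta'\Sigma\beta$. The sketch below reuses the covariance matrix of Example 9.4.1, whose smallest eigenvalue is 1 with eigenvector $\frac{1}{\sqrt{3}}(-1, 1, 1)$, so the winning $\beta$ should align with that eigenvector:

```python
import math
import random

# Covariance matrix borrowed from Example 9.4.1; smallest eigenvalue is 1.
Sigma = [[2.0, 0.0, 1.0],
         [0.0, 2.0, -1.0],
         [1.0, -1.0, 3.0]]

def quad(b):
    # beta' Sigma beta: expected squared distance from Y to the plane beta'X = 0.
    return sum(b[i] * Sigma[i][j] * b[j] for i in range(3) for j in range(3))

random.seed(7)
best, best_val = None, float("inf")
for _ in range(20000):
    v = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    beta = [c / n for c in v]            # random unit vector, beta'beta = 1
    if quad(beta) < best_val:
        best, best_val = beta, quad(beta)

# The best beta is nearly proportional to the smallest eigenvector.
A3 = [-1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)]
align = abs(sum(a * b for a, b in zip(best, A3)))
print(best_val, align)   # best_val close to 1, align close to 1
```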

#### **9.5. Sample Principal Components**

When $X \sim N_p(O, \Sigma),\ \Sigma > O$, the maximum likelihood estimator of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$ where $n > p$ is the sample size and $S$ is the sample sum of products matrix, as defined for instance in Mathai and Haubold (2017b). In general, whether $\Sigma$ is nonsingular or singular and $X$ is $p$-variate Gaussian or not, we may take $\hat{\Sigma}$ as an estimate of $\Sigma$. The sample eigenvalues and eigenvectors are then available from $\hat{\Sigma}\beta = k\beta$ where $\beta \ne O$ is a $p \times 1$ eigenvector corresponding to the eigenvalue $k$ of $\hat{\Sigma} = \frac{1}{n}S$. For a non-null $\beta$, $(\hat{\Sigma} - kI)\beta = O \Rightarrow \hat{\Sigma} - kI$ is singular, that is, $|\hat{\Sigma} - kI| = 0$, and $k$ is a solution of

$$|\hat{\Sigma} - kI| = 0 \Rightarrow (\hat{\Sigma} - k\_j I)B\_j = O, \quad j = 1, \ldots, p. \tag{9.5.1}$$

We need only consider the normalized eigenvectors $B_j$ such that $B_j'B_j = 1$, $j = 1,\ldots,p$. If the eigenvalues of $\Sigma$ are distinct, it can be shown that the eigenvalues of $\hat{\Sigma}$ are also distinct, $k_1 > k_2 > \cdots > k_p$, almost surely. Note that even though $B_j'B_j = 1$, $B_j$ is not unique as one also has $(-B_j)'(-B_j) = 1$. Thus, we require that the first nonzero element of $B_j$ be positive to ensure uniqueness. Since $\hat{\Sigma}$ is a real symmetric matrix, its eigenvalues $k_1,\ldots,k_p$ are real, and so are the corresponding normalized eigenvectors $B_1,\ldots,B_p$. As well, there exists a full set of orthonormal eigenvectors $B_1,\ldots,B_p$, $B_j'B_j = 1$, $B_j'B_i = 0$, $i \ne j$, $j = 1,\ldots,p$, such that for $B = (B_1,\ldots,B_p)$,

$$B'\hat{\Sigma}B = K = \text{diag}(k\_1, \dots, k\_p), \ \hat{\Sigma}B = BK \text{ and } \hat{\Sigma} = k\_1B\_1B'\_1 + \dots + k\_pB\_pB'\_p. \tag{9.5.2}$$

Also, observe that $\frac{k_1}{k_1 + \cdots + k_p}$ is the proportion of the total variation in the data which is accounted for by the first principal component. Similarly, $\frac{k_1 + \cdots + k_r}{k_1 + \cdots + k_p}$ is the proportion of the total variation due to the first $r$ principal components, $r \le p$.

**Example 9.5.1.** Let $X$ be a $3 \times 1$ vector of real scalar random variables, $X' = [x_1, x_2, x_3]$, with $E[X] = \mu$ and covariance matrix $\text{Cov}(X) = \Sigma > O$ where both $\mu$ and $\Sigma$ are unknown. Let the following observation vectors be a simple random sample of size 5 from this population:

$$X\_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \ X\_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \ X\_4 = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}, \ X\_5 = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}.$$

Compute the principal components of *X* from an estimate of *Σ* that is based on those observations.

**Solution 9.5.1.** An estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$ where $n$ is the sample size and $S$ is the sample sum of products matrix. In this case, $n = 5$. We first compute $S$. To this end, we determine the sample average vector, the sample matrix, the deviation matrix and finally $S$. The sample average is $\bar{X} = \frac{1}{n}[X_1 + X_2 + X_3 + X_4 + X_5]$ and the sample matrix is $\mathbf{X} = [X_1, X_2, X_3, X_4, X_5]$. The matrix of sample means is $\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \bar{X}, \bar{X}, \bar{X}]$. The deviation matrix is $\mathbf{X}_d = \mathbf{X} - \bar{\mathbf{X}}$ and the sample sum of products matrix is $S = \mathbf{X}_d\mathbf{X}_d' = [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]'$. Based on the given random sample, these quantities are

$$\begin{split} \bar{X} &= \frac{1}{5} \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} + \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \right\} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \\ \mathbf{X} &= [X\_1, \ X\_2, \ X\_3, \ X\_4, \ X\_5] = \begin{bmatrix} 1 & -1 & 1 & -1 & 0 \\ 1 & 1 & -1 & -1 & 0 \\ 0 & 0 & -1 & -1 & 2 \end{bmatrix}, \\ \mathbf{X}\_d &= [X\_1 - \bar{X}, \ X\_2 - \bar{X}, \ X\_3 - \bar{X}, \ X\_4 - \bar{X}, \ X\_5 - \bar{X}] \\ &= \begin{bmatrix} 1 & -1 & 1 & -1 & 0 \\ 1 & 1 & -1 & -1 & 0 \\ 0 & 0 & -1 & -1 & 2 \end{bmatrix}, \\ \mathbf{S} &= \begin{bmatrix} 1 & -1 & 1 & -1 & 0 \\ 1 & 1 & -1 & -1 & 0 \\ 0 & 0 & -1 & -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 & -1 & 0 \\ 1 & 1 & -1 & -1 & 0 \\ 0 & 0 & -1 & -1 & 2 \end{bmatrix}' \\ &= \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix}. \end{split}$$
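The computation of $\bar{X}$, $\mathbf{X}_d$ and $S$ is mechanical and easy to script; a plain-Python version for this sample:

```python
# Rows are the three variables, columns are the 5 observations of Example 9.5.1.
X = [[1, -1, 1, -1, 0],
     [1, 1, -1, -1, 0],
     [0, 0, -1, -1, 2]]
n = 5

# Sample mean of each variable; here (0, 0, 0).
xbar = [sum(row) / n for row in X]

# Deviation matrix X_d = X - X_bar.
Xd = [[x - m for x in row] for row, m in zip(X, xbar)]

# S = X_d X_d'.
S = [[sum(Xd[i][k] * Xd[j][k] for k in range(n)) for j in range(3)]
     for i in range(3)]
print(S)   # [[4.0, 0.0, 0.0], [0.0, 4.0, 2.0], [0.0, 2.0, 6.0]]
```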

An estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S = \frac{1}{5}S$. However, since the eigenvalues of $\hat{\Sigma}$ are those of $S$ multiplied by $\frac{1}{5}$ and the normalized eigenvectors of $\hat{\Sigma}$ and $S$ are identical, we will work with $S$. The eigenvalues of $S$ are available from the equation $|S - \lambda I| = 0$, that is, $(4 - \lambda)[(4 - \lambda)(6 - \lambda) - 4] = 0 \Rightarrow (4 - \lambda)[\lambda^2 - 10\lambda + 20] = 0 \Rightarrow \lambda_1 = 5 + \sqrt{5},\ \lambda_2 = 4,\ \lambda_3 = 5 - \sqrt{5}$. An eigenvector corresponding to $\lambda_1 = 5 + \sqrt{5}$ can be determined from the system $(S - \lambda_1 I)X = O$ whose first equation gives $x_1 = 0$, the third one yielding $2x_2 + (1 - \sqrt{5})x_3 = 0$. Taking $x_3 = 2$, $x_2 = -(1 - \sqrt{5}) = \sqrt{5} - 1$, and it is easily verified that these values also satisfy the second equation. Thus, an eigenvector, denoted by $Y_1$, is the following:

$$Y\_1 = \begin{bmatrix} 0 \\ \sqrt{5} - 1 \\ 2 \end{bmatrix} \text{ with its normalized form being } A\_1 = \frac{1}{\sqrt{10 - 2\sqrt{5}}} \begin{bmatrix} 0 \\ \sqrt{5} - 1 \\ 2 \end{bmatrix},$$

so that the first principal component is $u_1 = \frac{1}{\sqrt{10 - 2\sqrt{5}}}[(\sqrt{5} - 1)x_2 + 2x_3]$. Let us verify that the variance of $u_1$ equals $\lambda_1$:

$$\begin{split} \text{Var}(\mu\_1) &= \frac{1}{10 - 2\sqrt{5}} [ (\sqrt{5} - 1)^2 \text{Var}(\mathbf{x}\_2) + 4 \text{Var}(\mathbf{x}\_3) + 2(\sqrt{5} - 1) \text{Cov}(\mathbf{x}\_2, \mathbf{x}\_3)] \\ &= \frac{1}{10 - 2\sqrt{5}} [ (6 - 2\sqrt{5})(4) + 4(6) + 2(\sqrt{5} - 1)(4) ] = \frac{20(5 + \sqrt{5})}{20} \\ &= 5 + \sqrt{5} = \lambda\_1. \end{split}$$

An eigenvector corresponding to $\lambda_2 = 4$ is available from $(S - \lambda_2 I)X = O$. The first equation shows that all variables are arbitrary. The second and third equations yield $x_3 = 0$ and $x_2 = 0$. Taking $x_1 = 1$, an eigenvector corresponding to $\lambda_2 = 4$, denoted by $Y_2$, is

$$Y\_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \text{ which is already normalized so that } A\_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.$$

Thus, the second principal component is $u_2 = x_1$ and its variance is $\text{Var}(x_1) = 4$. An eigenvector corresponding to $\lambda_3 = 5 - \sqrt{5}$ can be determined from the linear system $(S - \lambda_3 I)X = O$ whose first equation yields $x_1 = 0$, the third one giving $2x_2 + (1 + \sqrt{5})x_3 = 0$. Taking $x_3 = 2$, $x_2 = -(1 + \sqrt{5})$, and it is readily verified that these values satisfy the second equation as well. Then, an eigenvector, denoted by $Y_3$, is

$$Y\_3 = \begin{bmatrix} 0 \\ -(\sqrt{5} + 1) \\ 2 \end{bmatrix}, \text{its normalized form being } A\_3 = \frac{1}{\sqrt{10 + 2\sqrt{5}}} \begin{bmatrix} 0 \\ -(\sqrt{5} + 1) \\ 2 \end{bmatrix}.$$

In order to select the matrix $[A_1, A_2, A_3]$ uniquely, we may multiply $A_3$ by $-1$ so that its first nonzero element is positive. Thus, the third principal component is $u_3 = \frac{1}{\sqrt{10 + 2\sqrt{5}}}[(1 + \sqrt{5})x_2 - 2x_3]$ and, as the following calculations corroborate, its variance is indeed equal to $\lambda_3$:

$$\begin{split} \text{Var}(u\_3) &= \frac{1}{10 + 2\sqrt{5}}[(1 + \sqrt{5})^2\text{Var}(x\_2) + 4\text{Var}(x\_3) - 4(1 + \sqrt{5})\text{Cov}(x\_2, x\_3)] \\ &= \frac{1}{2(5 + \sqrt{5})}[4(6 + 2\sqrt{5}) + 4(6) - 4(1 + \sqrt{5})(2)] = \frac{20}{5 + \sqrt{5}} = 5 - \sqrt{5} = \lambda\_3. \end{split}$$

Additionally,

$$\frac{\lambda\_1}{\lambda\_1 + \lambda\_2 + \lambda\_3} = \frac{5 + \sqrt{5}}{14} \approx 0.52,$$

that is, approximately 52% of the total variance is due to $u_1$, or nearly 52% of the total variation in the data is accounted for by the first principal component. Also observe that

$$S[A\_1, A\_2, A\_3] = [A\_1, A\_2, A\_3]D, \ \ D = \text{diag}(5 + \sqrt{5}, \ 4, \ 5 - \sqrt{5}), \text{ or}$$

$$S = [A\_1, A\_2, A\_3]D \begin{bmatrix} A\_1'\\ A\_2'\\ A\_3' \end{bmatrix}.$$

That is,

$$\begin{split} \mathcal{S} &= \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix} \\ &= \begin{bmatrix} 0 & 1 & 0 \\ \frac{\sqrt{5}-1}{\sqrt{10-2\sqrt{5}}} & 0 & -\frac{(1+\sqrt{5})}{\sqrt{10+2\sqrt{5}}} \\ \frac{2}{\sqrt{10-2\sqrt{5}}} & 0 & \frac{2}{\sqrt{10+2\sqrt{5}}} \end{bmatrix} \begin{bmatrix} 5+\sqrt{5} & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 5-\sqrt{5} \end{bmatrix} \begin{bmatrix} 0 & \frac{\sqrt{5}-1}{\sqrt{10-2\sqrt{5}}} & \frac{2}{\sqrt{10-2\sqrt{5}}} \\ 1 & 0 & 0 \\ 0 & -\frac{(1+\sqrt{5})}{\sqrt{10+2\sqrt{5}}} & \frac{2}{\sqrt{10+2\sqrt{5}}} \end{bmatrix}, \end{split}$$

which completes the computations.
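The spectral representation $S = \lambda_1 A_1A_1' + \lambda_2 A_2A_2' + \lambda_3 A_3A_3'$ used in this last display, in the form given by (9.5.2), can be confirmed numerically; a plain-Python check using the eigenpairs obtained above:

```python
import math

S = [[4.0, 0.0, 0.0],
     [0.0, 4.0, 2.0],
     [0.0, 2.0, 6.0]]

r5 = math.sqrt(5)
lam = [5 + r5, 4.0, 5 - r5]
n1 = math.sqrt(10 - 2 * r5)
n3 = math.sqrt(10 + 2 * r5)
A = [[0.0, (r5 - 1) / n1, 2 / n1],       # A1
     [1.0, 0.0, 0.0],                    # A2
     [0.0, -(1 + r5) / n3, 2 / n3]]      # A3 (before the sign adjustment)

# Rebuild S as lam1*A1 A1' + lam2*A2 A2' + lam3*A3 A3'.
recon = [[sum(l * Aj[i] * Aj[j] for l, Aj in zip(lam, A)) for j in range(3)]
         for i in range(3)]
print(recon)   # matches S up to rounding
```

Note that the sign adjustment of $A_3$ is immaterial here, since $A_3A_3' = (-A_3)(-A_3)'$.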

#### **9.5.1. Estimation and evaluation of the principal components**

If $X \sim N_p(O, \Sigma),\ \Sigma > O$, then the maximum likelihood estimator of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$ where $n$ is the sample size and $S$ is the sample sum of products matrix. How can one evaluate the eigenvalues and eigenvectors in order to construct the principal components? One method consists of solving the polynomial equation $|\Sigma - \lambda I| = 0$ for the population eigenvalues $\lambda_1,\ldots,\lambda_p$, or $|\hat{\Sigma} - kI| = 0$ for the sample eigenvalues $k_1 \ge k_2 \ge \cdots \ge k_p$. Direct evaluation by solving the determinantal equation is not difficult when $p$ is small. However, for large values of $p$, one has to resort to mathematical software or some iterative process. We will illustrate such an iterative process for the population values $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ and the corresponding normalized eigenvectors $A_1,\ldots,A_p$ (using our notations). Let us assume that the eigenvalues are distinct. Consider the following equation for determining the eigenvalues and eigenvectors:

$$
\Sigma A\_j = \lambda\_j A\_j, \ j = 1, \ldots, p. \tag{9.5.3}
$$

Take any initial $p$-vector $W_0$ that is not orthogonal to $A_1$, the eigenvector corresponding to the largest eigenvalue $\lambda_1$. If $W_0$ is orthogonal to $A_1$, then, in the iterative process, $W_j$ will not converge to a multiple of $A_1$. Let $Y_0 = \frac{1}{\sqrt{W_0'W_0}}W_0$ be the normalized $W_0$. Consider the equation

$$\Sigma Y\_{j-1} = W\_j, \text{ that is, } \Sigma Y\_0 = W\_1, \ \Sigma Y\_1 = W\_2, \dots, \ Y\_j = \frac{1}{\sqrt{(W\_j' W\_j)}} W\_j, \ j = 1, \dots, \ell. \tag{9.5.4}$$

Halt the iteration when $W_j$ approximately agrees with $W_{j-1}$, that is, when $W_j$ converges to some $W$, which will then be $\lambda_1 A_1$, or equivalently when $Y_j$ converges to $A_1$. At each stage, compute the quadratic form $\delta_j = Y_j'\Sigma Y_j$ and make sure that $\delta_j$ is increasing. Suppose that $W_j$ converges to some $W$ for a certain value of $j$ or as $j \to \infty$. At that stage, the equation is $\Sigma Y = W$ where $Y = \frac{1}{\sqrt{W'W}}W$ is the normalized $W$. Then, the equation reads $\Sigma Y = \sqrt{W'W}\,Y$. In other words, $\sqrt{W'W} = \lambda_1$ and $Y = A_1$, the largest eigenvalue of $\Sigma$ and the corresponding eigenvector. That is,

$$\lim\_{j \to \infty} \sqrt{W\_j' W\_j} = \lambda\_1$$

$$\lim\_{j \to \infty} \left[ \frac{1}{\sqrt{(W\_j' W\_j)}} W\_j \right] = \lim\_{j \to \infty} Y\_j = A\_1.$$

The rate of convergence in (9.5.4) depends upon the ratio $\frac{\lambda_2}{\lambda_1}$. If $\lambda_2$ is close to $\lambda_1$, the convergence will be very slow. Hence, the larger the difference between $\lambda_1$ and $\lambda_2$, the faster the convergence. It is thus indicated to raise $\Sigma$ to a suitable power $m$ and initiate the iteration on $\Sigma^m$ so that the difference between the resulting dominant eigenvalues is magnified accordingly, $\lambda_j^m$, $j = 1,\ldots,p$, being the eigenvalues of $\Sigma^m$. If an eigenvalue is equal to one, then $\Sigma$ must first be multiplied by a constant so that all the resulting eigenvalues of $\Sigma$ are well separated, which will not affect the eigenvectors. As well, observe that $\Sigma$ and $\Sigma^m$, $m = 1, 2, \ldots$, share the same eigenvectors even though the eigenvalues are $\lambda_j$ and $\lambda_j^m$, $j = 1,\ldots,p$, respectively; in other words, the normalized eigenvectors remain the same. After obtaining the largest eigenvalue $\lambda_1$ and the corresponding normalized eigenvector $A_1$, consider

$$
\Sigma\_2 = \Sigma - \lambda\_1 A\_1 A\_1', \ \Sigma = \lambda\_1 A\_1 A\_1' + \lambda\_2 A\_2 A\_2' + \dots + \lambda\_p A\_p A\_p',
$$

where $A\_j$ is the column eigenvector corresponding to $\lambda\_j$, so that $A\_jA\_j'$ is a $p\times p$ matrix for $j = 1, 2,\ldots,p$. Now, carry out the iteration on $\Sigma\_2$, as was previously done on $\Sigma$. This will produce $\lambda\_2$ and the corresponding normalized eigenvector $A\_2$. Note that $\lambda\_2$ is the largest eigenvalue of $\Sigma\_2$. Next, consider

$$
\Sigma\_3 = \Sigma\_2 - \lambda\_2 A\_2 A\_2', 
$$

and continue this process until all the required eigenvectors are obtained. Similarly, for small $p$, the sample eigenvalues $k\_1 \ge k\_2 \ge \cdots \ge k\_p$ are available by solving the equation $|\hat{\Sigma} - kI| = 0$; otherwise, the sample eigenvalues and eigenvectors can be computed by applying the iterative process described above with $\Sigma$ replaced by $\hat{\Sigma}$, $\lambda\_j$ replaced by $k\_j$ and $B\_j$ substituted for the eigenvector $A\_j$, $j = 1,\ldots,p$.
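The power method with deflation described above can be sketched numerically. The following snippet (a minimal NumPy sketch; the starting vector $W\_0 = (1,\ldots,1)'$, the tolerance and the function names are our own choices, not prescribed by the text) recovers the eigenpairs of the covariance matrix used in Example 9.5.2:

```python
import numpy as np

def power_iteration(Sigma, tol=1e-12, max_iter=10_000):
    """Largest eigenvalue and normalized eigenvector of a symmetric matrix."""
    W = np.ones(Sigma.shape[0])                # starting vector W_0 (our choice)
    lam = 0.0
    for _ in range(max_iter):
        W = Sigma @ (W / np.linalg.norm(W))    # W_j = Sigma Y_{j-1}
        lam_new = np.linalg.norm(W)            # sqrt(W_j' W_j) -> lambda_1
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, W / np.linalg.norm(W)

def eigenpairs_by_deflation(Sigma, k):
    """First k eigenpairs via the deflation Sigma_{j+1} = Sigma_j - lambda_j A_j A_j'."""
    S = np.array(Sigma, dtype=float)
    pairs = []
    for _ in range(k):
        lam, A = power_iteration(S)
        pairs.append((lam, A))
        S = S - lam * np.outer(A, A)           # deflation step
    return pairs

# Matrix of Example 9.5.2; its eigenvalues are (3 +/- sqrt(5))/2
Sigma = np.array([[2.0, -1.0], [-1.0, 1.0]])
pairs = eigenpairs_by_deflation(Sigma, 2)
```

Since the ratio $\lambda\_2/\lambda\_1 \approx 0.15$ here, the iterates converge quickly, as anticipated in the discussion above.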

**Example 9.5.2.** Principal component analysis is especially called for when the number of variables involved is large. For the sake of illustration, we will take *p* = 2. Consider the following 2 × 2 covariance matrix

$$
\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = \text{Cov}(X), \ X = \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix}.
$$

Construct the principal components associated with this matrix while working with the symbolic representations of the eigenvalues $\lambda\_1$ and $\lambda\_2$ rather than their actual values.

**Solution 9.5.2.** Let us first evaluate the eigenvalues of *Σ* in the customary fashion. Consider

$$|\Sigma - \lambda I| = 0 \Rightarrow (2 - \lambda)(1 - \lambda) - 1 = 0 \Rightarrow \lambda^2 - 3\lambda + 1 = 0 \tag{i}$$

$$\Rightarrow \lambda\_1 = \frac{1}{2}[3 + \sqrt{5}], \ \lambda\_2 = \frac{1}{2}[3 - \sqrt{5}].$$

Let us compute an eigenvector corresponding to the largest eigenvalue $\lambda\_1$. Then,

$$
\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix} = \lambda\_1 \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix} \Rightarrow
$$

$$(2 - \lambda\_1)x\_1 - x\_2 = 0$$

$$-x\_1 + (1 - \lambda\_1)x\_2 = 0. \tag{ii}$$

Since the system is singular, we need only solve one of these equations. Letting $x\_2 = 1$ in $(ii)$, $x\_1 = 1 - \lambda\_1$. For illustrative purposes, we will complete the remaining steps with the general parameters $\lambda\_1$ and $\lambda\_2$ rather than their numerical values. Hence, one eigenvector is

$$C = \begin{bmatrix} 1 - \lambda\_1 \\ 1 \end{bmatrix}, \ \|C\| = \sqrt{1 + (1 - \lambda\_1)^2} \Rightarrow A\_1 = \frac{1}{\|C\|}C = \frac{1}{\sqrt{1 + (1 - \lambda\_1)^2}} \begin{bmatrix} 1 - \lambda\_1 \\ 1 \end{bmatrix}.$$

Thus, the principal components are the following:

$$u\_1 = \frac{1}{\|C\|} C'X = \frac{1}{\sqrt{1 + (1 - \lambda\_1)^2}} \{ (1 - \lambda\_1)x\_1 + x\_2 \},$$

$$u\_2 = \frac{1}{\sqrt{1 + (1 - \lambda\_2)^2}} \{ (1 - \lambda\_2)x\_1 + x\_2 \}.$$

Let us verify that $\mathrm{Var}(u\_1) = \lambda\_1$ and $\mathrm{Var}(u\_2) = \lambda\_2$. Note that

$$\|C\|^2 = 1 + (1 - \lambda\_1)^2 = \lambda\_1^2 - 2\lambda\_1 + 2 = (\lambda\_1^2 - 3\lambda\_1 + 1) + \lambda\_1 + 1 = \lambda\_1 + 1,\quad(\text{iii})$$

given the characteristic equation *(i)*. However,

$$\begin{split} \text{Var}(C'X) &= \text{Var}((1-\lambda\_1)\mathbf{x}\_1 + \mathbf{x}\_2) \\ &= (1-\lambda\_1)^2 \text{Var}(\mathbf{x}\_1) + \text{Var}(\mathbf{x}\_2) + 2(1-\lambda\_1)\text{Cov}(\mathbf{x}\_1, \mathbf{x}\_2) \\ &= 2(1-\lambda\_1)^2 + 1 - 2(1-\lambda\_1) \\ &= 2\lambda\_1^2 - 2\lambda\_1 + 1 = 2(\lambda\_1^2 - 3\lambda\_1 + 1) + 4\lambda\_1 - 1 = 4\lambda\_1 - 1, \qquad \text{(iv)} \end{split}$$

given $(i)$, the characteristic equation. We now have $\mathrm{Var}(C'X) = \mathrm{Var}((1-\lambda\_1)x\_1 + x\_2) = 4\lambda\_1 - 1$ from $(iv)$ and $\|C\|^2 = \lambda\_1 + 1$ from $(iii)$. Hence, we must have $\mathrm{Var}(C'X) = \|C\|^2\lambda\_1 = [1 + (1 - \lambda\_1)^2]\lambda\_1 = (\lambda\_1 + 1)\lambda\_1$ from $(iii)$. Since

$$
\lambda\_1(\lambda\_1 + 1) = \lambda\_1^2 + \lambda\_1 = (\lambda\_1^2 - 3\lambda\_1 + 1) + 4\lambda\_1 - 1 = 4\lambda\_1 - 1,
$$

agrees with $(iv)$, the result is verified for $u\_1$. Moreover, on replacing $\lambda\_1$ by $\lambda\_2$, we have $\mathrm{Var}(u\_2) = \lambda\_2$.
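The verification can also be carried out numerically. The short NumPy check below (the helper name `coeff_vector` is ours, introduced only for this illustration) confirms that $A\_j'\Sigma A\_j = \lambda\_j$ for the matrix of Example 9.5.2:

```python
import numpy as np

Sigma = np.array([[2.0, -1.0], [-1.0, 1.0]])
lam1 = (3 + np.sqrt(5)) / 2
lam2 = (3 - np.sqrt(5)) / 2

def coeff_vector(lam):
    C = np.array([1.0 - lam, 1.0])   # eigenvector C = (1 - lambda, 1)' from (ii)
    return C / np.linalg.norm(C)     # normalized coefficient vector A_j = C/||C||

A1, A2 = coeff_vector(lam1), coeff_vector(lam2)
var_u1 = A1 @ Sigma @ A1             # Var(u_1) = A_1' Sigma A_1
var_u2 = A2 @ Sigma @ A2             # Var(u_2) = A_2' Sigma A_2
```

As expected, `var_u1` and `var_u2` coincide with $\lambda\_1$ and $\lambda\_2$, and $A\_1$ and $A\_2$ are orthogonal.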

#### **9.5.2.** $L\_1$ **and** $L\_2$ **norm principal components**

For any $p\times 1$ vector $Y$, $Y' = (y\_1,\ldots,y\_p)$, where $y\_1,\ldots,y\_p$ are real quantities, the $L\_2$ and $L\_1$ norms are respectively defined as follows:

$$\|Y\|\_{2} = (y\_1^2 + \cdots + y\_p^2)^{\frac{1}{2}} = (Y'Y)^{\frac{1}{2}} \Rightarrow \|Y\|\_{2}^2 = Y'Y \tag{9.5.5}$$

$$\|Y\|\_1 = |y\_1| + |y\_2| + \cdots + |y\_p| = \sum\_{j=1}^{p} |y\_j|.\tag{9.5.6}$$

In Sect. 9.5, we set up the principal component analysis on the basis of the sample sum of products matrix, assuming a $p$-variate real population, not necessarily Gaussian, having $\mu$ as its mean value vector and $\Sigma > O$ as its covariance matrix. Then, we considered $X\_j,\ j = 1,\ldots,n,$ iid vector random variables, that is, a simple random sample of size $n$ from this population such that $E[X\_j] = \mu$ and $\mathrm{Cov}(X\_j) = \Sigma > O$, $j = 1,\ldots,n$. We denoted the sample matrix by $\mathbf{X} = [X\_1,\ldots,X\_n]$, the sample average by $\bar{X} = \frac{1}{n}[X\_1 + \cdots + X\_n]$, the matrix of sample means by $\bar{\mathbf{X}} = [\bar{X},\ldots,\bar{X}]$ and the sample sum of products matrix by $S = [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]'$. If the population mean value vector is the null vector, that is, $\mu = O$, then we can take $S = \mathbf{X}\mathbf{X}'$. For convenience, we will consider this case. For determining the sample principal components, we then maximized

$$A' \mathbf{X} \mathbf{X}' A \text{ subject to } A'A = 1$$

where $A$ is an arbitrary constant vector that will result in the coefficient vector of the principal components. This can also be stated in terms of maximizing the square of an $L\_2$ norm subject to $A'A = 1$, that is,

$$\max\_{A'A=1} \|A'X\|\_2^2. \tag{9.5.7}$$

When carrying out statistical analyses, it turns out that the *L*2 norm is more sensitive to outliers than the *L*1 norm. If we utilize the *L*1 norm, the problem corresponding to (9.5.7) is

$$\max\_{A'A=1} \|A'X\|\_1. \tag{9.5.8}$$

Observe that when *μ* = *O*, the sum of products matrix can be expressed as

$$S = \mathbf{X}\mathbf{X}' = \sum\_{j=1}^{n} X\_j X\_j', \ \mathbf{X} = [X\_1, \dots, X\_n],$$

where $X\_j$ is the $j$-th column of $\mathbf{X}$, that is, the $j$-th sample vector. Then, the initial optimization problem can be restated as follows:

$$\max\_{A'A=1} \|A'X\|\_2^2 = \max\_{A'A=1} \sum\_{j=1}^n A'X\_jX\_j'A = \max\_{A'A=1} \sum\_{j=1}^n \|A'X\_j\|\_2^2,\tag{9.5.9}$$

and the corresponding *L*1 norm optimization problem can be formulated as follows:

$$\max\_{A'A=1} \|A'X\|\_1 = \max\_{A'A=1} \sum\_{j=1}^n \|A'X\_j\|\_1. \tag{9.5.10}$$

We have obtained exact analytical solutions for the coefficient vector $A$ in (9.5.9); however, this is not possible when attempting to optimize (9.5.10) subject to $A'A = 1$. Fortunately, iterative procedures are available in this case.
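As an illustration of such an iterative procedure, here is a rough sketch in the spirit of the fixed-point algorithm of Kwak (2008) for the first $L\_1$ principal direction; the initialization, stopping rule and sign convention below are our own assumptions, not taken from the text:

```python
import numpy as np

def l1_pca_direction(X, max_iter=200, seed=0):
    """Approximately maximize sum_j |a' X_j| subject to a'a = 1.
    X holds one sample per column."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(X.shape[0])
    a /= np.linalg.norm(a)
    for _ in range(max_iter):
        s = np.sign(a @ X)           # sign(a' X_j) for each sample
        s[s == 0] = 1.0              # arbitrary convention when a' X_j = 0
        a_new = X @ s                # fixed-point update
        a_new /= np.linalg.norm(a_new)
        converged = np.allclose(a_new, a)
        a = a_new
        if converged:
            break
    return a

# samples with a dominant direction along the first coordinate
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 400))
X[0] *= 10.0
a = l1_pca_direction(X)              # a ends up close to (+/-1, 0)'
```

Each update can only increase $\sum\_j |a'X\_j|$, so the iteration terminates at a local maximizer of (9.5.8).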

Let us consider a generalization of the basic principal component analysis. Let *W* be a *p* ×*m, m < p,* matrix of full rank *m*. Then, the general problem in *L*2 norm optimization is the following:

$$\max\_{W'W=I} \text{tr}(W'SW) = \max\_{W'W=I} \sum\_{j=1}^{n} \|W'X\_j\|\_2^2 \tag{9.5.11}$$

where *I* is the *m* × *m* identity matrix, the corresponding *L*1 norm optimization problem being

$$\max\_{W'W=I} \sum\_{j=1}^{n} \|W'X\_j\|\_1. \tag{9.5.12}$$

In Sect. 9.4.1, we have considered a dual problem of minimization for constructing principal components. The dual problems of minimization corresponding to (9.5.11) and (9.5.12) can be stated as follows:

$$\min\_{W'W=I} \sum\_{j=1}^{n} \|X\_j - WW'X\_j\|\_2^2 \tag{9.5.13}$$

with respect to the *L*2 norm, and

$$\min\_{W'W=I} \sum\_{j=1}^{n} \|X\_j - WW'X\_j\|\_1 \tag{9.5.14}$$

with respect to the $L\_1$ norm. The form appearing in (9.5.13) suggests that the construction of principal components by making use of the $L\_2$ norm is also connected to general model building, Factor Analysis and related topics. The general mathematical problem pertaining to the basic structure in (9.5.13) is referred to as low-rank matrix factorization. Readers interested in such statistical or mathematical problems may refer to the survey article by Shi et al. (2017). $L\_1$ norm optimization problems are discussed, for instance, in Kwak (2008) and Nie and Huang (2016).
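The connection between (9.5.11) and (9.5.13) rests on the identity $\sum\_j \|X\_j - WW'X\_j\|\_2^2 = \sum\_j \|X\_j\|\_2^2 - \mathrm{tr}(W'SW)$ whenever $W'W = I$, so that maximizing $\mathrm{tr}(W'SW)$ is equivalent to minimizing the reconstruction error. A quick numerical sanity check of this identity (the dimensions and random data below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 40
X = rng.standard_normal((p, n))
W, _ = np.linalg.qr(rng.standard_normal((p, m)))   # W has orthonormal columns, W'W = I_m
S = X @ X.T                                        # sum of products matrix

# left side: total squared L2 reconstruction error of (9.5.13)
resid = sum(np.linalg.norm(X[:, j] - W @ (W.T @ X[:, j])) ** 2 for j in range(n))
# right side: total variation minus the objective of (9.5.11)
rhs = np.linalg.norm(X) ** 2 - np.trace(W.T @ S @ W)
```

Both quantities agree to machine precision, since $\|x - WW'x\|^2 = x'x - x'WW'x$ when $W'W = I$.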

#### **9.6. Distributional Aspects of Eigenvalues and Eigenvectors**

Let us examine the distributions of the variances of the sample principal components and of the coefficient vectors in the sample principal components. Let $A$ be a $p\times p$ orthonormal matrix whose columns are denoted by $A\_1,\ldots,A\_p$, so that $A\_j'A\_j = 1$, $A\_i'A\_j = 0$, $i\neq j$, or $AA' = I$, $A'A = I$. Let $u\_j = A\_j'X$ be the sample principal components for $j = 1,\ldots,p$. Let the $p\times 1$ vector $X$ have a nonsingular normal density with the null vector as its mean value vector, that is, $X\sim N\_p(O, \Sigma)$, $\Sigma > O$. We can assume that the Gaussian distribution has a null mean value vector without any loss of generality since we are dealing with variances and covariances, which are free of any location parameter. Consider a simple random sample of size $n$ from this normal population. Letting $S$ denote the sample sum of products matrix, the maximum likelihood estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$. Given that $S$ has a Wishart distribution having $n-1$ degrees of freedom, the density of $\hat{\Sigma}$, denoted by $f(\hat{\Sigma})$, is given by

$$f(\hat{\Sigma}) = \frac{n^{\frac{mp}{2}}}{|2\Sigma|^{\frac{m}{2}}\Gamma\_p(\frac{m}{2})} |\hat{\Sigma}|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{n}{2} \text{tr}(\Sigma^{-1}\hat{\Sigma})},$$

where $\Sigma$ is the population covariance matrix and $m = n - 1$, $n$ being the sample size. Let $k\_1,\ldots,k\_p$ be the eigenvalues of $\hat{\Sigma}$; it was shown that $k\_1 \ge k\_2 \ge \cdots \ge k\_p > 0$ are actually the variances of the sample principal components. Letting $B\_j,\ j = 1,\ldots,p,$ denote the coefficient vectors of the sample principal components and $B = (B\_1,\ldots,B\_p)$, it was established that $B'\hat{\Sigma}B = \mathrm{diag}(k\_1,\ldots,k\_p) \equiv D$. The first nonzero component of $B\_j$ is required to be positive for $j = 1,\ldots,p$, so that $B$ be uniquely determined. Then, the joint density of $B$ and $D$ is available by transforming $\hat{\Sigma}$ in terms of $B$ and $D$. Let the joint density of $B$ and $D$ and the marginal densities of $D$ and $B$ be respectively denoted by $g(D, B)$, $g\_1(D)$ and $g\_2(B)$. In light of the procedures and results presented in Chap. 8 or in Mathai (1997), we have the following joint density:

$$g(D, B)\,\mathrm{d}D\wedge\mathrm{d}B = \frac{n^{\frac{mp}{2}}}{|2\Sigma|^{\frac{m}{2}}\Gamma\_p(\frac{m}{2})} \Big\{ \prod\_{j=1}^{p} k\_j \Big\}^{\frac{m}{2} - \frac{p+1}{2}} \Big\{ \prod\_{i<j}(k\_i - k\_j) \Big\}\, \mathrm{e}^{-\frac{n}{2}\mathrm{tr}(\Sigma^{-1}BDB')}\, h(B)\, \mathrm{d}D\wedge\mathrm{d}B, \tag{9.6.1}$$

where *h(B)* is the differential element corresponding to *B*, which is given in Chap. 8 and in Mathai (1997). The marginal densities of *D* and *B* are not explicitly available in a convenient form for a general *Σ*. In that case, they can only be expressed in terms of hypergeometric functions of matrix argument and zonal polynomials. See, for example, Mathai et al. (1995) for a discussion of zonal polynomials. However, when *Σ* = *I* , the joint and marginal densities are available in closed forms. They are

$$g(D, B)\,\mathrm{d}D\wedge\mathrm{d}B = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} \Big\{ \prod\_{j=1}^p k\_j \Big\}^{\frac{m}{2} - \frac{p+1}{2}} \Big\{ \prod\_{i<j}(k\_i - k\_j) \Big\}\, \mathrm{e}^{-\frac{n}{2}(k\_1 + \cdots + k\_p)}\, h(B)\, \mathrm{d}D\wedge\mathrm{d}B, \tag{9.6.2}$$

$$g\_1(D)\,\mathrm{d}D = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{j=1}^p k\_j \Big\}^{\frac{m}{2} - \frac{p+1}{2}} \mathrm{e}^{-\frac{n}{2}(k\_1 + \cdots + k\_p)} \Big\{ \prod\_{i<j}(k\_i - k\_j) \Big\}\, \mathrm{d}D, \tag{9.6.3}$$

$$g\_2(B)\,\mathrm{d}B = \frac{\Gamma\_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\,h(B),\ BB' = I = B'B,\ k\_1 > k\_2 > \cdots > k\_p > 0. \tag{9.6.4}$$

Given the representation of the joint density $g(D, B)$ appearing in (9.6.2), it is readily seen that $D$ and $B$ are independently distributed. Similar results are obtainable for $\Sigma = \sigma^2 I$ where $\sigma^2 > 0$ is a real scalar quantity and $I$ is the identity matrix.

**Example 9.6.1.** Show that the function $g\_1(D)$ given in (9.6.3) is a density for $p = 2,\ n = 6,\ m = 5,$ and derive the density of $k\_1$ for this special case.

**Solution 9.6.1.** We have $\frac{p+1}{2} = \frac{3}{2}$ and $\frac{m}{2} - \frac{3}{2} = 1$. As $g\_1(D)$ is manifestly nonnegative, it suffices to show that the total integral equals 1. When $p = 2$ and $n = 6$, the constant part of (9.6.3) is the following:

$$\begin{split} \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} &= \frac{6^5}{2^5\, \Gamma\_2(\frac{5}{2})}\, \frac{\pi^2}{\Gamma\_2(1)} \\ &= \frac{3^5}{\sqrt{\pi}\,\Gamma(\frac{5}{2})\Gamma(2)}\, \frac{\pi^2}{\sqrt{\pi}\,\Gamma(1)\sqrt{\pi}} \\ &= \frac{3^5}{\sqrt{\pi}\,(\frac{3}{2})(\frac{1}{2})\sqrt{\pi}}\, \frac{\pi^2}{\sqrt{\pi}\,\sqrt{\pi}} = 4(3^4). \end{split} \tag{i}$$

As for the integral part, we have

$$\int\_{k\_1=0}^{\infty}\int\_{k\_2=0}^{k\_1} k\_1 k\_2 (k\_1 - k\_2)\, \mathrm{e}^{-3(k\_1+k\_2)}\, \mathrm{d}k\_1\wedge\mathrm{d}k\_2$$

$$= \int\_{k\_1=0}^{\infty} k\_1^2\, \mathrm{e}^{-3k\_1} \Big[ \int\_{k\_2=0}^{k\_1} k\_2\, \mathrm{e}^{-3k\_2}\, \mathrm{d}k\_2 \Big] \mathrm{d}k\_1 - \int\_{k\_1=0}^{\infty} k\_1\, \mathrm{e}^{-3k\_1} \Big[ \int\_{k\_2=0}^{k\_1} k\_2^2\, \mathrm{e}^{-3k\_2}\, \mathrm{d}k\_2 \Big] \mathrm{d}k\_1$$

$$= \int\_{k\_1=0}^{\infty} \Big\{ k\_1^2\, \mathrm{e}^{-3k\_1} \Big[ -\frac{k\_1}{3}\mathrm{e}^{-3k\_1} + \frac{1}{3^2}(1 - \mathrm{e}^{-3k\_1}) \Big] - k\_1\, \mathrm{e}^{-3k\_1} \Big[ -\frac{k\_1^2}{3}\mathrm{e}^{-3k\_1} - \frac{2k\_1}{3^2}\mathrm{e}^{-3k\_1} + \frac{2}{3^3}(1 - \mathrm{e}^{-3k\_1}) \Big] \Big\}\, \mathrm{d}k\_1 \tag{ii}$$

$$= -\frac{1}{3}[\Gamma(4)6^{-4}] + \frac{1}{3^2}[\Gamma(3)3^{-3}] - \frac{1}{3^2}[\Gamma(3)6^{-3}] + \frac{1}{3}[\Gamma(4)6^{-4}] + \frac{2}{3^2}[\Gamma(3)6^{-3}] - \frac{2}{3^3}[\Gamma(2)3^{-2}] + \frac{2}{3^3}[\Gamma(2)6^{-2}]$$

$$= \frac{1}{3^2}[\Gamma(3)6^{-3}] + \frac{2}{3^3}[\Gamma(2)6^{-2}] + \frac{1}{3^2}[\Gamma(3)3^{-3}] - \frac{2}{3^3}[\Gamma(2)3^{-2}] = \Big[ \frac{1}{4} + \frac{2}{4} + 2 - 2 \Big] \frac{1}{3^5} = \frac{1}{4(3^4)}. \tag{iii}$$

The product of $(i)$ and $(iii)$ being equal to 1, this establishes that $g\_1(D)$ is indeed a density function for $p = 2$ and $n = 6$. On integrating out $k\_2$, the resulting integrand appearing in $(ii)$ yields the marginal density of $k\_1$. Denoting the marginal density of $k\_1$ by $g\_{11}(k\_1)$, after some simplifications, we have

$$g\_{11}(k\_1) = 4(3^4) \Big[ \Big( \frac{k\_1^2}{3^2} + \frac{2k\_1}{3^3} \Big) \mathrm{e}^{-6k\_1} + \Big( \frac{k\_1^2}{3^2} - \frac{2k\_1}{3^3} \Big) \mathrm{e}^{-3k\_1} \Big], \ 0 < k\_1 < \infty,$$

and zero elsewhere. Similarly, given $g\_1(D)$, the marginal density of $k\_2$ is obtained by integrating out $k\_1$ from $k\_2$ to $\infty$. This completes the computations.
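As a sanity check, the density $g\_{11}(k\_1)$ obtained above can be integrated numerically; the grid and the truncation point 30 below are arbitrary choices of ours, adequate since the integrand decays like $\mathrm{e}^{-3k\_1}$:

```python
import numpy as np

# Trapezoidal quadrature of g_11(k_1) over (0, 30); the tail beyond 30 is negligible.
k = np.linspace(0.0, 30.0, 300_001)
g11 = 4 * 3**4 * ((k**2 / 9 + 2 * k / 27) * np.exp(-6 * k)
                  + (k**2 / 9 - 2 * k / 27) * np.exp(-3 * k))
total = float(np.sum((g11[1:] + g11[:-1]) * 0.5 * np.diff(k)))
```

The computed total is 1 to within the quadrature error, confirming that $g\_{11}$ is a density.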

#### **9.6.1. The distributions of the largest and smallest eigenvalues**

One can determine the distribution of the $j$-th largest eigenvalue $k\_j$ and thereby the distribution of the largest one, $k\_1$, as well as that of the smallest one, $k\_p$. Actually, many authors have investigated the problem of deriving the distributions of the eigenvalues of a central Wishart matrix and of matrix-variate type-1 beta and type-2 beta distributions. Some of the test statistics discussed in Chap. 6 can also be treated as eigenvalue problems involving real and complex type-1 beta matrices. Let $k\_1 > k\_2 > \cdots > k\_p > 0$ be the eigenvalues of a Wishart matrix generated from a $p$-variate real Gaussian sample. Then, as given in (9.6.3), the joint density of $k\_1,\ldots,k\_p$, denoted by $g\_1(D)$, is the following:

$$g\_1(D)\,\mathrm{d}D = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}\Gamma\_p(\frac{m}{2})} \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{j=1}^p k\_j^{\frac{m}{2} - \frac{p+1}{2}} \Big\} \mathrm{e}^{-\frac{n}{2}(k\_1 + \cdots + k\_p)} \Big\{ \prod\_{i<j}(k\_i - k\_j) \Big\}\, \mathrm{d}D, \tag{9.6.5}$$

where $m = n - 1$ is the number of degrees of freedom, $n$ being the sample size. More generally, we may consider a $p\times p$ real matrix-variate gamma distribution having $\alpha$ as its shape parameter and $aI,\ a > 0,$ as its scale parameter matrix, $I$ denoting the identity matrix. Noting that the eigenvalues of $aS$ are equal to the eigenvalues of $S$ multiplied by the scalar quantity $a$, we may take $a = 1$ without any loss of generality. Let $g(S)\,\mathrm{d}S$ denote the density of the resulting gamma matrix and $g(D)$ be $g(S)$ expressed in terms of $\lambda\_1 > \lambda\_2 > \cdots > \lambda\_p > 0$, the eigenvalues of $S$. Then, given the Jacobian of the transformation $S\to D$ specified in (9.6.7), the joint density of $\lambda\_1,\ldots,\lambda\_p$, denoted by $g\_1(D)$, is obtained as

$$g\_1(D)\,\mathrm{d}D = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\alpha)\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{j=1}^p \lambda\_j^{\alpha - \frac{p+1}{2}} \Big\} \mathrm{e}^{-(\lambda\_1 + \cdots + \lambda\_p)} \Big\{ \prod\_{i<j}(\lambda\_i - \lambda\_j) \Big\}\, \mathrm{d}D, \tag{9.6.6}$$

where $\mathrm{d}D = \mathrm{d}\lambda\_1\wedge\cdots\wedge\mathrm{d}\lambda\_p$. The corresponding $p\times p$ complex matrix-variate gamma distribution will have real eigenvalues, also denoted by $\lambda\_1 > \cdots > \lambda\_p > 0$, the matrix $\tilde{S}$ being Hermitian positive definite in this case. Then, in light of the relationship (9.6a.2) between the differential elements of $\tilde{S}$ and $D$, the joint density of $\lambda\_1,\ldots,\lambda\_p,$ denoted by $\tilde{g}\_1(D)$, is given by

$$\tilde{g}\_1(D)\,\mathrm{d}D = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(\alpha)\tilde{\Gamma}\_p(p)} \Big\{ \prod\_{j=1}^p \lambda\_j^{\alpha - p} \Big\} \mathrm{e}^{-(\lambda\_1 + \cdots + \lambda\_p)} \Big\{ \prod\_{i<j}(\lambda\_i - \lambda\_j)^2 \Big\}\, \mathrm{d}D, \tag{9.6a.1}$$

where $\tilde{g}(D)$ is $\tilde{g}(\tilde{S})$ expressed in terms of $D$, $\tilde{g}(\tilde{S})$ denoting the density function of a complex matrix-variate gamma random variable whose shape parameter and scale parameter matrix are $\alpha$ and $I$, respectively.

As explained for instance in Mathai (1997), when *S* and *S*˜ are the *p* × *p* gamma matrices in the real and complex domains, the integration over the Stiefel manifold yields

$$\mathrm{d}S = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})} \Big\{ \prod\_{i<j}(\lambda\_i - \lambda\_j) \Big\}\, \mathrm{d}D \tag{9.6.7}$$

and

$$\mathrm{d}\tilde{S} = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)} \Big\{ \prod\_{i<j}(\lambda\_i - \lambda\_j)^2 \Big\}\, \mathrm{d}D, \tag{9.6a.2}$$

respectively. When endeavoring to derive the marginal density of $\lambda\_j$ for any fixed $j$, the difficulty arises from the factor $\prod\_{i<j}(\lambda\_i - \lambda\_j)$. So, let us first attempt to simplify this factor.

#### **9.6.2. Simplification of the factor** $\prod\_{i<j}(\lambda\_i - \lambda\_j)$

It is well known that one can write $\prod\_{i<j}(\lambda\_i - \lambda\_j)$ as a Vandermonde determinant which, incidentally, has been utilized in connection with nonlinear transformations in Mathai (1997). That is,

$$\prod\_{i<j}(\lambda\_i - \lambda\_j) = |A| = \begin{vmatrix} \lambda\_1^{p-1} & \lambda\_1^{p-2} & \dots & 1 \\ \lambda\_2^{p-1} & \lambda\_2^{p-2} & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda\_p^{p-1} & \lambda\_p^{p-2} & \dots & 1 \end{vmatrix} \tag{9.6.8}$$

where the $(i, j)$-th element is $a\_{ij} = \lambda\_i^{p-j}$ for all $i$ and $j$. We could consider a cofactor expansion of the determinant $|A|$, consisting of expanding it in terms of the elements and their cofactors along any row. In this case, it would be advantageous to do so along the $i$-th row as the cofactors would then be free of $\lambda\_i$ and the coefficients of the cofactors would only be powers of $\lambda\_i$. However, we would then lose the symmetry with respect to the elements of the cofactors. Since symmetry is required for the procedure to be discussed, we will consider the general expansion of a determinant, that is,


$$|A| = \sum\_{K} (-1)^{\rho\_K} a\_{1k\_1} a\_{2k\_2} \cdots a\_{pk\_p} = \sum\_{K} (-1)^{\rho\_K} \lambda\_1^{p-k\_1} \lambda\_2^{p-k\_2} \cdots \lambda\_p^{p-k\_p} \tag{9.6.9}$$

where $K = (k\_1,\ldots,k\_p)$ is a given permutation of the numbers $(1, 2,\ldots,p)$. Defining $\rho\_K$ as the number of transpositions needed to bring $(k\_1,\ldots,k\_p)$ into the natural order $(1, 2,\ldots,p)$, $(-1)^{\rho\_K}$ will correspond to a minus sign for the corresponding term if $\rho\_K$ is odd, the sign being otherwise positive. For example, for $p = 3$, the possible permutations are $(1,2,3)$, $(1,3,2)$, $(2,3,1)$, $(2,1,3)$, $(3,1,2)$, $(3,2,1)$, so that there are $3! = 6$ terms. For the sequence $(1,2,3)$, $k\_1 = 1$, $k\_2 = 2$ and $k\_3 = 3$; for the sequence $(1,3,2)$, $k\_1 = 1$, $k\_2 = 3$ and $k\_3 = 2$, and so on, $\sum\_K$ representing the sum of all such terms multiplied by $(-1)^{\rho\_K}$. For $(1,2,3)$, $\rho\_K = 0$, corresponding to a plus sign; for $(1,3,2)$, $\rho\_K = 1$, corresponding to a minus sign, and so on. Other types of expansions of $|A|$ could also be utilized. As it turns out, the general expansion given in (9.6.9) is the most convenient one for deriving the marginal densities.
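The Vandermonde representation (9.6.8) and the permutation expansion (9.6.9) can be checked numerically; the eigenvalues chosen below are arbitrary illustrative values:

```python
import numpy as np
from itertools import permutations

# Check that prod_{i<j}(lam_i - lam_j) equals the Vandermonde determinant |A|
# with a_ij = lam_i^(p-j), expanded over all permutations K with sign (-1)^rho_K.
lam = [5.0, 3.0, 2.0, 1.0]          # lambda_1 > lambda_2 > ... > lambda_p
p = len(lam)

prod_diff = np.prod([lam[i] - lam[j] for i in range(p) for j in range(i + 1, p)])

def perm_sign(perm):
    """(-1)^rho_K: sign of the permutation bringing perm into natural order."""
    perm, sign = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

# Leibniz expansion (9.6.9): K[i] = k_i - 1 in zero-based indexing,
# so the exponent p - k_i becomes p - 1 - K[i]
det = sum(perm_sign(K) * np.prod([lam[i] ** (p - 1 - K[i]) for i in range(p)])
          for K in permutations(range(p)))
```

For these values, $\prod\_{i<j}(\lambda\_i - \lambda\_j) = 2\cdot 3\cdot 4\cdot 1\cdot 2\cdot 1 = 48$, and the permutation expansion reproduces it.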

In the complex case,

$$\prod\_{i<j}(\lambda\_i - \lambda\_j)^2 = |A|^2 = |A'|\,|A| = |A'A|,$$

where

$$A' = \begin{bmatrix} \lambda\_1^{p-1} & \lambda\_2^{p-1} & \dots & \lambda\_p^{p-1} \\ \lambda\_1^{p-2} & \lambda\_2^{p-2} & \dots & \lambda\_p^{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{bmatrix} = [\alpha\_1, \alpha\_2, \dots, \alpha\_p], \ \alpha\_j = \begin{bmatrix} \lambda\_j^{p-1} \\ \lambda\_j^{p-2} \\ \vdots \\ 1 \end{bmatrix}. \tag{i}$$

Observe that $\alpha\_j$ only contains $\lambda\_j$ and that $A'A = \alpha\_1\alpha\_1' + \cdots + \alpha\_p\alpha\_p'$, so that for instance the $(i, j)$-th element in $\alpha\_1\alpha\_1'$ is $\lambda\_1^{2p-(i+j)}$. Accordingly, the $(i, j)$-th element of $\alpha\_1\alpha\_1' + \cdots + \alpha\_p\alpha\_p'$ is $\sum\_{r=1}^{p}\lambda\_r^{2p-(i+j)}$. Thus, letting $B = A'A = (b\_{ij})$, $b\_{ij} = \sum\_{r=1}^{p}\lambda\_r^{2p-(i+j)}$. Now, consider the expansion (9.6.9) of $|B|$, that is,

$$\prod\_{i<j}(\lambda\_i - \lambda\_j)^2 = |B| = \sum\_K (-1)^{\rho\_K} b\_{1k\_1} b\_{2k\_2} \cdots b\_{pk\_p},$$

where $K = (k\_1,\ldots,k\_p)$ is a permutation of the sequence $(1, 2,\ldots,p)$. Note that

$$\begin{aligned} b\_{1k\_1} &= \lambda\_1^{2p - (1 + k\_1)} + \lambda\_2^{2p - (1 + k\_1)} + \dots + \lambda\_p^{2p - (1 + k\_1)} \\ b\_{2k\_2} &= \lambda\_1^{2p - (2 + k\_2)} + \lambda\_2^{2p - (2 + k\_2)} + \dots + \lambda\_p^{2p - (2 + k\_2)} \\ &\vdots\\ b\_{pk\_p} &= \lambda\_1^{2p - (p + k\_p)} + \lambda\_2^{2p - (p + k\_p)} + \dots + \lambda\_p^{2p - (p + k\_p)}. \end{aligned}$$

Let us write

$$b\_{1k\_1} b\_{2k\_2} \cdots b\_{pk\_p} = \sum\_{r\_1, \dots, r\_p} \lambda\_1^{r\_1} \cdots \lambda\_p^{r\_p} \,. \tag{i\nu}$$

Then,

$$\prod\_{i<j}(\lambda\_i - \lambda\_j)^2 = \sum\_K (-1)^{\rho\_K} \sum\_{r\_1,\ldots,r\_p} \lambda\_1^{r\_1} \cdots \lambda\_p^{r\_p}, \tag{9.6a.3}$$

where the $r\_j$'s, $j = 1,\ldots,p,$ are nonnegative integers. We may now express the joint density of the eigenvalues in a systematic way.
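The identity $\prod\_{i<j}(\lambda\_i - \lambda\_j)^2 = |B|$ with $b\_{ij} = \sum\_{r}\lambda\_r^{2p-(i+j)}$ can likewise be verified numerically; the eigenvalues below are arbitrary illustrative values:

```python
import numpy as np

# Build B = A'A directly from its entries b_ij = sum_r lam_r^(2p-(i+j)),
# i, j = 1,...,p, and compare |B| with prod_{i<j}(lam_i - lam_j)^2.
lam = np.array([4.0, 2.0, 1.0])
p = len(lam)
B = np.array([[np.sum(lam ** (2 * p - (i + j))) for j in range(1, p + 1)]
              for i in range(1, p + 1)])
lhs = np.prod([(lam[i] - lam[j]) ** 2 for i in range(p) for j in range(i + 1, p)])
```

Here $\prod\_{i<j}(\lambda\_i - \lambda\_j)^2 = 2^2\cdot 3^2\cdot 1^2 = 36$, which the determinant of $B$ reproduces.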

#### **9.6.3. The distributions of the eigenvalues**

The joint density of $\lambda\_1,\ldots,\lambda\_p$ as specified in (9.6.6) and (9.6a.1) can be expressed as follows. In the real case,

$$f(D)\mathrm{d}D = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} \left(\prod\_{j=1}^p \lambda\_j^{\alpha - \frac{p+1}{2}}\right) \mathrm{e}^{-(\lambda\_1 + \cdots + \lambda\_p)} \left(\sum\_K (-1)^{\rho\_K} \lambda\_1^{p-k\_1} \cdots \lambda\_p^{p-k\_p}\right) \mathrm{d}D$$

$$= \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} \sum\_K (-1)^{\rho\_K} (\lambda\_1^{m\_1} \cdots \lambda\_p^{m\_p}) \mathrm{e}^{-(\lambda\_1 + \cdots + \lambda\_p)} \mathrm{d}D \tag{9.6.10}$$

with

$$m\_j = \alpha - \frac{p+1}{2} + p - k\_j. \tag{i}$$

In the complex case, the joint density is

$$\tilde{f}(D)\mathrm{d}D = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)}\sum\_{K}(-1)^{\rho\_K}\sum\_{r\_1,\ldots,r\_p}(\lambda\_1^{\alpha-p+r\_1}\cdots\lambda\_p^{\alpha-p+r\_p})\,\mathrm{e}^{-(\lambda\_1+\cdots+\lambda\_p)}\mathrm{d}D$$

$$=\frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)}\sum\_{K}(-1)^{\rho\_K}\sum\_{r\_1,\ldots,r\_p}(\lambda\_1^{m\_1}\cdots\lambda\_p^{m\_p})\,\mathrm{e}^{-(\lambda\_1+\cdots+\lambda\_p)}\mathrm{d}D\qquad(9.6a.4)$$

with $m\_j = \alpha - p + r\_j$ and $r\_j$ as defined in (9.6a.3). For convenience, we will use the same symbol $m\_j$ for both the real and complex cases; however, in the real case, $m\_j = \alpha - \frac{p+1}{2} + p - k\_j$ and, in the complex case, $m\_j = \alpha - p + r\_j$.

The distributions of the largest, smallest and $j$-th largest eigenvalues were considered by many authors. Earlier works mostly dealt with eigenvalue problems associated with testing hypotheses on the parameters of one or more real Gaussian populations. In such situations, a one-to-one function of the likelihood ratio statistic could be expressed in terms of the eigenvalues of a real type-1 beta distributed matrix-variate random variable. In a series of papers, Pillai constructed the distributions of the seven largest eigenvalues in the type-1 beta distribution and produced percentage points as well; the reader may refer for example to Pillai (1964) and the references therein. In a series of papers including Khatri (1964), Khatri addressed the distributions of eigenvalues in the real and complex domains. In a series of papers, Krishnaiah and his co-authors dealt with various distributional aspects of eigenvalues; see for instance Krishnaiah et al. (1973). Clemm et al. (1973) computed upper percentage points of the distribution of the eigenvalues of the Wishart matrix. James (1964) considered the eigenvalue problem for different types of matrix-variate random variables and determined their distributions in terms of functions of matrix argument and zonal polynomials. In a series of papers, Davis dealt with the distributions of eigenvalues by creating and solving systems of differential equations; see for example Davis (1972). Edelman (1991) discussed the distributions and moments of the smallest eigenvalue of Wishart-type matrices. Johnstone (2001) examined the distribution of the largest eigenvalue in Principal Component Analysis. Recently, Chiani (2014) and James and Lee (2021) discussed the distributions of the eigenvalues of Wishart matrices.
The methods employed in these papers lead to representations of the distributions of eigenvalues in terms of Pfaffians of skew symmetric matrices, incomplete gamma functions, multiple integrals, functions of matrix argument and zonal polynomials, ratios of determinants, solutions of differential equations, and so on. None of those methods yield tractable forms for the distribution or density functions of eigenvalues.

In the next subsections, we provide, in explicit forms, the exact distribution of the $j$-th largest eigenvalue of a general real or complex matrix-variate gamma type matrix, either as finite sums when a certain quantity is a positive integer or as a product of infinite series in the general non-integer case. These include, for instance, the distributions of the largest and smallest eigenvalues as well as the joint distributions of several of the largest or smallest eigenvalues, and readily apply to the real and complex Wishart distributions.

### **9.6.4. Density of the smallest eigenvalue** $\lambda\_p$ **in the real matrix-variate gamma case**

We will initially examine the situation where $m\_j = \alpha - \frac{p+1}{2} + p - k\_j$ is an integer, so that $m\_j$ in the real matrix-variate gamma case is a positive integer. We will integrate out $\lambda\_1,\ldots,\lambda\_{p-1}$ to obtain the marginal density of $\lambda\_p$. Since $m\_j$ is a positive integer, we can integrate by parts. For instance,

$$\int\_{\lambda\_1 = \lambda\_2}^{\infty} \lambda\_1^{m\_1} \mathbf{e}^{-\lambda\_1} d\lambda\_1 = [-\lambda\_1^{m\_1} \mathbf{e}^{-\lambda\_1}]\_{\lambda\_2}^{\infty} + [-m\_1 \lambda\_1^{m\_1 - 1} \mathbf{e}^{-\lambda\_1}]\_{\lambda\_2}^{\infty} + \dots + [-\mathbf{e}^{-\lambda\_1}]\_{\lambda\_2}^{\infty}$$

$$= \sum\_{\mu\_1 = 0}^{m\_1} \frac{m\_1!}{(m\_1 - \mu\_1)!} \lambda\_2^{m\_1 - \mu\_1} \mathbf{e}^{-\lambda\_2},\tag{i}$$

and integrating *λ*1*,...,λj*−<sup>1</sup> gives the following:

$$\int\_{\lambda\_1} \int\_{\lambda\_2} \cdots \int\_{\lambda\_{j-1}} \lambda\_1^{m\_1} \cdots \lambda\_{j-1}^{m\_{j-1}} \mathbf{e}^{-(\lambda\_1 + \cdots + \lambda\_{j-1})} \mathbf{d}\lambda\_1 \wedge \dots \wedge \mathbf{d}\lambda\_{j-1}$$

$$=\sum\_{\mu\_1=0}^{m\_1} \frac{m\_1!}{(m\_1-\mu\_1)!} \sum\_{\mu\_2=0}^{m\_1-\mu\_1+m\_2} \frac{(m\_1-\mu\_1+m\_2)!}{2^{\mu\_2+1}(m\_1-\mu\_1+m\_2-\mu\_2)!} \times \cdots$$

$$\times \sum\_{\mu\_{j-1}=0}^{m\_1-\mu\_1+\cdots+m\_{j-1}} \frac{(m\_1-\mu\_1+\cdots+m\_{j-1})!}{(j-1)^{\mu\_{j-1}+1}(m\_1-\mu\_1+\cdots+m\_{j-1}-\mu\_{j-1})!} \lambda\_j^{m\_1-\mu\_1+\cdots+m\_{j-1}-\mu\_{j-1}} \mathbf{e}^{-j\lambda\_j}$$

$$\equiv \phi\_{j-1}(\lambda\_j). \tag{ii}$$
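The integration-by-parts formula $(i)$ can be checked numerically; the particular integer $m\_1 = 4$ and lower limit $\lambda\_2 = 1.7$ below are arbitrary test choices of ours:

```python
import numpy as np
import math

# Check of (i): for a positive integer m_1,
#   int_{lam2}^inf t^{m_1} e^{-t} dt
#     = sum_{mu_1=0}^{m_1} m_1!/(m_1-mu_1)! lam2^(m_1-mu_1) e^{-lam2}.
m1, lam2 = 4, 1.7
closed = sum(math.factorial(m1) / math.factorial(m1 - mu)
             * lam2 ** (m1 - mu) * math.exp(-lam2) for mu in range(m1 + 1))

# trapezoidal quadrature of the tail integral; 60 truncates a negligible tail
t = np.linspace(lam2, 60.0, 400_001)
f = t ** m1 * np.exp(-t)
quad = float(np.sum((f[1:] + f[:-1]) * 0.5 * np.diff(t)))
```

The closed-form sum is the upper incomplete gamma function $\Gamma(m\_1 + 1, \lambda\_2)$, and the quadrature agrees with it to within the discretization error.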

Hence, the following result:

**Theorem 9.6.1.** *When $m\_j = \alpha - \frac{p+1}{2} + p - k\_j$ is a positive integer, where $m\_j$ is defined in (9.6.10), the marginal density of the smallest eigenvalue $\lambda\_p$ of the $p\times p$ real gamma distributed matrix with parameters $(\alpha, I)$, denoted by $f\_{1p}(\lambda\_p)$, is the following:*

$$\begin{split} f\_{1p}(\lambda\_p)\,\mathrm{d}\lambda\_p &= c\_K\, \phi\_{p-1}(\lambda\_p)\, \lambda\_p^{m\_p}\, \mathrm{e}^{-\lambda\_p}\, \mathrm{d}\lambda\_p \\ &= c\_K \sum\_{\mu\_1=0}^{m\_1} \frac{m\_1!}{(m\_1-\mu\_1)!} \sum\_{\mu\_2=0}^{m\_1-\mu\_1+m\_2} \frac{(m\_1-\mu\_1+m\_2)!}{2^{\mu\_2+1}(m\_1-\mu\_1+m\_2-\mu\_2)!} \cdots \sum\_{\mu\_{p-1}=0}^{m\_1-\mu\_1+\cdots+m\_{p-1}} \\ &\quad\times \frac{(m\_1-\mu\_1+\cdots+m\_{p-1})!}{(p-1)^{\mu\_{p-1}+1}(m\_1-\mu\_1+\cdots+m\_{p-1}-\mu\_{p-1})!}\, \lambda\_p^{m\_1-\mu\_1+\cdots+m\_{p-1}-\mu\_{p-1}+m\_p}\, \mathrm{e}^{-p\lambda\_p}\, \mathrm{d}\lambda\_p \end{split} \tag{9.6.11}$$

*for* 0 *< λp <* ∞*, where*

$$c\_K = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} \sum\_K (-1)^{\rho\_K},$$

*$\phi\_{j-1}(\lambda\_j)$ is specified in $(ii)$ and $K = (k\_1,\ldots,k\_p)$ is defined in (9.6.9).*

In the complex $p\times p$ matrix-variate gamma case, $r\_j$ is as defined in (9.6a.3) and the expression for $\phi\_{j-1}(\lambda\_j)$ given in $(ii)$ remains the same, with the exception that $m\_j$ in the complex case is $m\_j = \alpha - p + r\_j$. Then, in the complex domain, the density of $\lambda\_p$, denoted by $\tilde{f}\_{1p}(\lambda\_p)$, is the following:

**Theorem 9.6a.1.** *When $m\_j = \alpha - p + r\_j$ is a positive integer, where $r\_j$ is defined in (9.6a.3), the density of the smallest eigenvalue of the complex matrix-variate gamma distribution is the following:*

$$\begin{split} \tilde{f}\_{1p}(\lambda\_p)\,\mathrm{d}\lambda\_p &= \tilde{c}\_K\, \phi\_{p-1}(\lambda\_p)\, \lambda\_p^{m\_p}\, \mathrm{e}^{-\lambda\_p}\, \mathrm{d}\lambda\_p \\ &= \tilde{c}\_K \sum\_{\mu\_1=0}^{m\_1} \frac{m\_1!}{(m\_1-\mu\_1)!} \sum\_{\mu\_2=0}^{m\_1-\mu\_1+m\_2} \frac{(m\_1-\mu\_1+m\_2)!}{2^{\mu\_2+1}(m\_1-\mu\_1+m\_2-\mu\_2)!} \cdots \sum\_{\mu\_{p-1}=0}^{m\_1-\mu\_1+\cdots+m\_{p-1}} \\ &\quad\times \frac{(m\_1-\mu\_1+\cdots+m\_{p-1})!}{(p-1)^{\mu\_{p-1}+1}(m\_1-\mu\_1+\cdots+m\_{p-1}-\mu\_{p-1})!}\, \lambda\_p^{m\_1-\mu\_1+\cdots+m\_{p-1}-\mu\_{p-1}+m\_p}\, \mathrm{e}^{-p\lambda\_p}\, \mathrm{d}\lambda\_p \end{split} \tag{9.6a.5}$$

*for* 0 *< λp <* ∞*, where*

$$\tilde{c}\_K = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)} \sum\_K (-1)^{\rho\_K} \sum\_{r\_1,\ldots,r\_p}.$$

**Note 9.6.1.** In the complex Wishart case, *α* = *m* where *m* is the number of degrees of freedom, which is a positive integer. Hence, Theorem 9.6a.1 gives the final result in the general case for that distribution. One can obtain the joint density of the *p* − *j* + 1 smallest eigenvalues from *φ<sub>j−1</sub>(λ<sub>j</sub>)* as defined in *(ii)*, both in the real and complex cases. If the scale parameter matrix of the gamma distribution is of the form *aI* where *a* > 0 is a real scalar and *I* is the identity matrix, then the distributions of the eigenvalues can also be obtained from the proposed procedure since, for any square matrix *B*, the eigenvalues of *aB* are the *aν<sub>j</sub>*'s, where the *ν<sub>j</sub>*'s are the eigenvalues of *B*. In the case of real Wishart distributions originating from a sample of size *n* from a *p*-variate Gaussian population whose covariance matrix is the identity matrix, the *λ<sub>j</sub>*'s are multiplied by the constant *a* = *n*/2. If the Wishart matrix is not an estimator obtained from the sample values, they should only be multiplied by *a* = 1/2 in the real case, no multiplicative constant being necessary in the complex Wishart case.
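The scaling fact invoked above, namely that the eigenvalues of *aB* are *a* times those of *B*, is easy to verify numerically. The sketch below uses an arbitrary symmetric matrix and an arbitrary scalar *a*, not tied to any particular Wishart setup:

```python
import numpy as np

# Any symmetric matrix B and scalar a > 0 satisfy eig(a*B) = a * eig(B).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
B = X @ X.T          # symmetric positive semi-definite example
a = 2.5              # plays the role of n/2 in the Wishart case

ev_B = np.sort(np.linalg.eigvalsh(B))
ev_aB = np.sort(np.linalg.eigvalsh(a * B))

print(np.allclose(ev_aB, a * ev_B))  # eigenvalues scale linearly
```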

### **9.6.5. Density of the largest eigenvalue** *λ*<sup>1</sup> **in the real matrix-variate gamma case**

Consider first the case where *m<sub>j</sub>* = *α* − (*p*+1)/2 + *p* − *k<sub>j</sub>* is an integer. Then, in the real case, *m<sub>j</sub>* is as defined in (9.6.10). One has to integrate out *λ<sub>p</sub>,...,λ*<sub>2</sub> in order to obtain the marginal density of *λ*<sub>1</sub>. As the initial step, consider the integration of *λ<sub>p</sub>*, that is,

$$\begin{split} \text{Step 1 integral}: \int\_{\lambda\_p=0}^{\lambda\_{p-1}} \lambda\_p^{m\_p} \mathrm{e}^{-\lambda\_p} \mathrm{d}\lambda\_p &= \left[ -\lambda\_p^{m\_p} \mathrm{e}^{-\lambda\_p} \right]\_0^{\lambda\_{p-1}} + \cdots + \left[ -m\_p! \, \mathrm{e}^{-\lambda\_p} \right]\_0^{\lambda\_{p-1}} \\ &= m\_p! - \sum\_{\nu\_p = 0}^{m\_p} \frac{m\_p!}{(m\_p - \nu\_p)!} \lambda\_{p-1}^{m\_p - \nu\_p} \mathrm{e}^{-\lambda\_{p-1}}. \end{split} \tag{i}$$

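The step 1 result is the standard finite-sum evaluation, by repeated integration by parts, of the lower incomplete gamma function at an integer exponent. A minimal numerical check of the identity (the values of *m* and the upper limit *L* below are arbitrary):

```python
import math

def lhs(m, L, n=200_000):
    # midpoint-rule evaluation of the integral on the left of (i)
    h = L / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x ** m * math.exp(-x)
    return total * h

def rhs(m, L):
    # finite-sum form: m! - e^{-L} * sum_{v=0}^{m} m!/(m-v)! * L^{m-v}
    s = sum(math.factorial(m) // math.factorial(m - v) * L ** (m - v)
            for v in range(m + 1))
    return math.factorial(m) - math.exp(-L) * s

m, L = 3, 2.0  # arbitrary integer exponent and upper limit
print(abs(lhs(m, L) - rhs(m, L)) < 1e-6)  # the two evaluations agree
```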
Now, multiplying each term by $\lambda\_{p-1}^{m\_{p-1}}\mathrm{e}^{-\lambda\_{p-1}}$ and integrating by parts, we have the second step integral:

Step 2 integral

$$\begin{split} &= m\_p! \int\_{\lambda\_{p-1}=0}^{\lambda\_{p-2}} \lambda\_{p-1}^{m\_{p-1}} \mathrm{e}^{-\lambda\_{p-1}} \mathrm{d}\lambda\_{p-1} - \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \int\_{\lambda\_{p-1}=0}^{\lambda\_{p-2}} \lambda\_{p-1}^{m\_p-\nu\_p+m\_{p-1}} \mathrm{e}^{-2\lambda\_{p-1}} \mathrm{d}\lambda\_{p-1} \\ &= m\_p!\, m\_{p-1}! - m\_p! \sum\_{\nu\_{p-1}=0}^{m\_{p-1}} \frac{m\_{p-1}!}{(m\_{p-1}-\nu\_{p-1})!} \lambda\_{p-2}^{m\_{p-1}-\nu\_{p-1}} \mathrm{e}^{-\lambda\_{p-2}} \\ &\quad - \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{m\_p-\nu\_p+m\_{p-1}+1}} \\ &\quad + \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \sum\_{\nu\_{p-1}=0}^{m\_p-\nu\_p+m\_{p-1}} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{\nu\_{p-1}+1}(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1})!} \lambda\_{p-2}^{m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}} \mathrm{e}^{-2\lambda\_{p-2}}. \end{split} \tag{ii}$$

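The step 2 expression can be checked numerically for small integer exponents. The sketch below compares a brute-force evaluation of the nested integral against the four-term closed form obtained by repeated integration by parts (the values of *m<sub>p</sub>*, *m<sub>p−1</sub>* and *λ<sub>p−2</sub>* are arbitrary):

```python
import math

def step1(x, mp):
    # (i): m_p! - sum_{v} m_p!/(m_p-v)! * x^{m_p-v} e^{-x}
    s = sum(math.factorial(mp) // math.factorial(mp - v) * x ** (mp - v)
            for v in range(mp + 1))
    return math.factorial(mp) - math.exp(-x) * s

def step2_numeric(lam, mp, mp1, n=200_000):
    # brute force: integral of step1(x) * x^{m_{p-1}} e^{-x} over (0, lam)
    h = lam / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += step1(x, mp) * x ** mp1 * math.exp(-x)
    return total * h

def step2_formula(lam, mp, mp1):
    # the four terms of the step 2 closed form
    f = math.factorial
    t1 = f(mp) * f(mp1)
    t2 = -f(mp) * math.exp(-lam) * sum(
        f(mp1) // f(mp1 - v) * lam ** (mp1 - v) for v in range(mp1 + 1))
    t3 = -sum(f(mp) // f(mp - v) * f(mp - v + mp1) / 2 ** (mp - v + mp1 + 1)
              for v in range(mp + 1))
    t4 = 0.0
    for v in range(mp + 1):
        n2 = mp - v + mp1
        inner = sum(f(n2) / (2 ** (u + 1) * f(n2 - u)) * lam ** (n2 - u)
                    for u in range(n2 + 1))
        t4 += f(mp) // f(mp - v) * inner * math.exp(-2 * lam)
    return t1 + t2 + t3 + t4

lam, mp, mp1 = 1.5, 2, 1  # arbitrary small values
print(abs(step2_numeric(lam, mp, mp1) - step2_formula(lam, mp, mp1)) < 1e-5)
```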
At the *j*-th step of integration, there will be 2<sup>*j*</sup> terms, of which 2<sup>*j*</sup>/2 = 2<sup>*j*−1</sup> will be positive and 2<sup>*j*−1</sup> will be negative. All the terms at the *j*-th step can be generated by 2<sup>*j*</sup> sequences of zeros and ones. Each sequence (each row) consists of *j* positions, with a zero or one occupying each position. Depending on whether the number of ones in a sequence is odd or even, the corresponding term will start with a minus or a plus sign, respectively. The following are the sequences, where the sign in parentheses is the sign of the corresponding term:

Step 1: 0 (+), 1 (−);
Step 2: 00 (+), 01 (−), 10 (−), 11 (+);
Step 3: 000 (+), 001 (−), 010 (−), 011 (+), 100 (−), 101 (+), 110 (+), 111 (−);
Step 4: 0000 (+), 0001 (−), 0010 (−), 0011 (+), 0100 (−), 0101 (+), 0110 (+), 0111 (−), 1000 (−), 1001 (+), 1010 (+), 1011 (−), 1100 (+), 1101 (−), 1110 (−), 1111 (+).

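This bookkeeping of sequences and signs is easy to mechanize; a small sketch generating the 2<sup>*j*</sup> zero-one sequences at step *j* with the parity-of-ones sign rule described above:

```python
from itertools import product

def step_sequences(j):
    # all 2^j zero-one sequences at step j; an even number of ones
    # gives a '+' term, an odd number a '-' term
    return [(seq, '+' if sum(seq) % 2 == 0 else '-')
            for seq in product((0, 1), repeat=j)]

# step 3 reproduces the listing above: 000 +, 001 -, 010 -, ..., 111 -
for seq, sign in step_sequences(3):
    print(''.join(map(str, seq)), sign)

# at every step, exactly half the terms are positive
signs = [s for _, s in step_sequences(4)]
print(signs.count('+') == signs.count('-') == 8)
```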
All the terms at the *j* -th step can be written down by using the following rules:

(1): If the first entry in a sequence (row) is zero, then the corresponding factor in the term is $m\_p!$;

(2): If the first entry in a sequence is 1, then the corresponding factor in the term is $\sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!}$, or this sum multiplied by $\lambda\_{p-1}^{m\_p-\nu\_p}\,\mathrm{e}^{-\lambda\_{p-1}}$ if this 1 is the last entry in the sequence;

(3): If the *r*-th and *(r* − 1*)*-th entries in the sequence both equal zero, then the corresponding factor in the term is $m\_{p-r+1}!$;

(4): If the *r*-th entry in the sequence is zero and the *(r* − 1*)*-th entry is 1, then the corresponding factor in the term is $\frac{(n\_{r-1}+m\_{p-r+1})!}{(\eta\_{r-1}+1)^{n\_{r-1}+m\_{p-r+1}+1}}$, where $n\_{r-1}$ is the argument of the denominator factorial and $(\eta\_{r-1})^{\nu\_{p-r+2}+1}$ is the factor in the denominator corresponding to the *(r* − 1*)*-th entry;

(5): If the *r*-th entry in the sequence is 1 and the *(r* − 1*)*-th entry is zero, then the corresponding factor in the term is $\sum\_{\nu\_{p-r+1}=0}^{m\_{p-r+1}} \frac{m\_{p-r+1}!}{(m\_{p-r+1}-\nu\_{p-r+1})!}$, or this sum multiplied by $\lambda\_{p-r}^{m\_{p-r+1}-\nu\_{p-r+1}}\,\mathrm{e}^{-\lambda\_{p-r}}$ if this 1 is the last entry in the sequence;

(6): If the *r*-th and *(r* − 1*)*-th entries in the sequence are both equal to 1, then the corresponding factor in the term is $\sum\_{\nu\_{p-r+1}=0}^{n\_{r-1}+m\_{p-r+1}} \frac{(n\_{r-1}+m\_{p-r+1})!}{(\eta\_{r-1}+1)^{\nu\_{p-r+1}+1}(n\_{r-1}+m\_{p-r+1}-\nu\_{p-r+1})!}$, or this sum multiplied by $\lambda\_{p-r}^{n\_{r-1}+m\_{p-r+1}-\nu\_{p-r+1}}\,\mathrm{e}^{-(\eta\_{r-1}+1)\lambda\_{p-r}}$ if this 1 is the last entry in the sequence, where $n\_{r-1}$ and $\eta\_{r-1}$ are defined in rule (4).

By applying the above rules, let us write down the terms at step 3, that is, *j* = 3. The sequences are then the following:

$$\text{Step 3 sequences:}\quad \begin{array}{cccc@{\qquad}cccc} 0 & 0 & 0 & + & 1 & 0 & 0 & - \\ 0 & 0 & 1 & - & 1 & 0 & 1 & + \\ 0 & 1 & 0 & - & 1 & 1 & 0 & + \\ 0 & 1 & 1 & + & 1 & 1 & 1 & - \end{array} \tag{iii}$$

The corresponding terms in the order of the sequences are the following: Step 3 integral

$$\begin{split} &= m\_p!\, m\_{p-1}!\, m\_{p-2}! - m\_p!\, m\_{p-1}! \sum\_{\nu\_{p-2}=0}^{m\_{p-2}} \frac{m\_{p-2}!}{(m\_{p-2}-\nu\_{p-2})!} \lambda\_{p-3}^{m\_{p-2}-\nu\_{p-2}} \mathrm{e}^{-\lambda\_{p-3}} \\ &\quad - m\_p! \sum\_{\nu\_{p-1}=0}^{m\_{p-1}} \frac{m\_{p-1}!}{(m\_{p-1}-\nu\_{p-1})!} \frac{(m\_{p-1}-\nu\_{p-1}+m\_{p-2})!}{2^{m\_{p-1}-\nu\_{p-1}+m\_{p-2}+1}} \\ &\quad + m\_p! \sum\_{\nu\_{p-1}=0}^{m\_{p-1}} \frac{m\_{p-1}!}{(m\_{p-1}-\nu\_{p-1})!} \sum\_{\nu\_{p-2}=0}^{m\_{p-1}-\nu\_{p-1}+m\_{p-2}} \frac{(m\_{p-1}-\nu\_{p-1}+m\_{p-2})!}{2^{\nu\_{p-2}+1}(m\_{p-1}-\nu\_{p-1}+m\_{p-2}-\nu\_{p-2})!} \lambda\_{p-3}^{m\_{p-1}-\nu\_{p-1}+m\_{p-2}-\nu\_{p-2}} \mathrm{e}^{-2\lambda\_{p-3}} \\ &\quad - \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{m\_p-\nu\_p+m\_{p-1}+1}}\, m\_{p-2}! \\ &\quad + \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{m\_p-\nu\_p+m\_{p-1}+1}} \sum\_{\nu\_{p-2}=0}^{m\_{p-2}} \frac{m\_{p-2}!}{(m\_{p-2}-\nu\_{p-2})!} \lambda\_{p-3}^{m\_{p-2}-\nu\_{p-2}} \mathrm{e}^{-\lambda\_{p-3}} \\ &\quad + \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \sum\_{\nu\_{p-1}=0}^{m\_p-\nu\_p+m\_{p-1}} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{\nu\_{p-1}+1}(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1})!} \frac{(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2})!}{3^{m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2}+1}} \\ &\quad - \sum\_{\nu\_p=0}^{m\_p} \frac{m\_p!}{(m\_p-\nu\_p)!} \sum\_{\nu\_{p-1}=0}^{m\_p-\nu\_p+m\_{p-1}} \frac{(m\_p-\nu\_p+m\_{p-1})!}{2^{\nu\_{p-1}+1}(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1})!} \sum\_{\nu\_{p-2}=0}^{m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2}} \\ &\qquad\times \frac{(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2})!}{3^{\nu\_{p-2}+1}(m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2}-\nu\_{p-2})!} \, \lambda\_{p-3}^{m\_p-\nu\_p+m\_{p-1}-\nu\_{p-1}+m\_{p-2}-\nu\_{p-2}} \mathrm{e}^{-3\lambda\_{p-3}}. \end{split} \tag{iv}$$

The terms in *(iv)* are to be multiplied by $\lambda\_{p-3}^{m\_{p-3}}\mathrm{e}^{-\lambda\_{p-3}}$ to obtain the final result if we are stopping, that is, if *p* = 4. The terms in *(iv)* can be verified by multiplying the step 2 integral *(ii)* by $\lambda\_{p-2}^{m\_{p-2}}\mathrm{e}^{-\lambda\_{p-2}}$ and then integrating term by term. Denoting the sum of the terms at the *j*-th step by *ψ<sub>j</sub>(λ<sub>p−j</sub>)*, the density of the largest eigenvalue *λ*<sub>1</sub>, denoted by *f*<sub>11</sub>*(λ*<sub>1</sub>*)*, is the following:

**Theorem 9.6.2.** *When m<sub>j</sub>* = *α* − (*p*+1)/2 + *p* − *k<sub>j</sub> is a positive integer and ψ<sub>j</sub>(λ<sub>p−j</sub>) is as defined in the preceding paragraph, where k<sub>j</sub> is given in (9.6.10), the density of λ<sub>1</sub> in the real matrix-variate gamma case, denoted by f<sub>11</sub>(λ<sub>1</sub>), is the following:*

$$f\_{11}(\lambda\_1) \mathrm{d}\lambda\_1 = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\alpha)\Gamma\_p(\frac{p}{2})} \sum\_{K} (-1)^{\rho\_K} \psi\_{p-1}(\lambda\_1)\, \lambda\_1^{m\_1}\, \mathrm{e}^{-\lambda\_1} \, \mathrm{d}\lambda\_1, \ 0 < \lambda\_1 < \infty, \tag{9.6.12}$$

*where it is assumed that mj is a positive integer.*

In the corresponding complex case, the procedure is parallel and the expression for *ψ<sub>j</sub>(λ<sub>p−j</sub>)* remains the same, except that *m<sub>j</sub>* will then be equal to *α* − *p* + *r<sub>j</sub>*, where *r<sub>j</sub>* is defined in (9.6a.3). Assuming that *m<sub>j</sub>* is a positive integer and letting the density in the complex case be denoted by *f̃*<sub>11</sub>*(λ*<sub>1</sub>*)*, we have the following result:

**Theorem 9.6a.2.** *Letting m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub> be a positive integer and ψ<sub>j</sub>(λ<sub>p−j</sub>) have the same representation as in the real case except that m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub>, in the complex case, the density of λ<sub>1</sub>, denoted by f̃<sub>11</sub>(λ<sub>1</sub>), is the following:*

$$\tilde{f}\_{11}(\lambda\_1) \mathrm{d}\lambda\_1 = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)} \sum\_{K} (-1)^{\rho\_{K}} \sum\_{r\_1,\ldots,r\_p} \psi\_{p-1}(\lambda\_1)\, \lambda\_1^{m\_1}\, \mathrm{e}^{-\lambda\_1} \,\mathrm{d}\lambda\_1, \ 0 < \lambda\_1 < \infty. \tag{9.6a.6}$$

**Note 9.6.2.** One can also compute the density of the *j*-th eigenvalue *λ<sub>j</sub>* from Theorems 9.6.1 and 9.6.2. For obtaining the density of *λ<sub>j</sub>*, one has to integrate out *λ*<sub>1</sub>*,...,λ<sub>j−1</sub>* and *λ<sub>p</sub>, λ<sub>p−1</sub>,...,λ<sub>j+1</sub>*, the resulting expressions being available from the *(j* − 1*)*-th step when integrating *λ*<sub>1</sub>*,...,λ<sub>j−1</sub>* and from the *(p* − *j)*-th step when integrating *λ<sub>p</sub>, λ<sub>p−1</sub>,...,λ<sub>j+1</sub>*.

### **9.6.6. Density of the largest eigenvalue** *λ*<sup>1</sup> **in the general real case**

By general case, it is meant that *m<sub>j</sub>* = *α* − (*p*+1)/2 + *p* − *k<sub>j</sub>* is not a positive integer. In the real Wishart case, *m<sub>j</sub>* will then be a half-integer; however, in the general gamma case, *α* can be any real number greater than (*p*−1)/2. In this general case, we will expand the exponential part and then integrate term by term. That is,


$$\text{Step 1 integral: } \int\_{\lambda\_p=0}^{\lambda\_{p-1}} \lambda\_p^{m\_p} \mathrm{e}^{-\lambda\_p} \mathrm{d}\lambda\_p = \sum\_{\nu\_p=0}^{\infty} \frac{(-1)^{\nu\_p}}{\nu\_p!} \int\_{\lambda\_p=0}^{\lambda\_{p-1}} \lambda\_p^{m\_p+\nu\_p} \mathrm{d}\lambda\_p$$

$$= \sum\_{\nu\_p=0}^{\infty} \frac{(-1)^{\nu\_p}}{\nu\_p!} \frac{1}{(m\_p+\nu\_p+1)} \lambda\_{p-1}^{m\_p+\nu\_p+1}. \tag{i}$$

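The series in (*i*) converges for any *m<sub>p</sub>* > −1, which is what makes this expansion usable when *m<sub>j</sub>* is not an integer; a quick numerical check with a non-integer exponent (the values below are arbitrary):

```python
import math

def series_form(m, L, terms=60):
    # sum_v (-1)^v/v! * L^{m+v+1}/(m+v+1), valid for any real m > -1
    return sum((-1) ** v / math.factorial(v) * L ** (m + v + 1) / (m + v + 1)
               for v in range(terms))

def numeric_form(m, L, n=200_000):
    # midpoint-rule evaluation of the integral of x^m e^{-x} over (0, L)
    h = L / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x ** m * math.exp(-x)
    return total * h

m, L = 1.5, 2.0  # a non-integer exponent, as in the general case
print(abs(series_form(m, L) - numeric_form(m, L)) < 1e-6)
```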
Continuing this process, we have

$$\begin{split} \text{Step } j \text{ integral } &= \sum\_{\nu\_p=0}^{\infty} \frac{(-1)^{\nu\_p}}{\nu\_p!} \frac{1}{m\_p + \nu\_p + 1} \sum\_{\nu\_{p-1}=0}^{\infty} \frac{(-1)^{\nu\_{p-1}}}{\nu\_{p-1}!} \frac{1}{m\_p + \nu\_p + m\_{p-1} + \nu\_{p-1} + 2} \\ &\quad\cdots \sum\_{\nu\_{p-j+1}=0}^{\infty} \frac{(-1)^{\nu\_{p-j+1}}}{\nu\_{p-j+1}!} \frac{\lambda\_{p-j}^{m\_p + \nu\_p + \cdots + m\_{p-j+1} + \nu\_{p-j+1} + j}}{m\_p + \nu\_p + \cdots + m\_{p-j+1} + \nu\_{p-j+1} + j} \\ &\equiv \Delta\_j(\lambda\_{p-j}). \end{split} \tag{ii}$$

Then, in the general real case, the density of *λ*<sub>1</sub>, denoted by *f*<sub>21</sub>*(λ*<sub>1</sub>*)*, is the following:

**Theorem 9.6.3.** *When m<sub>j</sub>* = *α* − (*p*+1)/2 + *p* − *k<sub>j</sub> is not a positive integer, where k<sub>j</sub> is as specified in (9.6.10), the density of the largest eigenvalue λ<sub>1</sub> in the general real matrix-variate gamma case, denoted by f<sub>21</sub>(λ<sub>1</sub>), is given by*

$$f\_{21}(\lambda\_1) \mathrm{d}\lambda\_1 = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} \sum\_{K} (-1)^{\rho\_K} \Delta\_{p-1}(\lambda\_1)\, \lambda\_1^{m\_1}\, \mathrm{e}^{-\lambda\_1} \mathrm{d}\lambda\_1, \ 0 < \lambda\_1 < \infty, \tag{9.6.13}$$

*where Δ<sub>j</sub>(λ<sub>p−j</sub>) is defined in (ii).*

The corresponding density of *λ*<sup>1</sup> in the general situation of the complex matrix-variate gamma distribution is given in the next theorem. Observe that in the complex Wishart case, *mj* is an integer and hence there is no general case to consider.

**Theorem 9.6a.3.** *When m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub> is not a positive integer, where r<sub>j</sub> is as defined in (9.6a.3), the density of λ<sub>1</sub> in the complex case, denoted by f̃<sub>21</sub>(λ<sub>1</sub>), is given by*

$$\tilde{f}\_{21}(\lambda\_1) \mathrm{d}\lambda\_1 = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)} \sum\_{K} (-1)^{\rho\_{K}} \sum\_{r\_1,\ldots,r\_p} \Delta\_{p-1}(\lambda\_1) \,\lambda\_1^{m\_1}\, \mathrm{e}^{-\lambda\_1} \mathrm{d}\lambda\_1, \; 0 < \lambda\_1 < \infty, \tag{9.6a.7}$$

*where Δ<sub>j</sub>(λ<sub>p−j</sub>) has the representation specified in (ii), except that m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub>.*

### **9.6.7. Density of the smallest eigenvalue** *λp* **in the general real case**

Once again, 'general case' is understood to mean that *m<sub>j</sub>* = *α* − (*p*+1)/2 + *p* − *k<sub>j</sub>* is not a positive integer, where *k<sub>j</sub>* is defined in (9.6.10). For the real Wishart distribution, the 'general case' corresponds to *m<sub>j</sub>* being a half-integer. In order to determine the density of the smallest eigenvalue, we will integrate out *λ*<sub>1</sub>*,...,λ<sub>p−1</sub>*. We initially evaluate the following integral:

$$\begin{aligned} \text{Step 1 integral: } \int\_{\lambda\_1 = \lambda\_2}^{\infty} \lambda\_1^{m\_1} \mathbf{e}^{-\lambda\_1} \mathbf{d} \lambda\_1 &= \varGamma(m\_1 + 1) - \int\_{\lambda\_1 = 0}^{\lambda\_2} \lambda\_1^{m\_1} \mathbf{e}^{-\lambda\_1} \mathbf{d} \lambda\_1 \\ &= \varGamma(m\_1 + 1) - \sum\_{\mu\_1 = 0}^{\infty} \frac{(-1)^{\mu\_1}}{\mu\_1!} \frac{1}{m\_1 + \mu\_1 + 1} \lambda\_2^{m\_1 + \mu\_1 + 1} . \end{aligned} \tag{i}$$

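Since *m*<sub>1</sub> need not be an integer here, the step 1 identity relies on *Γ(m*<sub>1</sub> + 1*)* rather than a factorial; a short numerical check with a half-integer *m*<sub>1</sub> (the values below are arbitrary):

```python
import math

def upper_tail_series(m1, lam2, terms=60):
    # Gamma(m1+1) - sum_u (-1)^u/u! * lam2^{m1+u+1}/(m1+u+1)
    s = sum((-1) ** u / math.factorial(u) * lam2 ** (m1 + u + 1) / (m1 + u + 1)
            for u in range(terms))
    return math.gamma(m1 + 1) - s

def upper_tail_numeric(m1, lam2, upper=40.0, n=400_000):
    # midpoint rule on [lam2, upper]; the tail beyond `upper` is negligible
    h = (upper - lam2) / n
    total = 0.0
    for k in range(n):
        x = lam2 + (k + 0.5) * h
        total += x ** m1 * math.exp(-x)
    return total * h

m1, lam2 = 0.5, 1.2  # half-integer m1, as in the real Wishart case
print(abs(upper_tail_series(m1, lam2) - upper_tail_numeric(m1, lam2)) < 1e-5)
```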
The second step consists of integrating out *λ*<sub>2</sub> from the expression obtained in (*i*) multiplied by $\lambda\_2^{m\_2}\mathrm{e}^{-\lambda\_2}$:

Step 2 integral:

$$\begin{split} &\Gamma(m\_{1}+1)\int\_{\lambda\_{2}=\lambda\_{3}}^{\infty}\lambda\_{2}^{m\_{2}}\mathrm{e}^{-\lambda\_{2}}\mathrm{d}\lambda\_{2} - \sum\_{\mu\_{1}=0}^{\infty}\frac{(-1)^{\mu\_{1}}}{\mu\_{1}!}\frac{1}{m\_{1}+\mu\_{1}+1}\int\_{\lambda\_{2}=\lambda\_{3}}^{\infty}\lambda\_{2}^{m\_{1}+\mu\_{1}+m\_{2}+1}\mathrm{e}^{-\lambda\_{2}}\mathrm{d}\lambda\_{2} \\ &=\Gamma(m\_{1}+1)\Gamma(m\_{2}+1)-\Gamma(m\_{1}+1)\sum\_{\mu\_{2}=0}^{\infty}\frac{(-1)^{\mu\_{2}}}{\mu\_{2}!}\frac{1}{m\_{2}+\mu\_{2}+1}\lambda\_{3}^{m\_{2}+\mu\_{2}+1} \\ &\quad-\sum\_{\mu\_{1}=0}^{\infty}\frac{(-1)^{\mu\_{1}}}{\mu\_{1}!(m\_{1}+\mu\_{1}+1)}\Gamma(m\_{1}+\mu\_{1}+m\_{2}+2) \\ &\quad+\sum\_{\mu\_{1}=0}^{\infty}\frac{(-1)^{\mu\_{1}}}{\mu\_{1}!}\frac{1}{m\_{1}+\mu\_{1}+1}\sum\_{\mu\_{2}=0}^{\infty}\frac{(-1)^{\mu\_{2}}}{\mu\_{2}!}\frac{\lambda\_{3}^{m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+2}}{m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+2}. \end{split} \tag{ii}$$

A pattern is now seen to emerge. At step *j,* there will be 2*<sup>j</sup>* terms, of which 2*j*−<sup>1</sup> will start with a plus sign and 2*j*−<sup>1</sup> will start with a minus sign. All the terms at the *j* -th step are available from the 2*<sup>j</sup>* sequences of zeros and ones provided in Sect. 9.6.5. The terms can be written down by utilizing the following rules:

(1): If the sequence starts with a zero, then the corresponding factor in the term is *Γ (m*<sup>1</sup> + 1*)*;

(2): If the sequence starts with a 1, then the corresponding factor in the term is $\sum\_{\mu\_1=0}^{\infty} \frac{(-1)^{\mu\_1}}{\mu\_1!}\frac{1}{m\_1+\mu\_1+1}$, or this series multiplied by $\lambda\_2^{m\_1+\mu\_1+1}$ if this 1 is the last entry in the sequence;

(3): If the *r*-th entry in the sequence is a zero and the *(r* − 1*)*-th entry in the sequence is also zero, then the corresponding factor in the term is *Γ (mr* + 1*)*;


(4): If the *r*-th entry in the sequence is a zero and the *(r* − 1*)*-th entry in the sequence is a 1, then the corresponding factor in the term is *Γ (nr*−<sup>1</sup> + *mr* + 1*)* where *nr*−<sup>1</sup> is the denominator factor in the *(r* − 1*)*-th factor excluding the factorial;

(5): If the *r*-th entry in the sequence is 1 and the *(r* − 1*)*-th entry is zero, then the corresponding factor in the term is $\sum\_{\mu\_r=0}^{\infty} \frac{(-1)^{\mu\_r}}{\mu\_r!}\frac{1}{m\_r+\mu\_r+1}$, or this series multiplied by $\lambda\_{r+1}^{m\_r+\mu\_r+1}$ if this 1 happens to be the last entry in the sequence;

(6): If the *r*-th entry in the sequence is 1 and the *(r* − 1*)*-th entry is also 1, then the corresponding factor in the term is $\sum\_{\mu\_r=0}^{\infty} \frac{(-1)^{\mu\_r}}{\mu\_r!}\frac{1}{n\_{r-1}+m\_r+\mu\_r+1}$, or this series multiplied by $\lambda\_{r+1}^{n\_{r-1}+m\_r+\mu\_r+1}$ if this 1 is the last entry in the sequence, where $n\_{r-1}$ is the factor appearing in the denominator of the *(r* − 1*)*-th factor excluding the factorial.

These rules enable one to write down all the terms at any step. For example, for *j* = 3, that is, at the third step, the terms are available from the step 3 sequences displayed in *(iii)* of Sect. 9.6.5. The terms corresponding to those sequences, taken in order, are the following:

$$\begin{split} & \text{Step 3 integral} \\ &= \Gamma(m\_1+1)\Gamma(m\_2+1)\Gamma(m\_3+1) \\ & \quad -\Gamma(m\_1+1)\Gamma(m\_2+1) \sum\_{\mu\_3=0}^{\infty} \frac{(-1)^{\mu\_3}}{\mu\_3!} \frac{1}{m\_3+\mu\_3+1} \lambda\_4^{m\_3+\mu\_3+1} \\ & \quad -\Gamma(m\_1+1) \sum\_{\mu\_2=0}^{\infty} \frac{(-1)^{\mu\_2}}{\mu\_2!(m\_2+\mu\_2+1)} \Gamma(m\_2+\mu\_2+m\_3+2) \\ & \quad +\Gamma(m\_1+1) \sum\_{\mu\_2=0}^{\infty} \frac{(-1)^{\mu\_2}}{\mu\_2!} \frac{1}{m\_2+\mu\_2+1} \sum\_{\mu\_3=0}^{\infty} \frac{(-1)^{\mu\_3}}{\mu\_3!} \frac{\lambda\_4^{m\_2+\mu\_2+m\_3+\mu\_3+2}}{m\_2+\mu\_2+m\_3+\mu\_3+2} \\ & \quad -\sum\_{\mu\_1=0}^{\infty} \frac{(-1)^{\mu\_1}}{\mu\_1!(m\_1+\mu\_1+1)} \Gamma(m\_1+\mu\_1+m\_2+2) \Gamma(m\_3+1) \\ & \quad +\sum\_{\mu\_1=0}^{\infty} \frac{(-1)^{\mu\_1}}{\mu\_1!(m\_1+\mu\_1+1)} \Gamma(m\_1+\mu\_1+m\_2+2) \sum\_{\mu\_3=0}^{\infty} \frac{(-1)^{\mu\_3}}{\mu\_3!} \frac{\lambda\_4^{m\_3+\mu\_3+1}}{(m\_3+\mu\_3+1)} \end{split}$$


$$\begin{split} &+\sum\_{\mu\_{1}=0}^{\infty} \frac{(-1)^{\mu\_{1}}}{\mu\_{1}!(m\_{1}+\mu\_{1}+1)} \sum\_{\mu\_{2}=0}^{\infty} \frac{(-1)^{\mu\_{2}}}{\mu\_{2}!(m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+2)} \\ & \qquad \times \Gamma(m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+m\_{3}+3) \\ &-\sum\_{\mu\_{1}=0}^{\infty} \frac{(-1)^{\mu\_{1}}}{\mu\_{1}!(m\_{1}+\mu\_{1}+1)} \sum\_{\mu\_{2}=0}^{\infty} \frac{(-1)^{\mu\_{2}}}{\mu\_{2}!(m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+2)} \\ & \qquad \times \sum\_{\mu\_{3}=0}^{\infty} \frac{(-1)^{\mu\_{3}}}{\mu\_{3}!} \frac{\lambda\_{4}^{m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+m\_{3}+\mu\_{3}+3}}{(m\_{1}+\mu\_{1}+m\_{2}+\mu\_{2}+m\_{3}+\mu\_{3}+3)}. \end{split} \tag{iii}$$

Then, the result at step *j*, denoted by *w<sub>j</sub>(λ<sub>j+1</sub>)*, will be the following:

$$w\_j(\lambda\_{j+1}) = \Gamma(m\_1 + 1) \cdots \Gamma(m\_j + 1) - \cdots + (-1)^j \sum\_{\mu\_1 = 0}^{\infty} \frac{(-1)^{\mu\_1}}{\mu\_1! (m\_1 + \mu\_1 + 1)} \cdots$$

$$\times \sum\_{\mu\_j = 0}^{\infty} \frac{(-1)^{\mu\_j}}{\mu\_j!} \frac{\lambda\_{j+1}^{m\_1 + \mu\_1 + \cdots + m\_j + \mu\_j + j}}{(m\_1 + \mu\_1 + \cdots + m\_j + \mu\_j + j)}. \tag{iv}$$

**Theorem 9.6.4.** *The density of λ<sub>p</sub> for the general real matrix-variate gamma distribution, denoted by f<sub>2p</sub>(λ<sub>p</sub>), is the following:*

$$f\_{2p}(\lambda\_p) \text{d}\lambda\_p = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})\Gamma\_p(\alpha)} \sum\_K (-1)^{\rho\_K} \, w\_{p-1}(\lambda\_p) \, \lambda\_p^{m\_p} \, \text{e}^{-\lambda\_p} \, \text{d}\lambda\_p, \ 0 < \lambda\_p < \infty,\tag{9.6.14}$$

*where w<sub>j</sub>(λ<sub>j+1</sub>) is defined in (iv).*

The corresponding distribution of *λ<sub>p</sub>* for a general complex matrix-variate gamma distribution, denoted by *f̃*<sub>2p</sub>*(λ<sub>p</sub>)*, is the following:

**Theorem 9.6a.4.** *In the general complex case, in which instance m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub> is not a positive integer, r<sub>j</sub> being as defined in (9.6a.3), the density of the smallest eigenvalue λ<sub>p</sub>, denoted by f̃<sub>2p</sub>(λ<sub>p</sub>), is given by*

$$\tilde{f}\_{2p}(\lambda\_p)\mathrm{d}\lambda\_p = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}\_p(p)\tilde{\Gamma}\_p(\alpha)}\sum\_{K} (-1)^{\rho\_{K}} \sum\_{r\_1,\ldots,r\_p} w\_{p-1}(\lambda\_p) \,\lambda\_p^{m\_p} \,\mathrm{e}^{-\lambda\_p} \,\mathrm{d}\lambda\_p, \ 0 < \lambda\_p < \infty, \tag{9.6a.8}$$

*where w<sub>j</sub>(λ<sub>j+1</sub>) has the representation given in (iv) above for the real case, except that m<sub>j</sub>* = *α* − *p* + *r<sub>j</sub>.*

**Note 9.6.3.** In the complex Wishart case, *α* = *m* where *m* > *p* − 1 is the number of degrees of freedom, which is a positive integer. Hence, in this instance, one would simply apply Theorem 9.6a.1. It should also be observed that one can integrate out *λ*<sub>1</sub>*,...,λ<sub>j−1</sub>* by using the procedure described in Theorem 9.6.4 and integrate out *λ<sub>p</sub>,...,λ<sub>j+1</sub>* by employing the procedure provided in Theorem 9.6.3, and thus derive the density of *λ<sub>j</sub>* or the joint density of any set of successive *λ<sub>j</sub>*'s. In a similar manner, one can obtain the density of *λ<sub>j</sub>* or the joint density of any set of successive *λ<sub>j</sub>*'s in the complex domain by making use of the procedures outlined in Theorems 9.6a.3 and 9.6a.4.



#### **Chapter 10 Canonical Correlation Analysis**

#### **10.1. Introduction**

We will keep utilizing the same notations in this chapter. More specifically, lowercase letters *x, y,...* will denote real scalar variables, whether mathematical or random. Capital letters *X, Y,...* will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed above letters such as *x̃, ỹ, X̃, Ỹ* to denote variables in the complex domain. Constant matrices will for instance be denoted by *A, B, C*. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix *A* will be denoted by |*A*| or det(*A*) and, in the complex case, the absolute value or modulus of the determinant of *A* will be denoted as |det(*A*)|. When matrices are square, their order will be taken as *p* × *p*, unless specified otherwise. When *A* is a full rank matrix in the complex domain, then *AA*∗ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, d*X* will indicate the wedge product of all the distinct differentials of the elements of the matrix *X*. Letting the *p* × *q* matrix *X* = (*x<sub>ij</sub>*), where the *x<sub>ij</sub>*'s are distinct real scalar variables, d*X* = ∧<sub>*i*=1</sub><sup>*p*</sup> ∧<sub>*j*=1</sub><sup>*q*</sup> d*x<sub>ij</sub>*. For the complex matrix *X̃* = *X*<sub>1</sub> + *iX*<sub>2</sub>, *i* = √(−1), where *X*<sub>1</sub> and *X*<sub>2</sub> are real, d*X̃* = d*X*<sub>1</sub> ∧ d*X*<sub>2</sub>.

The necessary theory for the study of *Canonical Correlation Analysis* has already been introduced in Chap. 1, including the problem of optimizing a real bilinear form subject to two quadratic form constraints. This topic happens to be connected to the prediction problem. In regression analysis, the objective consists of seeking the best prediction function of a real scalar variable *y* based on a collection of preassigned real scalar variables *x*1*,...,xk*. It was previously determined that the regression of *y* on *x*1*,...,xk,* or the *best* predictor of *y* at preassigned values of *x*1*,...,xk,* is the conditional expectation of *y* at the specified values of *x*1*,...,xk*, that is, *E*[*y*|*x*1*,...,xk*] where *E* denotes the expected value. In this case, *best* is understood to mean 'in the minimum mean square' sense. Now,


consider the following generalization of this problem. Suppose that we wish to determine the best prediction function for a set of real scalar variables *y*<sub>1</sub>*,...,y<sub>q</sub>* on the basis of a collection of real scalar variables *x*<sub>1</sub>*,...,x<sub>p</sub>*, where *p* need not be equal to *q*. Since the individual variables are obtainable as special cases of linear functions of those variables, we will convert the problem into one of predicting a linear function of *y*<sub>1</sub>*,...,y<sub>q</sub>* from an arbitrary linear function of *x*<sub>1</sub>*,...,x<sub>p</sub>*, and vice versa if we are interested in determining the association between the two sets of variables. Let the linear functions be *u* = *α*<sub>1</sub>*x*<sub>1</sub> + ··· + *α<sub>p</sub>x<sub>p</sub>* = *α*′*X* with *α*′ = (*α*<sub>1</sub>*,...,α<sub>p</sub>*) and *X*′ = (*x*<sub>1</sub>*,...,x<sub>p</sub>*), and *v* = *β*<sub>1</sub>*y*<sub>1</sub> + ··· + *β<sub>q</sub>y<sub>q</sub>* = *β*′*Y* with *β*′ = (*β*<sub>1</sub>*,...,β<sub>q</sub>*) and *Y*′ = (*y*<sub>1</sub>*,...,y<sub>q</sub>*), where the coefficient vectors *α* and *β* are arbitrary. Let us provide an interpretation of *best predictor* in the case of two linear functions. As a criterion, we may make use of the maximum joint scatter, that is, the joint variation in *u* and *v* as measured by the covariance between *u* and *v* or, equivalently, the maximum scale-free covariance, namely, the correlation between *u* and *v*, and optimize this joint variation. Given the properties of linear functions of real scalar variables, we obtain the variances of linear functions and the covariance between linear functions as follows: Var(*u*) = *α*′*Σ*<sub>11</sub>*α*, Var(*v*) = *β*′*Σ*<sub>22</sub>*β*, Cov(*u, v*) = *α*′*Σ*<sub>12</sub>*β* = *β*′*Σ*<sub>21</sub>*α*, *Σ*′<sub>12</sub> = *Σ*<sub>21</sub>, where *Σ*<sub>11</sub> *> O* and *Σ*<sub>22</sub> *> O* are the variance-covariance matrices of *X* and *Y*, respectively, and *Σ*<sub>12</sub> = *Σ*′<sub>21</sub> accounts for the covariance between *X* and *Y*. Letting the augmented vector be *Z*′ = (*X*′*, Y*′) and its associated covariance matrix be *Σ*, we have

$$
\Sigma = \text{Cov}\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \text{Cov}(X) & \text{Cov}(X,Y) \\ \text{Cov}(Y,X) & \text{Cov}(Y) \end{bmatrix} \equiv \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}.
$$

Our aim is to maximize *α*′*Σ*<sub>12</sub>*β* = *β*′*Σ*<sub>21</sub>*α*. When the coefficient vectors *α* and *β* are unrestricted, the optimization of *α*′*Σ*<sub>12</sub>*β* proves meaningless since the quantity *α*′*Σ*<sub>12</sub>*β* can vary from −∞ to ∞. Consequently, we impose the constraints *α*′*Σ*<sub>11</sub>*α* = 1 and *β*′*Σ*<sub>22</sub>*β* = 1 on the coefficient vectors *α* and *β*. Accordingly, the mathematical problem consists of optimizing *α*′*Σ*<sub>12</sub>*β* subject to *α*′*Σ*<sub>11</sub>*α* = 1 and *β*′*Σ*<sub>22</sub>*β* = 1.

Letting

$$w = \alpha' \Sigma\_{12} \beta - \frac{\rho\_1}{2} (\alpha' \Sigma\_{11} \alpha - 1) - \frac{\rho\_2}{2} (\beta' \Sigma\_{22} \beta - 1) \tag{i}$$

where $\rho\_1$ and $\rho\_2$ are Lagrangian multipliers, we differentiate $w$ with respect to $\alpha$ and $\beta$ and equate the resulting functions to null vectors. When differentiating with respect to $\beta$, we may utilize the equivalent form $\beta'\Sigma\_{21}\alpha = \alpha'\Sigma\_{12}\beta$. We then obtain the following equations:

Canonical Correlation Analysis

$$\frac{\partial}{\partial \alpha} w = O \Rightarrow \Sigma\_{12} \beta - \rho\_1 \Sigma\_{11} \alpha = O \tag{ii}$$

$$\frac{\partial}{\partial \beta} w = O \Rightarrow \Sigma\_{21} \alpha - \rho\_2 \Sigma\_{22} \beta = O. \tag{iii}$$

On pre-multiplying (*ii*) by $\alpha'$ and (*iii*) by $\beta'$, and using the fact that $\alpha'\Sigma\_{11}\alpha = 1$ and $\beta'\Sigma\_{22}\beta = 1$, one has $\rho\_1 = \rho\_2 \equiv \rho$ and $\alpha'\Sigma\_{12}\beta = \rho$. Thus,

$$
\begin{bmatrix}
-\rho \Sigma\_{11} & \Sigma\_{12} \\
\Sigma\_{21} & -\rho \Sigma\_{22}
\end{bmatrix}
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} =
\begin{bmatrix}
O \\ O
\end{bmatrix} \tag{10.1.1}
$$

and

$$\text{Cov}(\alpha'X, \beta'Y) = \alpha' \Sigma\_{12} \beta = \beta' \Sigma\_{21} \alpha = \rho. \tag{10.1.2}$$

Hence, the maximum value of $\text{Cov}(\alpha'X, \beta'Y)$ yields the largest $\rho$. It follows from (*ii*) that $\alpha = \frac{1}{\rho}\Sigma\_{11}^{-1}\Sigma\_{12}\beta$ which, once substituted in (*iii*), yields

$$[\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12} - \rho^2\Sigma\_{22}]\beta = O \Rightarrow [\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12} - \rho^2 I]\beta = O.$$

This entails that $\rho^2 = \lambda$, an eigenvalue of $B = \Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}$ or of its symmetrized form $\Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}}$, and that $\beta$ is a corresponding eigenvector. Similarly, by obtaining a representation of $\beta$ from (*iii*), substituting it in (*ii*) and proceeding as above, it is seen that $\rho^2 = \lambda$ is an eigenvalue of $A = \Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$ or of its symmetrized form $\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}$, and that $\alpha$ is a corresponding eigenvector. Manifestly, all the nonzero eigenvalues of $A$ coincide with those of $B$. If $p \le q$ and $\Sigma\_{12}$ is of full rank $p$, then $A > O$ (real positive definite) and $B \ge O$ (real positive semi-definite), whereas if $q \le p$ and $\Sigma\_{21}$ is of full rank $q$, then $A \ge O$ (real positive semi-definite) and $B > O$ (real positive definite). If $p = q$ and $\Sigma\_{12}$ is of full rank $p$, then $A$ and $B$ are both positive definite. If $p \le q$ and $\Sigma\_{12}$ is of full rank $p$, one should start with $A$ and compute all of its $p$ nonzero eigenvalues since $A$ is of lower order; on the other hand, if $q \le p$ and $\Sigma\_{21}$ is of full rank $q$, one ought to begin with $B$ and determine all of its nonzero eigenvalues. Thus, one can obtain the common nonzero eigenvalues of $A$ and $B$ or of their symmetrized forms by making use of one of these sets of steps.
Let us denote the largest of these common eigenvalues $\lambda = \rho^2$ by $\lambda\_{(1)}$ and the corresponding eigenvectors with respect to $A$ and $B$ by $\alpha\_{(1)}$ and $\beta\_{(1)}$, where the eigenvectors are normalized via the constraints $\alpha\_{(1)}'\Sigma\_{11}\alpha\_{(1)} = 1$ and $\beta\_{(1)}'\Sigma\_{22}\beta\_{(1)} = 1$. Then $(u\_1, v\_1) \equiv (\alpha\_{(1)}'X, \beta\_{(1)}'Y)$ is the first pair of canonical variables, in the sense that $u\_1$ is the best predictor of $v\_1$ and $v\_1$ is the best predictor of $u\_1$. Similarly, letting $\rho\_{(i)}^2 = \lambda\_{(i)}$ be the $i$-th largest common eigenvalue of $A$ and $B$ and the corresponding eigenvectors, normalized so that $\alpha\_{(i)}'\Sigma\_{11}\alpha\_{(i)} = 1$ and $\beta\_{(i)}'\Sigma\_{22}\beta\_{(i)} = 1$, be denoted by $\alpha\_{(i)}$ and $\beta\_{(i)}$, the $i$-th largest correlation between $u = \alpha'X$ and $v = \beta'Y$ will be equal to $\alpha\_{(i)}'\Sigma\_{12}\beta\_{(i)} = \rho\_{(i)} = \sqrt{\lambda\_{(i)}}$, $i = 1,\ldots,p$, $p$ denoting the common number of nonzero eigenvalues of $A$ and $B$, and will occur when $u = u\_i = \alpha\_{(i)}'X$ and $v = v\_i = \beta\_{(i)}'Y$, $u\_i$ and $v\_i$ being the $i$-th pair of canonical variables. Clearly, $\text{Var}(u\_i)$ and $\text{Var}(v\_i)$, $i = 1,\ldots,p$, are both equal to one. Once again, best is taken to mean best in the minimum mean square sense. Hence, the following results:

**Theorem 10.1.1.** *Letting $\Sigma$, $A$, $B$, $\rho$, $\alpha\_{(i)}$, $\beta\_{(i)}$, $u\_i$ and $v\_i$ be as previously defined,*

$$\max\_{\alpha' \Sigma\_{11} \alpha = 1, \ \beta' \Sigma\_{22} \beta = 1} [\alpha' \Sigma\_{12} \beta] = \alpha'\_{(1)} \Sigma\_{12} \beta\_{(1)} = \rho\_{(1)} \tag{10.1.3}$$

*where $\rho\_{(1)}$ is the largest value of $\rho$ or the largest canonical correlation, that is, the largest correlation between a pair of canonical variables $u = \alpha'X$ and $v = \beta'Y$, which is equal to the correlation between $u\_1$ and $v\_1$, with $\rho\_{(1)}^2 = \lambda\_{(1)}$, the common largest eigenvalue of $A$ and $B$. Similarly, we have*

$$\min\_{\alpha' \Sigma\_{11}\alpha = 1, \ \beta' \Sigma\_{22}\beta = 1} \left[ \alpha' \Sigma\_{12} \beta \right] = \alpha'\_{(p)} \Sigma\_{12} \beta\_{(p)} = \rho\_{(p)} \tag{10.1.4}$$

*where $\rho\_{(p)}$, which is the smallest nonzero value of $\rho$, with $\rho\_{(p)}^2 = \lambda\_{(p)}$ the common smallest nonzero eigenvalue of $A$ and $B$, represents the smallest canonical correlation between $u$ and $v$, that is, the correlation between $u\_p$ and $v\_p$.*

This maximum correlation between the linear functions $\alpha'X$ and $\beta'Y$, that is, the correlation between the best predictors $u\_1$ and $v\_1$ or the maximum value of $\rho$, is called the *first canonical correlation* between the sets $X$ and $Y$, in the sense that the correlation between $u\_1$ and $v\_1$ attains its maximum value. When $p = 1$ or $q = 1$, the canonical correlation becomes the *multiple correlation*, and when $p = 1$ and $q = 1$, it is simply the correlation between two real scalar random variables. The diagonal matrix of the nonzero eigenvalues of $A$ and $B$, denoted by $\Lambda$, is $\Lambda = \text{diag}(\lambda\_{(1)},\ldots,\lambda\_{(p)})$ when $p \le q$ and $\Sigma\_{12}$ is of full rank $p$; otherwise, $p$ is replaced by $q$ in $\Lambda$.
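As an illustration of the special case $p = 1$, the following sketch (hypothetical numbers; NumPy assumed) verifies that the single canonical correlation coincides with the multiple correlation of $x$ on $(y\_1, y\_2)$, attained in the regression direction $\beta \propto \Sigma\_{22}^{-1}\sigma\_{21}$:

```python
import numpy as np

# Hypothetical one-dimensional X: Var(x) = s11; Y = (y1, y2)'.
s11 = 2.0
S22 = np.array([[2., 1.],
                [1., 2.]])       # Cov(Y)
s12 = np.array([1., 0.5])        # Cov(x, Y)

# With p = 1, A = Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma21 is a scalar,
# so the squared canonical correlation is computed directly.
rho_sq = s12 @ np.linalg.inv(S22) @ s12 / s11
canonical_corr = np.sqrt(rho_sq)

# The maximizing direction (up to scale) is beta = Sigma22^{-1} sigma21,
# the multiple-regression direction; its correlation with x attains rho.
beta = np.linalg.inv(S22) @ s12
corr = (s12 @ beta) / (np.sqrt(s11) * np.sqrt(beta @ S22 @ beta))
```

For these illustrative numbers, `corr` and `canonical_corr` agree, as the theory requires.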

It should be noted that, for instance, the canonical variable $\beta'Y$ such that $\beta$ satisfies the constraint $\beta'\Sigma\_{22}\beta = 1$ is identical to $b'\Sigma\_{22}^{-1/2}Y$ such that $b'b = 1$, since $\beta'\Sigma\_{22}\beta = b'\Sigma\_{22}^{-1/2}\Sigma\_{22}\Sigma\_{22}^{-1/2}b = b'b$. Accordingly, letting the $\lambda\_{(i)}$'s as well as $A$ and $B$ be as previously defined, our definition of a canonical variable, that is, $u\_i = \alpha\_{(i)}'X$ and $v\_i = \beta\_{(i)}'Y$, coincides with the customary one, that is, $u\_i^{\ast} = a\_i'\Sigma\_{11}^{-1/2}X$ where $a\_i$ is the eigenvector of the symmetrized form of $A$ associated with $\lambda\_{(i)}$ and normalized by requiring that $a\_i'a\_i = 1$, and $v\_i^{\ast} = b\_i'\Sigma\_{22}^{-1/2}Y$ where $b\_i$ is an eigenvector of the symmetrized form of $B$ corresponding to $\lambda\_{(i)}$ and such that $b\_i'b\_i = 1$. It can readily be proved that the canonical variables $u\_1^{\ast},\ldots,u\_p^{\ast}$ (or equivalently the $u\_i$'s) are uncorrelated, as $\text{Cov}(u\_i^{\ast}, u\_j^{\ast}) = a\_i'\Sigma\_{11}^{-1/2}\Sigma\_{11}\Sigma\_{11}^{-1/2}a\_j = a\_i'a\_j = 0$ for $i \ne j$, since the normed eigenvectors $a\_i$ are orthogonal to one another. It can similarly be established that the $v\_i^{\ast}$'s or, equivalently, the $v\_i$'s are uncorrelated. Clearly, $\text{Var}(u\_i^{\ast}) = \text{Var}(u\_i) = 1$ and $\text{Var}(v\_i^{\ast}) = \text{Var}(v\_i) = 1$. We now demonstrate that, for $i \ne k$, the canonical variables $u\_i$ and $v\_k$ are uncorrelated. First, consider the equation $A\,a = \lambda\,a$ taken in its symmetrized form, that is,

$$
\Sigma\_{11}^{-\frac{1}{2}} \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-\frac{1}{2}} a = \lambda \, a.
$$

On pre-multiplying both sides by $\Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}$, we obtain $B\,b = \lambda\,b$ where $b = \Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}a$, $A$ and $B$ being taken in their symmetrized forms. Thus, if $\lambda$ and $a$ constitute an eigenvalue-eigenvector pair for $A$, then $\lambda$ and $b$ must also form an eigenvalue-eigenvector pair for $B$, and vice versa with $a = \Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}}b$. By definition, $\text{Cov}(u\_i^{\ast}, v\_k^{\ast}) = a\_i'\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}}b\_k$ where the vector $b\_k = \theta\,\Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}a\_k$, $\theta$ being a positive constant such that the Euclidean norm of $b\_k$ is one. Note that since $b\_k'b\_k = \theta^2\,a\_k'A\,a\_k = \theta^2\,\lambda\_{(k)}\,a\_k'a\_k = 1$, $\theta$ must be equal to $1/\sqrt{\lambda\_{(k)}}$. Thus, $b\_k = \Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}a\_k/\sqrt{\lambda\_{(k)}}$ with $\Sigma\_{11}^{-\frac{1}{2}}a\_k = \alpha\_{(k)}$ and $b\_k = \Sigma\_{22}^{\frac{1}{2}}\beta\_{(k)}$, which is equivalent to (*iii*) with $\alpha = \alpha\_{(k)}$, $\beta = \beta\_{(k)}$ and $\rho = \rho\_{(k)}$, that is, $\beta\_{(k)} = \Sigma\_{22}^{-1}\Sigma\_{21}\alpha\_{(k)}/\rho\_{(k)}$. Then, $\text{Cov}(u\_i^{\ast}, v\_k^{\ast}) = a\_i'\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}a\_k/\sqrt{\lambda\_{(k)}}$, which is the $(i,k)$th element of $\text{diag}(\lambda\_{(1)},\ldots,\lambda\_{(p)})/\sqrt{\lambda\_{(k)}}$ and is therefore equal to 0 whenever $i \ne k$.
As expected, $\text{Cov}(u\_k^{\ast}, v\_k^{\ast}) = \sqrt{\lambda\_{(k)}} = \rho\_{(k)}$ and, in view of (*iii*), $\text{Cov}(u\_i, v\_k) = \alpha\_{(i)}'\Sigma\_{12}\beta\_{(k)} = \alpha\_{(i)}'\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\alpha\_{(k)}/\rho\_{(k)} = a\_i'\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}a\_k/\sqrt{\lambda\_{(k)}} = \text{Cov}(u\_i^{\ast}, v\_k^{\ast})$ for $i, k = 1,\ldots,p$, assuming that $p \le q$ and $\Sigma\_{12}$ is of full rank; if $p \ge q$ and $\Sigma\_{12}$ is of full rank, $A$ and $B$ will then share $q$ nonzero eigenvalues.
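The fact that $A$ and $B$ share their nonzero eigenvalues, whose positive square roots are the canonical correlations, is easy to check numerically. The following sketch assumes NumPy and uses, as an illustration, the $5 \times 5$ covariance matrix of Example 10.2.1 below:

```python
import numpy as np

# Partitioned covariance matrix of Z = (X', Y')' with p = 3, q = 2
# (the matrix of Example 10.2.1).
Sigma = np.array([[3., 1., 0., 1., 0.],
                  [1., 2., 0., 0., 1.],
                  [0., 0., 3., 1., 1.],
                  [1., 0., 1., 2., 1.],
                  [0., 1., 1., 1., 2.]])
p, q = 3, 2
S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

A = np.linalg.inv(S11) @ S12 @ np.linalg.inv(S22) @ S21   # p x p
B = np.linalg.inv(S22) @ S21 @ np.linalg.inv(S11) @ S12   # q x q

eigA = np.sort(np.linalg.eigvals(A).real)[::-1]
eigB = np.sort(np.linalg.eigvals(B).real)[::-1]

# The q nonzero eigenvalues of A coincide with those of B; the
# canonical correlations are their positive square roots.
canonical_correlations = np.sqrt(eigB)
```

Since $\text{rank}(\Sigma\_{12}) = 2 < p$, the remaining eigenvalue of $A$ is zero, as the discussion above predicts.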

#### **10.1.1. An invariance property**

An interesting property of canonical correlations is now pointed out. Consider the following nonsingular transformations of $X$ and $Y$: let $X\_1 = A\_1X$ and $Y\_1 = B\_1Y$, where $A\_1$ is a $p \times p$ nonsingular constant matrix and $B\_1$ is a $q \times q$ nonsingular constant matrix, so that $|A\_1| \ne 0$ and $|B\_1| \ne 0$. Now, consider the linear functions $\alpha'X\_1 = \alpha'A\_1X$ and $\beta'Y\_1 = \beta'B\_1Y$, whose variances and covariance are as follows:

$$\begin{aligned} \text{Var}(\alpha' X\_1) &= \text{Var}(\alpha' A\_1 X) = \alpha' A\_1 \Sigma\_{11} A\_1' \alpha, \quad \text{Var}(\beta' Y\_1) = \text{Var}(\beta' B\_1 Y) = \beta' B\_1 \Sigma\_{22} B\_1' \beta, \\ \text{Cov}(\alpha' X\_1, \beta' Y\_1) &= \alpha' A\_1 \Sigma\_{12} B\_1' \beta = \beta' B\_1 \Sigma\_{21} A\_1' \alpha. \end{aligned}$$

On imposing the conditions $\text{Var}(\alpha'X\_1) = 1$ and $\text{Var}(\beta'Y\_1) = 1$, and maximizing $\text{Cov}(\alpha'X\_1, \beta'Y\_1)$ by means of the previously used procedure, we arrive at the equations

$$A\_1 \Sigma\_{12} B\_1' \beta - \rho\_1 A\_1 \Sigma\_{11} A\_1' \alpha = O \tag{iv}$$

$$-\rho\_2 B\_1 \Sigma\_{22} B\_1' \beta + B\_1 \Sigma\_{21} A\_1' \alpha = O. \tag{v}$$

On pre-multiplying (*iv*) by $\alpha'$ and (*v*) by $\beta'$, one has $\rho\_1 = \rho\_2 \equiv \rho$, say. Equations (*iv*) and (*v*) can then be re-expressed as

$$
\begin{bmatrix}
-\rho A\_1 \Sigma\_{11} A\_1' & A\_1 \Sigma\_{12} B\_1' \\
B\_1 \Sigma\_{21} A\_1' & -\rho B\_1 \Sigma\_{22} B\_1'
\end{bmatrix}
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} =
\begin{bmatrix}
O \\ O
\end{bmatrix}
\Rightarrow
$$

$$
\begin{bmatrix}
A\_1 & O \\ O & B\_1
\end{bmatrix}
\begin{bmatrix}
-\rho \Sigma\_{11} & \Sigma\_{12} \\
\Sigma\_{21} & -\rho \Sigma\_{22}
\end{bmatrix}
\begin{bmatrix}
A\_1' & O \\ O & B\_1'
\end{bmatrix}
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} =
\begin{bmatrix}
O \\ O
\end{bmatrix}.
$$

Taking the determinant of the coefficient matrix and equating it to zero to determine the roots, we have

$$
\begin{vmatrix}
\begin{bmatrix} A\_1 & O \\ O & B\_1 \end{bmatrix}
\begin{bmatrix} -\rho \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & -\rho \Sigma\_{22} \end{bmatrix}
\begin{bmatrix} A\_1' & O \\ O & B\_1' \end{bmatrix}
\end{vmatrix} = 0 \Rightarrow \tag{10.1.5}
$$

$$
\begin{vmatrix}
-\rho \Sigma\_{11} & \Sigma\_{12} \\
\Sigma\_{21} & -\rho \Sigma\_{22}
\end{vmatrix} = 0. \tag{10.1.6}
$$

As can be seen from (10.1.6), equations (10.1.1) and (10.1.5) have the same roots $\rho$, which means that the canonical correlations are invariant under nonsingular linear transformations. Observe that when $\Sigma\_{12}$ is of full rank $p$ and $p \le q$, the quantities $\rho\_{(1)},\ldots,\rho\_{(p)}$, corresponding to the nonzero roots of (10.1.1) or (10.1.6), encompass all the canonical correlations, so that, in that case, we have the complete set of canonical correlations. Hence, the following result:

**Theorem 10.1.2.** *Let $X$, a $p \times 1$ vector of real scalar random variables $x\_1,\ldots,x\_p$, and $Y$, a $q \times 1$ vector of real scalar random variables $y\_1,\ldots,y\_q$, have a joint distribution. Then, the canonical correlations between $X$ and $Y$ are invariant under nonsingular linear transformations, that is, the canonical correlations between $X$ and $Y$ are the same as those between $A\_1X$ and $B\_1Y$ where $|A\_1| \ne 0$ and $|B\_1| \ne 0$.*
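Theorem 10.1.2 can be verified numerically; here is a minimal sketch, assuming NumPy, which uses the covariance matrix of Example 10.2.1 together with randomly generated nonsingular matrices $A\_1$ and $B\_1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariance of Z = (X', Y')' from Example 10.2.1 (p = 3, q = 2).
Sigma = np.array([[3., 1., 0., 1., 0.],
                  [1., 2., 0., 0., 1.],
                  [0., 0., 3., 1., 1.],
                  [1., 0., 1., 2., 1.],
                  [0., 1., 1., 1., 2.]])
p, q = 3, 2

def canonical_corrs(S, p):
    """Canonical correlations from a partitioned covariance matrix."""
    S11, S12 = S[:p, :p], S[:p, p:]
    S21, S22 = S[p:, :p], S[p:, p:]
    B = np.linalg.inv(S22) @ S21 @ np.linalg.inv(S11) @ S12
    return np.sort(np.sqrt(np.linalg.eigvals(B).real))[::-1]

# Arbitrary nonsingular transformations X1 = A1 X, Y1 = B1 Y
# (random Gaussian matrices are nonsingular with probability one).
A1 = rng.standard_normal((p, p))
B1 = rng.standard_normal((q, q))
T = np.block([[A1, np.zeros((p, q))],
              [np.zeros((q, p)), B1]])
Sigma1 = T @ Sigma @ T.T        # covariance of (X1', Y1')'

rho = canonical_corrs(Sigma, p)
rho1 = canonical_corrs(Sigma1, p)
```

The two sets of canonical correlations agree to machine precision, reflecting the similarity transformation worked out in (10.1.5) and (10.1.6).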

#### **10.2. Pairs of Canonical Variables**

As previously explained, $\lambda\_{(1)}$, which denotes the largest eigenvalue of the matrix $A = \Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$ or of its symmetrized form $\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}$, as well as the largest eigenvalue of $B = \Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}$ or of $\Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}}$, also turns out to be equal to $\rho\_{(1)}^2$, the square of the largest root of equation (10.1.1). Having evaluated $\lambda\_{(1)}$, we compute the corresponding eigenvectors $\alpha\_{(1)}$ and $\beta\_{(1)}$ and normalize them via the constraints $\alpha\_{(1)}'\Sigma\_{11}\alpha\_{(1)} = 1$ and $\beta\_{(1)}'\Sigma\_{22}\beta\_{(1)} = 1$, which produces the first pair of canonical variables: $(u\_1, v\_1) = (\alpha\_{(1)}'X, \beta\_{(1)}'Y)$. We then take the second largest nonzero eigenvalue of $A$ or $B$, denote it by $\lambda\_{(2)}$, compute the corresponding eigenvectors $\alpha\_{(2)}$ and $\beta\_{(2)}$ and normalize them so that $\alpha\_{(2)}'\Sigma\_{11}\alpha\_{(2)} = 1$ and $\beta\_{(2)}'\Sigma\_{22}\beta\_{(2)} = 1$, which yields the second pair of canonical variables: $(u\_2, v\_2) = (\alpha\_{(2)}'X, \beta\_{(2)}'Y)$. Continuing this process with all of the $p$ nonzero eigenvalues if $p \le q$ and $\Sigma\_{12}$ is of full rank $p$, or with all of the $q$ nonzero eigenvalues if $q \le p$ and $\Sigma\_{21}$ is of full rank $q$, will produce a complete set of pairs of canonical variables: $(u\_i, v\_i) = (\alpha\_{(i)}'X, \beta\_{(i)}'Y)$, $i = 1,\ldots,p$ or $q$.

Since the symmetrized forms of $A$ and $B$ are symmetric and nonnegative definite, all of their eigenvalues are nonnegative and all nonzero eigenvalues are positive. As explained in Chapter 1 and in Mathai and Haubold (2017a), all the eigenvalues of a real symmetric matrix are real and such a matrix possesses a full set of orthogonal eigenvectors, whether or not some of the roots are repeated. Hence, $\alpha\_{(j)}'X$ will be uncorrelated with all the linear functions $\alpha\_{(r)}'X$, $r = 1, 2, \ldots, j-1$, and $\beta\_{(j)}'Y$ will be uncorrelated with $\beta\_{(r)}'Y$, $r = 1, 2, \ldots, j-1$. When constructing the second pair of canonical variables, we may impose the condition that the second linear functions $\alpha'X$ and $\beta'Y$ be uncorrelated with the first pair $\alpha\_{(1)}'X$ and $\beta\_{(1)}'Y$, respectively, by introducing two more Lagrangian multipliers, adding the conditions $\text{Cov}(\alpha'X, \alpha\_{(1)}'X) = \alpha'\Sigma\_{11}\alpha\_{(1)} = 0$ and $\beta'\Sigma\_{22}\beta\_{(1)} = 0$ to the optimizing function $w$ and carrying out the optimization. We would then find that these additional conditions are redundant and that the original optimizing equations are recovered, as was observed in the case of Principal Components. Similarly, we could incorporate the conditions $\alpha'\Sigma\_{11}\alpha\_{(r)} = 0$, $r = 1,\ldots,j-1$, when constructing $\alpha\_{(j)}$, and similar conditions when constructing $\beta\_{(j)}$; however, these uncorrelatedness conditions prove redundant in the optimization procedure. Note that $\lambda\_{(1)} = \rho\_{(1)}^2$ is the square of the first canonical correlation; thus, the first canonical correlation is denoted by $\rho\_{(1)}$. Similarly, $\lambda\_{(r)} = \rho\_{(r)}^2$ is the square of the $r$-th canonical correlation, $r = 1,\ldots,p$, when $p \le q$ and $\Sigma\_{12}$ is of full rank $p$.
That is, $\rho\_{(1)},\ldots,\rho\_{(p)}$, *the $p$ nonzero roots of (10.1.1) when $p \le q$ and $\Sigma\_{12}$ is of full rank $p$, are the canonical correlations, $\rho\_{(r)}$ being called the $r$-th canonical correlation, which is the $r$-th largest root of the determinantal equation (10.1.1)*. If $p \le q$ and $\Sigma\_{12}$ is not of full rank $p$, then there will be fewer nonzero canonical correlations.

**Example 10.2.1.** Let $Z = \binom{X}{Y}$ be a $5 \times 1$ real vector random variable where $X$ is $3 \times 1$ and $Y$ is $2 \times 1$. Let the covariance matrix of $Z$ be $\Sigma$ where

$$
\Sigma = \begin{bmatrix} 3 & 1 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 3 & 1 & 1 \\ 1 & 0 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ \Sigma\_{11} = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix},
$$

$$
\Sigma\_{12} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \ \Sigma\_{21} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \ \Sigma\_{22} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
$$

Construct the pairs of canonical variables.

**Solution 10.2.1.** We need the following quantities:

$$\begin{aligned} \Sigma\_{22}^{-1} &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \ \Sigma\_{11}^{-1} = \frac{1}{15} \begin{bmatrix} 6 & -3 & 0 \\ -3 & 9 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \\\Sigma\_{22}^{-1} \Sigma\_{21} &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \end{bmatrix}, \\\Sigma\_{11}^{-1} \Sigma\_{12} &= \frac{1}{15} \begin{bmatrix} 6 & -3 & 0 \\ -3 & 9 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} = \frac{1}{15} \begin{bmatrix} 6 & -3 \\ -3 & 9 \\ 5 & 5 \end{bmatrix}, \end{aligned}$$

$$\begin{aligned} A &= \Sigma\_{11}^{-1} \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21}, \\ B &= \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} = \frac{1}{45} \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 6 & -3 \\ -3 & 9 \\ 5 & 5 \end{bmatrix} = \frac{1}{45} \begin{bmatrix} 20 & -10 \\ -7 & 26 \end{bmatrix}. \end{aligned}$$

Let us compute the eigenvalues of $B$ since it is $2 \times 2$ whereas $A$ is $3 \times 3$. The characteristic equation of $45B$ is $(20-\lambda)(26-\lambda) - 70 = 0 \Rightarrow \lambda^2 - 46\lambda + 450 = 0$, whose roots are $\lambda\_1 = 23 + \sqrt{79}$ and $\lambda\_2 = 23 - \sqrt{79}$. Hence, the eigenvalues of $B$ are $\rho\_j^2 = \frac{\lambda\_j}{45}$, that is, $\rho\_1^2 = \frac{23 + \sqrt{79}}{45}$ and $\rho\_2^2 = \frac{23 - \sqrt{79}}{45}$. We have denoted the second set of real scalar random variables by $Y$, $Y' = [y\_1, y\_2]$. An eigenvector corresponding to $\rho\_1^2$ is available from $(B - \rho\_1^2 I)Y = O$. Since the right-hand side is null, we may omit the denominator 45. The first equation is then $(-3 - \sqrt{79})y\_1 - 10y\_2 = 0$; taking $y\_1 = 1$, we have $y\_2 = -\frac{1}{10}(3 + \sqrt{79})$. It is easily verified that these values also satisfy the second equation in $(B - \rho\_1^2 I)Y = O$. An eigenvector, denoted by $\beta\_1$, is the following:

$$
\beta\_1 = \begin{bmatrix} 1 \\ -\frac{1}{10}(3+\sqrt{79}) \end{bmatrix}.
$$

We normalize $\beta\_1$ through $\beta\_1'\Sigma\_{22}\beta\_1 = 1$. To this end, consider

$$\begin{aligned} \beta\_1' \Sigma\_{22} \beta\_1 &= [1, -\frac{1}{10}(3 + \sqrt{79})] \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -\frac{1}{10}(3 + \sqrt{79}) \end{bmatrix} \\ &= \frac{1}{25}(79 - 2\sqrt{79}). \end{aligned}$$

Hence, a normalized $\beta\_1$, denoted by $\beta\_{(1)}$, and the corresponding canonical variable $v\_1$ are the following:

$$\beta\_{(1)} = \frac{5}{\sqrt{79 - 2\sqrt{79}}} \begin{bmatrix} 1 \\ -\frac{1}{10}(3+\sqrt{79}) \end{bmatrix}, \ v\_1 = \frac{5}{\sqrt{79 - 2\sqrt{79}}} \Big[ y\_1 - \frac{1}{10}(3+\sqrt{79})\, y\_2 \Big].$$

The second eigenvalue of $B$ is $\rho\_2^2 = \frac{1}{45}(23 - \sqrt{79})$. An eigenvector corresponding to $\rho\_2^2$ is available from the equation $(B - \rho\_2^2 I)Y = O$, whose second equation gives $-7y\_1 + (3 + \sqrt{79})y\_2 = 0$; taking $y\_2 = 1$, we have $y\_1 = \frac{1}{7}(3 + \sqrt{79})$. Hence, an eigenvector corresponding to $\rho\_2^2$, denoted by $\beta\_2$, is the following:

$$
\beta\_2 = \begin{bmatrix}
\frac{1}{7}(3+\sqrt{79}) \\
1
\end{bmatrix}.
$$

We normalize this vector through the constraint $\beta\_2'\Sigma\_{22}\beta\_2 = 1$. Consider

$$\begin{split} \beta\_2' \Sigma\_{22} \beta\_2 &= \Big[ \frac{1}{7}(3+\sqrt{79}), \ 1 \Big] \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{7}(3+\sqrt{79}) \\ 1 \end{bmatrix} \\ &= \frac{1}{49}(316 + 26\sqrt{79}). \end{split}$$

Hence, the normalized eigenvector, denoted by $\beta\_{(2)}$, and the corresponding canonical variable $v\_2$ are

$$\beta\_{(2)} = \frac{7}{\sqrt{316 + 26\sqrt{79}}} \begin{bmatrix} \frac{1}{7}(3+\sqrt{79}) \\ 1 \end{bmatrix}, \ v\_2 = \frac{7}{\sqrt{316 + 26\sqrt{79}}} \Big[ \frac{1}{7}(3+\sqrt{79})\, y\_1 + y\_2 \Big].$$

We will obtain the eigenvectors resulting from $(A - \rho\_1^2 I)X = O$ from the eigenvector $\beta\_1$ instead of solving the equation relating to $A$, as the presence of the term $3 + \sqrt{79}$ can make the computations tedious. From equation (*ii*) of Sect. 10.1, we have

$$\begin{split} \alpha\_1 &= \frac{1}{\rho\_1} \Sigma\_{11}^{-1} \Sigma\_{12} \beta\_1 \\ &= \frac{\sqrt{45}}{\sqrt{23+\sqrt{79}}} \Big(\frac{1}{15}\Big) \begin{bmatrix} 6 & -3 \\ -3 & 9 \\ 5 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -\frac{1}{10}(3+\sqrt{79}) \end{bmatrix} \\ &= \frac{\sqrt{45}}{15\sqrt{23+\sqrt{79}}} \begin{bmatrix} 6 + \frac{3}{10}(3+\sqrt{79}) \\ -3 - \frac{9}{10}(3+\sqrt{79}) \\ 5 - \frac{5}{10}(3+\sqrt{79}) \end{bmatrix}. \end{split}$$

Let us normalize this vector by requiring that $\alpha\_1'\Sigma\_{11}\alpha\_1 = 1$, noting that $\alpha\_1'\Sigma\_{11}\alpha\_1 = \frac{1}{\rho\_1^2}\beta\_1'\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}\beta\_1 \equiv \gamma\_1^2$, say:

$$\begin{split} \gamma\_1^2 &= \frac{45}{15(23+\sqrt{79})} \Big[1, \ -\frac{1}{10}(3+\sqrt{79})\Big] \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 6 & -3 \\ -3 & 9 \\ 5 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -\frac{1}{10}(3+\sqrt{79}) \end{bmatrix} \\ &= \frac{45}{15(23+\sqrt{79})} \Big[1, \ -\frac{1}{10}(3+\sqrt{79})\Big] \begin{bmatrix} 11 & 2 \\ 2 & 14 \end{bmatrix} \begin{bmatrix} 1 \\ -\frac{1}{10}(3+\sqrt{79}) \end{bmatrix} \\ &= \frac{45}{15(25)(23+\sqrt{79})} \, [553 + 11\sqrt{79}]. \end{split}$$

Hence, the normalized $\alpha\_1$, denoted by $\alpha\_{(1)}$, is the following:

$$\alpha\_{(1)} = \frac{\alpha\_1}{|\gamma\_1|} = \frac{5}{\sqrt{15}\sqrt{553 + 11\sqrt{79}}} \begin{bmatrix} 6 + \frac{3}{10}(3+\sqrt{79}) \\ -3 - \frac{9}{10}(3+\sqrt{79}) \\ 5 - \frac{5}{10}(3+\sqrt{79}) \end{bmatrix},$$

so that the corresponding canonical variable is

$$u\_1 = \frac{5}{\sqrt{15}\sqrt{553 + 11\sqrt{79}}} \Big\{ \Big[6 + \frac{3}{10}(3+\sqrt{79})\Big] x\_1 - \Big[3 + \frac{9}{10}(3+\sqrt{79})\Big] x\_2 + \Big[5 - \frac{1}{2}(3+\sqrt{79})\Big] x\_3 \Big\}.$$

Now, from the formula $\alpha\_2 = \frac{1}{\rho\_2}\Sigma\_{11}^{-1}\Sigma\_{12}\beta\_2$, we have

$$\begin{split} \alpha\_2 &= \frac{\sqrt{45}}{\sqrt{23-\sqrt{79}}} \Big(\frac{1}{15}\Big) \begin{bmatrix} 6 & -3 \\ -3 & 9 \\ 5 & 5 \end{bmatrix} \begin{bmatrix} \frac{1}{7}(3+\sqrt{79}) \\ 1 \end{bmatrix} \\ &= \frac{\sqrt{45}}{15\sqrt{23-\sqrt{79}}} \begin{bmatrix} \frac{6}{7}(3+\sqrt{79}) - 3 \\ -\frac{3}{7}(3+\sqrt{79}) + 9 \\ \frac{5}{7}(3+\sqrt{79}) + 5 \end{bmatrix}. \end{split}$$

Let us normalize this vector via the constraint $\alpha\_2'\Sigma\_{11}\alpha\_2 = 1$ or

$$
\alpha\_2' \Sigma\_{11} \alpha\_2 = \frac{1}{\rho\_2^2} \beta\_2' \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} \beta\_2 = \gamma\_2^2,
$$

say. Thus,

$$\begin{split} \gamma\_2^2 &= \frac{45}{15(23-\sqrt{79})} \Big[\frac{1}{7}(3+\sqrt{79}), \ 1\Big] \begin{bmatrix} 11 & 2 \\ 2 & 14 \end{bmatrix} \begin{bmatrix} \frac{1}{7}(3+\sqrt{79}) \\ 1 \end{bmatrix} \\ &= \frac{45}{15(23-\sqrt{79})} \Big[\frac{1}{49}(1738 + 94\sqrt{79})\Big], \end{split}$$

and the normalized vector $\alpha\_2$, denoted by $\alpha\_{(2)}$, is

$$\alpha\_{(2)} = \frac{\alpha\_2}{|\gamma\_2|} = \frac{7}{\sqrt{15}\sqrt{(1738 + 94\sqrt{79})}} \begin{bmatrix} \frac{6}{7}(3 + \sqrt{79}) - 3\\ -\frac{3}{7}(3 + \sqrt{79}) + 9\\ \frac{5}{7}(3 + \sqrt{79}) + 5 \end{bmatrix},$$

so that the second canonical variable is

$$u\_2 = \frac{7}{\sqrt{15}\sqrt{1738 + 94\sqrt{79}}} \Big\{ \Big[\frac{6}{7}(3+\sqrt{79}) - 3\Big] x\_1 + \Big[9 - \frac{3}{7}(3+\sqrt{79})\Big] x\_2 + \Big[\frac{5}{7}(3+\sqrt{79}) + 5\Big] x\_3 \Big\}.$$

Hence, the canonical pairs are $(u\_1, v\_1)$ and $(u\_2, v\_2)$, where $u\_j$ is the best predictor of $v\_j$ and vice versa for $j = 1, 2$. The pair of canonical variables $(u\_2, v\_2)$ has the second largest canonical correlation. It is easy to verify that $\text{Cov}(u\_1, u\_2) = 0$ and $\text{Cov}(v\_1, v\_2) = 0$.
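The closed-form results of Example 10.2.1 can be confirmed numerically. The following sketch (NumPy assumed) recomputes the eigenvalues of $B$ and the normalized coefficient vectors, against which the normalization constraints and the uncorrelatedness properties may be checked:

```python
import numpy as np

# Covariance blocks of Example 10.2.1.
Sigma11 = np.array([[3., 1., 0.], [1., 2., 0.], [0., 0., 3.]])
Sigma12 = np.array([[1., 0.], [0., 1.], [1., 1.]])
Sigma22 = np.array([[2., 1.], [1., 2.]])
Sigma21 = Sigma12.T

B = np.linalg.inv(Sigma22) @ Sigma21 @ np.linalg.inv(Sigma11) @ Sigma12
lam, vecs = np.linalg.eig(B)
order = np.argsort(lam.real)[::-1]
lam, vecs = lam.real[order], vecs.real[:, order]

# Normalize each eigenvector beta so that beta' Sigma22 beta = 1, then
# recover alpha from equation (ii): alpha = Sigma11^{-1} Sigma12 beta / rho.
betas = [v / np.sqrt(v @ Sigma22 @ v) for v in vecs.T]
alphas = [np.linalg.inv(Sigma11) @ Sigma12 @ b / np.sqrt(l)
          for l, b in zip(lam, betas)]
```

The eigenvalues reproduce $(23 \pm \sqrt{79})/45$, and the pairs satisfy $\alpha\_{(i)}'\Sigma\_{11}\alpha\_{(i)} = \beta\_{(i)}'\Sigma\_{22}\beta\_{(i)} = 1$, $\alpha\_{(i)}'\Sigma\_{12}\beta\_{(i)} = \rho\_{(i)}$ and the cross-covariances vanish.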

#### **10.3. Estimation of the Canonical Correlations and Canonical Variables**

Consider a simple random sample of size $n$ from a population designated by the $(p+q) \times 1$ real vector $Z = \binom{X}{Y}$. Let the (corrected) sample sum of products matrix be denoted by

$$S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix}, \ \text{ where } S\_{11} \text{ is } p \times p \text{ and } S\_{22} \text{ is } q \times q,$$

where $S\_{11}$ is the sample sum of products matrix corresponding to the sample from the subvector $X$, whose $(i,j)$th element is of the form $\sum\_{k=1}^{n}(x\_{ik} - \bar{x}\_i)(x\_{jk} - \bar{x}\_j)$ with the matrix $(x\_{ik})$ denoting a sample of size $n$ from $X$, $S\_{22}$ is the sample sum of products matrix corresponding to the subvector $Y$, and $\frac{1}{n}S\_{12}$ is the sample covariance between $X$ and $Y$. Thus, denoting the estimates by hats, the estimates of $\Sigma\_{11}$, $\Sigma\_{22}$ and $\Sigma\_{12}$ are $\hat{\Sigma}\_{11} = \frac{1}{n}S\_{11}$, $\hat{\Sigma}\_{22} = \frac{1}{n}S\_{22}$ and $\hat{\Sigma}\_{12} = \frac{1}{n}S\_{12}$, respectively. These will also be the maximum likelihood estimates if we assume normality, that is, if

$$Z = \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\_{p+q}(\mu, \ \Sigma), \ \Sigma > O, \ \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \tag{10.3.1}$$

where $\Sigma\_{11} = \text{Cov}(X) > O$, $\Sigma\_{22} = \text{Cov}(Y) > O$ and $\Sigma\_{12} = \text{Cov}(X, Y)$. For the estimates of these submatrices, equation (10.1.1) takes the following form:

$$
\begin{vmatrix}
-t\hat{\Sigma}\_{11} & \hat{\Sigma}\_{12} \\
\hat{\Sigma}\_{21} & -t\hat{\Sigma}\_{22}
\end{vmatrix} = 0 \Rightarrow \begin{vmatrix}
-tS\_{11} & S\_{12} \\
S\_{21} & -tS\_{22}
\end{vmatrix} = 0 \tag{10.3.2}
$$

where $t$ is the sample canonical correlation; the reader may also refer to Mathai and Haubold (2017b). Letting $\hat{\rho} = t$ be the estimated canonical correlation, whenever $p \le q$, $t^2$ is an eigenvalue of the sample canonical correlation matrix given by

$$
\hat{\Sigma}\_{11}^{-\frac{1}{2}} \hat{\Sigma}\_{12} \hat{\Sigma}\_{22}^{-1} \hat{\Sigma}\_{21} \hat{\Sigma}\_{11}^{-\frac{1}{2}} = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}} = R\_{11}^{-\frac{1}{2}} R\_{12} R\_{22}^{-1} R\_{21} R\_{11}^{-\frac{1}{2}}.\tag{10.3.3}
$$

Note that we have chosen the symmetric format for the sample canonical correlation matrix. Observe that the sample size $n$ is omitted from the middle expression in (10.3.3) as it cancels out. Moreover, the middle expression is rewritten in terms of sample correlation matrices in the last member of (10.3.3). The conversion from a sample covariance matrix to a sample correlation matrix has previously been explained. Letting $S = (s\_{ij})$ denote the sample sum of products matrix, we can write

$$S = S\_{\mathbf{l}} R S\_{\mathbf{l}}, \ S\_{\mathbf{l}} = \text{diag}(\sqrt{s\_{11}}, \dots, \sqrt{s\_{p+q, p+q}}), \ R = (r\_{ij}) = \begin{bmatrix} R\_{11} & R\_{12} \\ R\_{21} & R\_{22} \end{bmatrix},$$

where $r\_{ij}$ is the $(i,j)$th sample correlation coefficient and, for example, $R\_{11}$ is the $p \times p$ leading submatrix of the $(p+q) \times (p+q)$ matrix $R$. We will examine the distribution of $t^2$ when the population covariance submatrix $\Sigma\_{12} = O$, as well as when $\Sigma\_{12} \ne O$, in the case of a $(p+q)$-variate real Gaussian population as specified in (10.3.1), after considering an example that illustrates the computation of the canonical correlations resulting from (10.1.1) and presenting an iterative procedure.
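The claims that the factor $1/n$ cancels and that the eigenvalues $t^2$ may equivalently be computed from $S$ or from $R$, as in (10.3.3), can be illustrated with simulated data. This is only a sketch, assuming NumPy; the data-generating mechanism is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated sample: n observations on Z = (X', Y')' with p = q = 2.
n, p = 50, 2
Zs = rng.standard_normal((n, 4)) @ rng.standard_normal((4, 4))
Zc = Zs - Zs.mean(axis=0)
S = Zc.T @ Zc                     # corrected sample sum of products

# Sample correlation matrix R obtained from S = S_l R S_l.
S_l_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = S_l_inv @ S @ S_l_inv

def t_squared(M, p):
    """Squared sample canonical correlations from a partitioned matrix."""
    M11, M12 = M[:p, :p], M[:p, p:]
    M21, M22 = M[p:, :p], M[p:, p:]
    C = np.linalg.inv(M11) @ M12 @ np.linalg.inv(M22) @ M21
    return np.sort(np.linalg.eigvals(C).real)[::-1]
```

Whether `t_squared` is applied to $S$, to $S/n$, or to $R$, the same eigenvalues result, because the diagonal scaling matrices and the factor $1/n$ cancel in the product, as noted after (10.3.3).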

**Example 10.3.1.** Let $X$ and $Y$ be two real bivariate vector random variables and $Z = \begin{bmatrix} X \\ Y \end{bmatrix}$. Consider the following simple random sample of size 5 from $Z$:

$$Z\_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \end{bmatrix}, \ Z\_2 = \begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \ Z\_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ -1 \end{bmatrix}, \ Z\_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \ Z\_5 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 1 \end{bmatrix}.$$

Construct the sample pairs of canonical variables.

**Solution 10.3.1.** Let us use the standard notation. The sample matrix is $\mathbf{Z} = [Z\_1,\dots,Z\_5]$, the sample average is $\bar{Z} = \frac{1}{5}[Z\_1+\cdots+Z\_5]$, the matrix of sample averages is $\bar{\mathbf{Z}} = [\bar{Z},\dots,\bar{Z}]$, the deviation matrix is $\mathbf{Z}\_d = [Z\_1-\bar{Z},\dots,Z\_5-\bar{Z}]$ and the sample sum of products matrix is $S = \mathbf{Z}\_d\mathbf{Z}\_d'$. These quantities are the following:

$$\begin{aligned} \mathbf{Z} &= \begin{bmatrix} 1 & 2 & 2 & 0 & 0 \\ 2 & 0 & 1 & 1 & 1 \\ 1 & -1 & 0 & 1 & -1 \\ -1 & 1 & -1 & 0 & 1 \end{bmatrix}, \quad \bar{Z} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \\ \mathbf{Z}\_d &= \begin{bmatrix} 0 & 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 & 0 \\ 1 & -1 & 0 & 1 & -1 \\ -1 & 1 & -1 & 0 & 1 \end{bmatrix}, \quad S = \begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 2 & 2 & -2 \\ -1 & 2 & 4 & -3 \\ -1 & -2 & -3 & 4 \end{bmatrix}, \end{aligned}$$

where, as per our notation,

$$\begin{aligned} S\_{11} &= \begin{bmatrix} 4 & -1 \\ -1 & 2 \end{bmatrix}, & S\_{12} &= \begin{bmatrix} -1 & -1 \\ 2 & -2 \end{bmatrix}, \\ S\_{21} &= \begin{bmatrix} -1 & 2 \\ -1 & -2 \end{bmatrix}, & S\_{22} &= \begin{bmatrix} 4 & -3 \\ -3 & 4 \end{bmatrix}. \end{aligned}$$

We need to compute the following items:

$$\begin{aligned} S\_{11}^{-1} &= \frac{1}{7} \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}, \ S\_{22}^{-1} = \frac{1}{7} \begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix}, \\\ S\_{11}^{-1} S\_{12} &= \frac{1}{7} \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} -1 & -1 \\ 2 & -2 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} 0 & -4 \\ 7 & -9 \end{bmatrix}, \\\ S\_{22}^{-1} S\_{21} &= \frac{1}{7} \begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ -1 & -2 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} -7 & 2 \\ -7 & -2 \end{bmatrix}. \end{aligned}$$

The matrices *A* and *B* are then the following (using the same notation as for the population values for convenience):

$$\begin{aligned} A &= S\_{11}^{-1} S\_{12} S\_{22}^{-1} S\_{21} = \frac{1}{7^2} \begin{bmatrix} 0 & -4 \\ 7 & -9 \end{bmatrix} \begin{bmatrix} -7 & 2 \\ -7 & -2 \end{bmatrix} = \frac{2}{7^2} \begin{bmatrix} 14 & 4 \\ 7 & 16 \end{bmatrix}, \\\ B &= S\_{22}^{-1} S\_{21} S\_{11}^{-1} S\_{12} = \frac{1}{7^2} \begin{bmatrix} -7 & 2 \\ -7 & -2 \end{bmatrix} \begin{bmatrix} 0 & -4 \\ 7 & -9 \end{bmatrix} = \frac{2}{7^2} \begin{bmatrix} 7 & 5 \\ -7 & 23 \end{bmatrix}. \end{aligned}$$

If the population covariance matrix of $Z$ is $\Sigma$, an estimate of $\Sigma$ is $\frac{S}{n}$ where $S$ is the sample sum of products matrix and $n$ is the sample size; this is also the maximum likelihood estimate of $\Sigma$ when $Z$ is Gaussian distributed. Instead of using $\frac{S}{n}$, we will work with $S$, since the normalized eigenvectors of $S$ and $\frac{S}{n}$ are identical, although each eigenvalue of $\frac{S}{n}$ is $\frac{1}{n}$ times the corresponding eigenvalue of $S$.
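
This invariance of the eigenvectors under the rescaling $S \mapsto \frac{S}{n}$ can be checked numerically. A small sketch, hard-coding the $4 \times 4$ sum of products matrix computed above with $n = 5$:

```python
import numpy as np

# Check: S and S/n share their eigenvectors; eigenvalues scale by 1/n.
# S is the sample sum of products matrix of Example 10.3.1, n = 5.
S = np.array([[ 4., -1., -1., -1.],
              [-1.,  2.,  2., -2.],
              [-1.,  2.,  4., -3.],
              [-1., -2., -3.,  4.]])
n = 5
vals, vecs = np.linalg.eigh(S)
# every eigenvector of S is also an eigenvector of S/n, with eigenvalue val/n
print(np.allclose((S / n) @ vecs, vecs * (vals / n)))  # True
```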

The eigenvalues of $A$ are $\frac{2}{7^2}$ times the solutions of $(14-\lambda)(16-\lambda) - 28 = 0 \Rightarrow \lambda\_1 = 15+\sqrt{29},\ \lambda\_2 = 15-\sqrt{29}$, so that the eigenvalues of $A$, denoted by $\lambda\_{(1)}$ and $\lambda\_{(2)}$, are $\lambda\_{(1)} = \frac{2}{7^2}(15+\sqrt{29}),\ \lambda\_{(2)} = \frac{2}{7^2}(15-\sqrt{29})$. The eigenvalues of $B$ are $\frac{2}{7^2}$ times the solutions of $(7-\nu)(23-\nu) + 35 = 0 \Rightarrow \nu\_1 = 15+\sqrt{29},\ \nu\_2 = 15-\sqrt{29}$, so that the eigenvalues of $B$, denoted by $\nu\_{(1)}$ and $\nu\_{(2)}$, are $\nu\_{(1)} = \frac{2}{7^2}(15+\sqrt{29}),\ \nu\_{(2)} = \frac{2}{7^2}(15-\sqrt{29})$, which, as expected, are the same as those of $A$. Corresponding to $\lambda\_{(1)}$, an eigenvector of $A$ is available from the equation

$$
\begin{bmatrix} 14 - (15 + \sqrt{29}) & 4 \\ 7 & 16 - (15 + \sqrt{29}) \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
$$

on deleting the factor $\frac{2}{7^2}$ from both sides. Thus, one solution is

$$\alpha\_1 = \begin{bmatrix} 4/[1+\sqrt{29}] \\ 1 \end{bmatrix}.$$

Let us normalize this vector through the constraint $\alpha\_1' S\_{11}\alpha\_1 = 1$. Since

$$\alpha\_1' S\_{11} \alpha\_1 = [\frac{4}{1+\sqrt{29}}, 1] \begin{bmatrix} 4 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} \frac{4}{1+\sqrt{29}} \\ 1 \end{bmatrix} = \frac{2}{7^2} (116 - 11\sqrt{29}),$$

the normalized eigenvector, denoted by *α(*1*)*, and the corresponding sample canonical variable, denoted by *u*1, are

$$\alpha\_{(1)} = \frac{7}{\sqrt{2}\sqrt{116 - 11\sqrt{29}}} \begin{bmatrix} \frac{4}{1 + \sqrt{29}}\\ 1 \end{bmatrix} \text{ and } u\_1 = \frac{7}{\sqrt{2}\sqrt{116 - 11\sqrt{29}}} \Big[ \frac{4}{1 + \sqrt{29}}\, x\_1 + x\_2 \Big].$$

The eigenvalues of $B$ are likewise $\frac{2}{7^2}$ times $\nu\_1 = 15+\sqrt{29}$ and $\nu\_2 = 15-\sqrt{29}$. Let us compute an eigenvector of $B$ corresponding to the eigenvalue $\nu\_1$. This eigenvector can be determined from the equation

$$
\begin{bmatrix} -8 - \sqrt{29} & 5 \\ -7 & 8 - \sqrt{29} \end{bmatrix} \begin{bmatrix} y\_1 \\ y\_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
$$

which gives one solution as

$$
\beta\_1 = \begin{bmatrix} 5/(8+\sqrt{29}) \\ 1 \end{bmatrix}.
$$

Let us normalize under the constraint $\beta\_1' S\_{22}\beta\_1 = 1$. Since

$$
\beta\_1' S\_{22} \beta\_1 = [\frac{5}{8 + \sqrt{29}}, 1] \begin{bmatrix} 4 & -3 \\ -3 & 4 \end{bmatrix} \begin{bmatrix} \frac{5}{8 + \sqrt{29}} \\ 1 \end{bmatrix} = \frac{2}{7^2} (116 - 11\sqrt{29}),
$$

the normalized eigenvector, denoted by *β(*1*)*, and the corresponding canonical variable denoted by *v*1, are the following:

$$\beta\_{(1)} = \frac{7}{\sqrt{2}\sqrt{116 - 11\sqrt{29}}} \begin{bmatrix} \frac{5}{8 + \sqrt{29}}\\ 1 \end{bmatrix} \text{ and } v\_1 = \frac{7}{\sqrt{2}\sqrt{116 - 11\sqrt{29}}} \Big[ \frac{5}{8 + \sqrt{29}}\, y\_1 + y\_2 \Big].$$

Therefore, one pair of canonical variables is $(u\_1, v\_1)$, where $u\_1$ is the best predictor of $v\_1$ and vice versa. Now, consider $\lambda\_2 = 15-\sqrt{29}$ and $\nu\_2 = 15-\sqrt{29}$. Proceeding as in the above case yields the second pair of canonical variables $(u\_2, v\_2)$.
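
The hand computations of this example can be cross-checked numerically. A sketch with numpy (not part of the original text), rebuilding $S$ from the data and recovering the eigenvalues $\frac{2}{7^2}(15 \pm \sqrt{29})$:

```python
import numpy as np

# Numerical cross-check of Example 10.3.1.
Z = np.array([[ 1.,  2.,  2., 0.,  0.],
              [ 2.,  0.,  1., 1.,  1.],
              [ 1., -1.,  0., 1., -1.],
              [-1.,  1., -1., 0.,  1.]])
Zd = Z - Z.mean(axis=1, keepdims=True)      # deviation matrix
S = Zd @ Zd.T                               # sample sum of products matrix
S11, S12 = S[:2, :2], S[:2, 2:]
S21, S22 = S[2:, :2], S[2:, 2:]
A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
w, V = np.linalg.eig(A)
lam = np.sort(w.real)[::-1]
print(lam)                                  # (2/49)(15 ± sqrt(29))
a = V[:, np.argmax(w.real)].real
a /= np.sqrt(a @ S11 @ a)                   # normalize so that a' S11 a = 1
print(round(float(a @ S11 @ a), 10))        # 1.0
```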

#### **10.3.1. An iterative procedure**

Without any loss of generality, let $p \le q$. We will illustrate the procedure with the population values for convenience. Consider the matrix $A$ as previously defined, that is, $A = \Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}$, and $\rho$, the canonical correlation, which is a solution of (10.1.1) with $\rho^2 = \lambda$ where $\lambda$ is an eigenvalue of $A$. When $p$ is small, we may directly solve the determinantal equation (10.1.1) and evaluate its roots, which are the canonical correlations. When $p$ is large, direct evaluation could prove tedious without resorting to computational software packages; in that case, the following iterative procedure may be employed. Let $\lambda$ be an eigenvalue of $A$ and $\alpha$ the corresponding eigenvector. We wish to evaluate $\lambda = \rho^2$ and $\alpha$, but we cannot solve (10.1.1) directly when $p$ is large. Take an initial trial vector $\gamma\_0$ and normalize it via the constraint $\gamma\_0' \Sigma\_{11}\gamma\_0 = 1$, so that $\alpha\_0' \Sigma\_{11}\alpha\_0 = 1$, $\alpha\_0$ being the normalized $\gamma\_0$; that is, $\alpha\_0 = \frac{1}{\sqrt{\gamma\_0' \Sigma\_{11}\gamma\_0}}\,\gamma\_0$. Now, consider the equation

$$A\,\alpha\_0 = \gamma\_1.$$

If $\alpha\_0$ happens to be the eigenvector $\alpha\_{(1)}$ corresponding to the largest eigenvalue $\lambda\_{(1)}$ of $A$, then $A\alpha\_0 = \lambda\_{(1)}\alpha\_{(1)}$; thus $A\alpha\_0 = \gamma\_1 \Rightarrow \gamma\_1'\Sigma\_{11}\gamma\_1 = \lambda\_{(1)}^2\,\alpha\_{(1)}'\Sigma\_{11}\alpha\_{(1)} = \lambda\_{(1)}^2$ since $\alpha\_{(1)}'\Sigma\_{11}\alpha\_{(1)} = 1$. Then $\rho\_{(1)}^2 = \lambda\_{(1)} = \sqrt{\gamma\_1'\Sigma\_{11}\gamma\_1}$. This provides the motivation for the iterative procedure. Consider the equation

$$A\,\alpha\_{i} = \gamma\_{i+1},\,\,\alpha\_{i} = \frac{1}{\sqrt{\gamma\_{i}^{\prime}\Sigma\_{11}\gamma\_{i}}}\gamma\_{i},\,\,i = 0,1,\,\ldots \tag{i}$$

Continue the iteration process. At each stage, compute $\delta\_i = \gamma\_i'\Sigma\_{11}\gamma\_i$ while ensuring that $\delta\_i$ is increasing. Halt the iteration when $\gamma\_j \approx \gamma\_{j-1}$, that is, when $\alpha\_j \approx \alpha\_{j-1}$, which indicates that $\gamma\_j$ converges to some vector $\gamma$. At this stage, the normalized $\gamma$ is $\alpha\_{(1)}$, the eigenvector corresponding to the largest eigenvalue $\lambda\_{(1)}$ of $A$, and that eigenvalue is given by $\lambda\_{(1)} = \sqrt{\gamma'\Sigma\_{11}\gamma}$. Thus, as a result of the iteration process specified by equation *(i)*,

$$\lim\_{j \to \infty} \alpha\_j = \alpha\_{(1)} \text{ and } \sqrt{\lim\_{j \to \infty} \gamma\_j' \Sigma\_{11} \gamma\_j} = \lambda\_{(1)}.\tag{ii}$$

These initial iterations produce the largest eigenvalue $\lambda\_{(1)} = \rho\_{(1)}^2$ and the corresponding eigenvector $\alpha\_{(1)}$. From (10.1.1), we have

$$
\Sigma\_{22}^{-1}\Sigma\_{21}\alpha = \rho\,\beta \Rightarrow \frac{1}{\rho}\Sigma\_{22}^{-1}\Sigma\_{21}\alpha = \beta. \tag{iii}
$$

Substituting the computed $\rho\_{(1)}$ and $\alpha\_{(1)}$ in *(iii)* yields $\beta\_{(1)}$, the eigenvector corresponding to the largest eigenvalue $\lambda\_{(1)}$ of $B = \Sigma\_{22}^{-\frac{1}{2}}\Sigma\_{21}\Sigma\_{11}^{-1}\Sigma\_{12}\Sigma\_{22}^{-\frac{1}{2}}$. This completes the first stage of the iteration process. Now, consider $A\_2 = A - \lambda\_{(1)}\alpha\_{(1)}\alpha\_{(1)}'$. Observe that $\alpha\_{(1)}\alpha\_{(1)}'$ is a $p \times p$ matrix. In general, we can express a symmetric matrix in terms of its eigenvalues and normalized eigenvectors as follows:

$$A = \Sigma\_{11}^{-\frac{1}{2}} \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-\frac{1}{2}} = \lambda\_{(1)} \alpha\_{(1)} \alpha\_{(1)}' + \lambda\_{(2)} \alpha\_{(2)} \alpha\_{(2)}' + \dots + \lambda\_{(p)} \alpha\_{(p)} \alpha\_{(p)}', \quad (\text{iv})$$

as explained in Chapter 1 or Mathai and Haubold (2017a). Carry out the second stage of the iteration process on *A*<sup>2</sup> as indicated in *(i)*. This will produce the second largest eigenvalue *λ(*2*)* of *A* and the corresponding eigenvector *α(*2*)*. Then, compute the corresponding *β(*2*)* via the procedure given in *(iii)*. This will complete the second stage. For the next stage, consider

$$A\_3 = A\_2 - \lambda\_{(2)}\alpha\_{(2)}\alpha\_{(2)}' = A - \lambda\_{(1)}\alpha\_{(1)}\alpha\_{(1)}' - \lambda\_{(2)}\alpha\_{(2)}\alpha\_{(2)}'$$

and perform the iterative steps *(i)* to *(iv)*. This will produce *λ(*3*), α(*3*)* and *β(*3*)*. Keep on iterating until all the *p* eigenvalues *λ(*1*),...,λ(p)* of *A* as well as *α(j )* and *β(j )*, the corresponding eigenvectors of *A* and *B* are obtained for *j* = 1*,...,p*.
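
As a sketch (not the book's own code), steps *(i)* to *(iv)* can be implemented with plain power iteration and deflation once $A$ is taken in its symmetric form $\Sigma\_{11}^{-\frac{1}{2}}\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}\Sigma\_{11}^{-\frac{1}{2}}$, where the $\Sigma\_{11}$-normalization of the text reduces to ordinary Euclidean normalization; the covariance blocks below are those of Example 10.3.2:

```python
import numpy as np

# Power iteration with deflation on the symmetric form of A; a sketch
# assuming the population covariance blocks of Example 10.3.2.
Sig11 = np.array([[2., 1.], [1., 2.]])
Sig12 = np.array([[1., 1., 1.], [1., -1., 1.]])
Sig22 = np.array([[1., 1., 0.], [1., 2., 0.], [0., 0., 1.]])

w, V = np.linalg.eigh(Sig11)
S11_mh = V @ np.diag(w ** -0.5) @ V.T           # Sigma_11^{-1/2}
A = S11_mh @ Sig12 @ np.linalg.solve(Sig22, Sig12.T) @ S11_mh

def power_deflate(A, k, iters=200):
    """Largest k eigenvalues of symmetric A via the scheme (i)-(iv)."""
    A = A.copy()
    vals = []
    for _ in range(k):
        g = np.ones(A.shape[0])
        for _ in range(iters):
            g = A @ (g / np.linalg.norm(g))     # iteration step, as in (i)
        lam = np.linalg.norm(g)                 # analog of (ii)
        a = g / lam
        vals.append(lam)
        A -= lam * np.outer(a, a)               # deflation: A2 = A - lam a a'
    return np.array(vals)

vals = power_deflate(A, 2)
print(vals)    # approx [2 + (2/3)sqrt(3), 2 - (2/3)sqrt(3)]
```

The deflation step removes the dominant eigenpair so that the next round of power iteration converges to the second largest eigenvalue, exactly as in the passage above.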

In the case of sample eigenvalues and eigenvectors, start with the sample matrices

$$\begin{split} \hat{A} &= \hat{\Sigma}\_{11}^{-\frac{1}{2}} \hat{\Sigma}\_{12} \hat{\Sigma}\_{22}^{-1} \hat{\Sigma}\_{21} \hat{\Sigma}\_{11}^{-\frac{1}{2}} = R\_{11}^{-\frac{1}{2}} R\_{12} R\_{22}^{-1} R\_{21} R\_{11}^{-\frac{1}{2}} \\ \hat{B} &= \hat{\Sigma}\_{22}^{-\frac{1}{2}} \hat{\Sigma}\_{21} \hat{\Sigma}\_{11}^{-1} \hat{\Sigma}\_{12} \hat{\Sigma}\_{22}^{-\frac{1}{2}} = R\_{22}^{-\frac{1}{2}} R\_{21} R\_{11}^{-1} R\_{12} R\_{22}^{-\frac{1}{2}} . \end{split}$$

Carry out the iteration steps *(i)* to *(iv)* on $\hat{A}$ to obtain the sample eigenvalues, denoted by $\hat{\rho}\_{(j)} = t\_{(j)},\ j = 1,\dots,p$ for $p \le q$, where $t\_{(j)}$ is the $j$th sample canonical correlation, as well as the corresponding eigenvectors of $\hat{A}$, denoted by $a\_{(j)}$, and those of $\hat{B}$, denoted by $b\_{(j)}$.

**Example 10.3.2.** Consider the real vectors $X' = (x\_1, x\_2)$ and $Y' = (y\_1, y\_2, y\_3)$, and let $Z' = (X', Y')$ where $x\_1, x\_2, y\_1, y\_2, y\_3$ are real scalar random variables. Let the covariance matrix of $Z$ be $\Sigma > O$ where

$$\begin{aligned} \text{Cov}(Z) &= \text{Cov}\begin{bmatrix} X \\ Y \end{bmatrix} = \Sigma = \begin{bmatrix} \Sigma\_{11} & \Sigma\_{12} \\ \Sigma\_{21} & \Sigma\_{22} \end{bmatrix}, \ \Sigma\_{12} = \Sigma\_{21}',\\ \text{Cov}(X) &= \Sigma\_{11}, \ \text{Cov}(Y) = \Sigma\_{22}, \ \text{Cov}(X, Y) = \Sigma\_{12}, \end{aligned}$$

with

$$
\Sigma\_{11} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \ \Sigma\_{12} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}, \ \Sigma\_{22} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

Consider the problem of predicting *X* from *Y* and vice versa. Obtain the best predictors by constructing pairs of canonical variables.

**Solution 10.3.2.** Let us first compute the inverses $\Sigma\_{11}^{-1}, \Sigma\_{22}^{-1}$ and the products $\Sigma\_{11}^{-1}\Sigma\_{12}, \Sigma\_{22}^{-1}\Sigma\_{21}$. We take the non-symmetric form of $A$, as the symmetric form requires more calculations; either way, the eigenvalues are identical. On directly applying the formula $C^{-1} = \frac{1}{|C|}\,[\text{the matrix of cofactors}]'$, we have

$$\begin{aligned} \Sigma\_{11}^{-1} &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \ \Sigma\_{22}^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\\Sigma\_{11}^{-1} \Sigma\_{12} &= \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 1 & 3 & 1 \\ 1 & -3 & 1 \end{bmatrix}, \\\Sigma\_{22}^{-1} \Sigma\_{21} &= \begin{bmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 0 & -2 \\ 1 & 1 \end{bmatrix}. \end{aligned}$$

Thus, the non-symmetric forms of *A* and *B* are

$$\begin{aligned} A &= \Sigma\_{11}^{-1} \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21} = \frac{1}{3} \begin{bmatrix} 1 & 3 & 1 \\ 1 & -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 0 & -2 \\ 1 & 1 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 & -2 \\ 2 & 10 \end{bmatrix}, \\\ B &= \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} = \frac{1}{3} \begin{bmatrix} 1 & 3 \\ 0 & -2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 1 & -3 & 1 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 4 & -6 & 4 \\ -2 & 6 & -2 \\ 2 & 0 & 2 \end{bmatrix}. \end{aligned}$$

Let us compute the eigenvalues of $3A$. Consider $\begin{vmatrix} 2-\lambda & -2 \\ 2 & 10-\lambda \end{vmatrix} = 0 \Rightarrow (2-\lambda)(10-\lambda) + 4 = 0$, which gives

$$\lambda = \frac{12 \pm \sqrt{(12)^2 - 4(24)}}{2} = 6 + 2\sqrt{3}, \ 6 - 2\sqrt{3},$$

the eigenvalues of *<sup>A</sup>* being *λ(*1*)* <sup>=</sup> <sup>2</sup> <sup>+</sup> <sup>2</sup> 3 <sup>√</sup>3*, λ(*2*)* <sup>=</sup> <sup>2</sup> <sup>−</sup> <sup>2</sup> 3 <sup>√</sup>3. These are the squares of the canonical correlation coefficient *ρ* resulting from (10.1.1). Let us determine the eigenvectors corresponding to *λ(*1*)* and *λ(*2*)*. Our notations for the linear functions of *X* and *Y* are *u* = *α*- *X* and *v* = *β*- *Y* ; in this case, *α*- = *(α*1*, α*2*)* and *β*- = *(β*1*, β*2*, β*3*)*. Then, the eigenvector *α*, corresponding to *λ(*1*)* is obtained from the equation


$$
\begin{bmatrix}
\frac{2}{3} - (2 + \frac{2}{3}\sqrt{3}) & -\frac{2}{3} \\
\frac{2}{3} & \frac{10}{3} - (2 + \frac{2}{3}\sqrt{3})
\end{bmatrix}
\begin{bmatrix}
\alpha\_1 \\
\alpha\_2
\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow \tag{i}
$$

$$
-(2 + \sqrt{3})\alpha\_1 - \alpha\_2 = 0 \Rightarrow \alpha\_1 = 1, \; \alpha\_2 = -(2 + \sqrt{3}).
$$

Observe that since *(i)* is a singular system of linear equations, we need only consider one equation and we can preassign a value for *α*<sup>1</sup> or *α*2. Taking *α*<sup>1</sup> = 1, let us normalize the resulting vector via the constraint *α*- *Σ*11*α* = 1. Since

$$\begin{aligned} \alpha' \Sigma\_{11} \alpha &= \begin{bmatrix} 1 & -(2+\sqrt{3}) \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -(2+\sqrt{3}) \end{bmatrix} \\ &= 12 + 6\sqrt{3} \equiv \gamma\_1, \end{aligned}$$

the normalized *α*, denoted by *α(*1*)*, is

$$\alpha\_{(1)} = \frac{1}{\sqrt{\gamma\_1}} \begin{bmatrix} 1 \\ -(2 + \sqrt{3}) \end{bmatrix} \Rightarrow u\_1 = \frac{1}{\sqrt{\gamma\_1}} [x\_1 - (2 + \sqrt{3})x\_2]. \tag{ii}$$

Now, the eigenvector corresponding to the second eigenvalue *λ(*2*)* is such that

$$
\begin{bmatrix}
\frac{2}{3} - (2 - \frac{2}{3}\sqrt{3}) & -\frac{2}{3} \\
\frac{2}{3} & \frac{10}{3} - (2 - \frac{2}{3}\sqrt{3})
\end{bmatrix}
\begin{bmatrix}
\alpha\_1 \\
\alpha\_2
\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow
$$

$$
(-2 + \sqrt{3})\alpha\_1 - \alpha\_2 = 0 \Rightarrow \alpha\_1 = 1, \; \alpha\_2 = -2 + \sqrt{3}.
$$

Since

$$\begin{aligned} \alpha' \Sigma\_{11} \alpha &= \begin{bmatrix} 1 & -2 + \sqrt{3} \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -2 + \sqrt{3} \end{bmatrix} \\ &= 12 - 6\sqrt{3} \equiv \gamma\_2, \end{aligned}$$

the normalized *α* such that *α*- *Σ*11*α* = 1 is

$$\alpha\_{(2)} = \frac{1}{\sqrt{\gamma\_2}} \begin{bmatrix} 1 \\ -2 + \sqrt{3} \end{bmatrix} \Rightarrow u\_2 = \frac{1}{\sqrt{\gamma\_2}} [x\_1 + (-2 + \sqrt{3})x\_2]. \tag{iii}$$

Observe that computing the eigenvalues of *B* from the equation |*B* − *λ(j )I* | = 0 will be difficult. However, we know that they are *λ(*1*)* and *λ(*2*)* as given above, the third one being equal to zero. So, let us first verify that 3*λ(*1*)* is an eigenvalue of 3*B*. Consider

$$\begin{aligned} |3B - (6 + 2\sqrt{3})I| &= \begin{vmatrix} 4 - (6 + 2\sqrt{3}) & -6 & 4 \\ -2 & 6 - (6 + 2\sqrt{3}) & -2 \\ 2 & 0 & 2 - (6 + 2\sqrt{3}) \end{vmatrix} \\ &= 2 \times 2 \times 2 \begin{vmatrix} 1 & \sqrt{3} & 1 \\ -(1 + \sqrt{3}) & -3 & 2 \\ 1 & 0 & -(2 + \sqrt{3}) \end{vmatrix} \\ &= 8 \begin{vmatrix} 1 & \sqrt{3} & 1 \\ 0 & \sqrt{3} & 3 + \sqrt{3} \\ 0 & -\sqrt{3} & -(3 + \sqrt{3}) \end{vmatrix} = 0. \end{aligned}$$

The operations performed are the following: Taking out 2 from each row; interchanging the second and the first rows; adding *(*<sup>1</sup> <sup>+</sup> <sup>√</sup>3*)* times the first row to the second row and adding minus one times the first row to the third row. Similarly, it can be verified that *λ(*2*)* is also an eigenvalue of *B*. Moreover, since the third row of *B* is equal to the sum of its first two rows, *B* is singular, which means that the remaining eigenvalue must be zero. In Example 10.2.1, we made use of the formula resulting from *(ii)* of Sect. 10.1 for determining the second set of canonical variables. In this case, they will be directly computed from *B* to illustrate a different approach. Let us now determine the eigenvectors with respect to *B*, corresponding to *λ(*1*)* and *λ(*2*)*:

$$\begin{split} B &= \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-1} \Sigma\_{12} = \frac{1}{3} \begin{bmatrix} 4 & -6 & 4 \\ -2 & 6 & -2 \\ 2 & 0 & 2 \end{bmatrix}; \\ (B - \lambda\_{(1)}I)\beta &= O \\ &\Rightarrow \begin{bmatrix} \frac{4}{3} - \left(\frac{6}{3} + \frac{2}{3}\sqrt{3}\right) & -\frac{6}{3} & \frac{4}{3} \\ -\frac{2}{3} & \frac{6}{3} - \left(\frac{6}{3} + \frac{2}{3}\sqrt{3}\right) & -\frac{2}{3} \\ \frac{2}{3} & 0 & \frac{2}{3} - \left(\frac{6}{3} + \frac{2}{3}\sqrt{3}\right) \end{bmatrix} \begin{bmatrix} \beta\_{1} \\ \beta\_{2} \\ \beta\_{3} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ &\Rightarrow \begin{bmatrix} -(1+\sqrt{3}) & -3 & 2 \\ -1 & -\sqrt{3} & -1 \\ 1 & 0 & -(2+\sqrt{3}) \end{bmatrix} \begin{bmatrix} \beta\_{1} \\ \beta\_{2} \\ \beta\_{3} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \end{split}$$

This yields the equations

$$-(1+\sqrt{3})\beta\_1 - 3\beta\_2 + 2\beta\_3 = 0\tag{iv}$$

$$-\beta\_1 - \sqrt{3}\beta\_2 - \beta\_3 = 0 \tag{v}$$

$$
\beta\_1 - (2 + \sqrt{3})\beta\_3 = 0,\tag{vi}
$$

whose solution in terms of an arbitrary $\beta\_3$ is $\beta\_2 = -(1+\sqrt{3})\beta\_3$ and $\beta\_1 = (2+\sqrt{3})\beta\_3$. Taking $\beta\_3 = 1$, we have the solution $\beta\_3 = 1,\ \beta\_2 = -(1+\sqrt{3}),\ \beta\_1 = 2+\sqrt{3}$. Let us normalize the resulting vector via the constraint $\beta'\Sigma\_{22}\beta = 1$:

$$\begin{aligned} \beta' \Sigma\_{22} \beta &= \begin{bmatrix} 2 + \sqrt{3} & -(1 + \sqrt{3}) & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 + \sqrt{3} \\ -(1 + \sqrt{3}) \\ 1 \end{bmatrix} \\ &= 6 + 2\sqrt{3} = \delta\_1. \end{aligned}$$

Thus, the normalized *β*, denoted by *β(*1*)*, is

$$\beta\_{(1)} = \frac{1}{\sqrt{\delta\_1}} \begin{bmatrix} 2 + \sqrt{3} \\ -(1 + \sqrt{3}) \\ 1 \end{bmatrix} \Rightarrow$$

$$v\_1 = \frac{1}{\sqrt{\delta\_1}} [(2 + \sqrt{3})y\_1 - (1 + \sqrt{3})y\_2 + y\_3]. \tag{vii}$$

Observe that we could also have utilized *(iii)* of Sect. 10.3.1 to evaluate $\beta\_{(1)}$ and $\beta\_{(2)}$ from $\alpha\_{(1)}$ and $\alpha\_{(2)}$. Consider the second eigenvalue $\lambda\_{(2)} = \frac{6}{3} - \frac{2}{3}\sqrt{3}$ and the equation $(B - \lambda\_{(2)}I)\beta = O$, that is,

$$
\begin{bmatrix}
\frac{4}{3} - (\frac{6}{3} - \frac{2}{3}\sqrt{3}) & -\frac{6}{3} & \frac{4}{3} \\
-\frac{2}{3} & \frac{6}{3} - (\frac{6}{3} - \frac{2}{3}\sqrt{3}) & -\frac{2}{3} \\
\frac{2}{3} & 0 & \frac{2}{3} - (\frac{6}{3} - \frac{2}{3}\sqrt{3})
\end{bmatrix}
\begin{bmatrix}
\beta\_{1} \\
\beta\_{2} \\
\beta\_{3}
\end{bmatrix} = \begin{bmatrix}
0 \\
0 \\
0
\end{bmatrix} \Rightarrow
\begin{bmatrix}
-1 + \sqrt{3} & -3 & 2 \\
-1 & \sqrt{3} & -1 \\
1 & 0 & -2 + \sqrt{3}
\end{bmatrix}
\begin{bmatrix}
\beta\_{1} \\
\beta\_{2} \\
\beta\_{3}
\end{bmatrix} = \begin{bmatrix}
0 \\
0 \\
0
\end{bmatrix},
$$

which leads to the equations

$$(-1 + \sqrt{3})\beta\_1 - 3\beta\_2 + 2\beta\_3 = 0\tag{viii}$$

$$-\beta\_1 + \sqrt{3}\beta\_2 - \beta\_3 = 0\tag{ix}$$

$$
\beta\_1 + (-2 + \sqrt{3})\beta\_3 = 0.\tag{x}
$$

Thus, when *<sup>β</sup>*<sup>3</sup> <sup>=</sup> 1, *<sup>β</sup>*<sup>1</sup> <sup>=</sup> <sup>2</sup>−√3 and *<sup>β</sup>*<sup>2</sup> <sup>=</sup> <sup>√</sup>3−1. Subject to the constraint *<sup>β</sup>*- *Σ*22*β* = 1, we have

$$\begin{aligned} \beta' \Sigma\_{22} \beta &= \begin{bmatrix} (2 - \sqrt{3}) & (\sqrt{3} - 1) & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 - \sqrt{3} \\ \sqrt{3} - 1 \\ 1 \end{bmatrix} \\ &= 6 - 2\sqrt{3} = \delta\_2. \end{aligned}$$

Hence, the normalized *β* is

$$\beta\_{(2)} = \frac{1}{\sqrt{\delta\_2}} \begin{bmatrix} 2 - \sqrt{3} \\ \sqrt{3} - 1 \\ 1 \end{bmatrix} \Rightarrow$$

$$v\_2 = \frac{1}{\sqrt{\delta\_2}} [(2 - \sqrt{3})y\_1 + (\sqrt{3} - 1)y\_2 + y\_3]. \tag{xi}$$

The reader may verify that this solution for $\beta\_{(2)}$ is identical to that obtained from *(iii)* of Sect. 10.3.1. Thus, the canonical pairs are the following: from *(ii)* and *(vii)*, we have the first canonical pair $(u\_1, v\_1)$; the second pair $(u\_2, v\_2)$ results from *(iii)* and *(xi)*. This means that $u\_1$ is the best predictor of $v\_1$ and vice versa, and that $u\_2$ is the second best predictor of $v\_2$ and vice versa.

Let us ensure that no computational errors have been committed. Consider

$$\begin{aligned} \alpha'\_{(1)}\Sigma\_{12}\beta\_{(1)} &= \frac{1}{\sqrt{\gamma\_1 \delta\_1}} [1, -(2+\sqrt{3})] \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2+\sqrt{3} \\ -(1+\sqrt{3}) \\ 1 \end{bmatrix} \\ &= \frac{1}{\sqrt{\gamma\_1 \delta\_1}} 4(3+2\sqrt{3}), \end{aligned}$$

with

$$
\gamma\_1 \delta\_1 = 6(2+\sqrt{3})\,2(3+\sqrt{3}) = 12(9+5\sqrt{3}),
$$

so that

$$\begin{aligned} \frac{[\alpha'\_{(1)}\Sigma\_{12}\beta\_{(1)}]^2}{\gamma\_1\delta\_1} &= \frac{16(3+2\sqrt{3})^2}{12(9+5\sqrt{3})}\\ &= \frac{16(21+12\sqrt{3})}{12(9+5\sqrt{3})} = \frac{4(7+4\sqrt{3})}{9+5\sqrt{3}} = \frac{4(7+4\sqrt{3})(9-5\sqrt{3})}{6} \\ &= \frac{2}{3}(3+\sqrt{3}) = 2 + \frac{2}{\sqrt{3}} = 2 + \frac{2}{3}\sqrt{3} = \lambda\_{(1)} \text{ : the largest eigenvalue of } A, \end{aligned}$$

which corroborates the results obtained for *α(*1*), β(*1*)* and *λ(*1*)*. Similarly, it can be verified that *α(*2*), β(*2*)* and *λ(*2*)* have been correctly computed.
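
This verification can also be replicated numerically; a brief sketch (assuming numpy, not part of the text): with $\alpha'\Sigma\_{11}\alpha = 1$ and $\beta'\Sigma\_{22}\beta = 1$, the squared bilinear form $(\alpha'\Sigma\_{12}\beta)^2$ should reproduce $\lambda\_{(1)}$.

```python
import numpy as np

# Numerical check for Example 10.3.2: with the normalizations of the text,
# (alpha' Sig12 beta)^2 equals lambda_(1) = 2 + (2/3) sqrt(3).
s3 = np.sqrt(3.0)
Sig11 = np.array([[2., 1.], [1., 2.]])
Sig12 = np.array([[1., 1., 1.], [1., -1., 1.]])
Sig22 = np.array([[1., 1., 0.], [1., 2., 0.], [0., 0., 1.]])

alpha = np.array([1.0, -(2 + s3)])
alpha /= np.sqrt(alpha @ Sig11 @ alpha)     # alpha' Sig11 alpha = 1
beta = np.array([2 + s3, -(1 + s3), 1.0])
beta /= np.sqrt(beta @ Sig22 @ beta)        # beta' Sig22 beta = 1

rho2 = float(alpha @ Sig12 @ beta) ** 2
print(rho2, 2 + 2 * s3 / 3)                 # both approx 3.1547
```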

#### **10.4. The Sampling Distribution of the Canonical Correlation Matrix**

Consider a simple random sample of size $n$ from $Z = \begin{bmatrix} X \\ Y \end{bmatrix}$. Let the $(p+q) \times (p+q)$ sample sum of products matrix be denoted by $S$ and let $Z$ have a real $(p+q)$-variate standard Gaussian density. Then $S$ has a real $(p+q)$-variate Wishart distribution with the identity matrix as its parameter matrix and $m = n-1$ degrees of freedom, $n$ being the sample size. Letting the density of $S$ be denoted by $f(S)$,

$$f(S) = \frac{|S|^{\frac{m}{2} - \frac{p+q+1}{2}}}{2^{\frac{m(p+q)}{2}} \Gamma\_{p+q}(\frac{m}{2})} \mathbf{e}^{-\frac{1}{2} \text{tr}(S)}, \ S > O, \ m \ge p+q. \tag{10.4.1}$$

Let us partition *S* as follows:

$$S = \begin{bmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{bmatrix}, \ S\_{11} \text{ is } p \times p, \ S\_{22} \text{ is } q \times q,$$

and let d*S* = d*S*<sup>11</sup> ∧ d*S*<sup>22</sup> ∧ d*S*12. Note that tr*(S)* = tr*(S*11*)* + tr*(S*22*)* and

$$\begin{aligned} |S| &= |S\_{22}| \, |S\_{11} - S\_{12} S\_{22}^{-1} S\_{21}| \\ &= |S\_{22}| \, |S\_{11}| \, |I - S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}}| .\end{aligned}$$

Letting $U = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-\frac{1}{2}}$, we have $\mathrm{d}U = |S\_{11}|^{-\frac{q}{2}} |S\_{22}|^{-\frac{p}{2}} \mathrm{d}S\_{12}$ for fixed $S\_{11}$ and $S\_{22}$, so that the joint density of $S\_{11}, S\_{22}$ and $S\_{12}$ is given by

$$f\_1(S) \text{d}S\_{11} \wedge \text{d}S\_{22} \wedge \text{d}S\_{12} = \frac{|S\_{11}|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(S\_{11})}}{2^{\frac{mp}{2}} \Gamma\_p(\frac{m}{2})} \text{d}S\_{11}$$

$$\times \frac{|S\_{22}|^{\frac{m}{2} - \frac{q+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(S\_{22})}}{2^{\frac{mq}{2}} \Gamma\_q(\frac{m}{2})} \text{d}S\_{22}$$

$$\times \frac{\Gamma\_p(\frac{m}{2}) \Gamma\_q(\frac{m}{2})}{\Gamma\_{p+q}(\frac{m}{2})} |I - UU'|^{\frac{m}{2} - \frac{p+q+1}{2}} \text{d}U. \tag{10.4.2}$$

It is seen from (10.4.2) that $S\_{11}, S\_{22}$ and $U$ are mutually independently distributed, and so are $S\_{11}, S\_{22}$ and $W = UU'$. Further, $S\_{11} \sim W\_p(m, I)$ and $S\_{22} \sim W\_q(m, I)$. Note that $W = UU' = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}}$ is the sample canonical correlation matrix. It follows from Theorem 4.2.3 of Chapter 4 that, for $p \le q$ and $U$ of full rank $p$,

$$\mathrm{d}U = \frac{\pi^{\frac{pq}{2}}}{\Gamma\_p(\frac{q}{2})} |W|^{\frac{q}{2} - \frac{p+1}{2}} \mathrm{d}W. \tag{10.4.3}$$

After integrating out *S*<sup>11</sup> and *S*<sup>22</sup> from (10.4.2) and substituting for d*S*12, we obtain the following representation of the density of *W*:

$$f\_2(W) = \frac{\pi^{\frac{pq}{2}}}{\Gamma\_p(\frac{q}{2})} \frac{\Gamma\_p(\frac{m}{2})\Gamma\_q(\frac{m}{2})}{\Gamma\_{p+q}(\frac{m}{2})} |W|^{\frac{q}{2} - \frac{p+1}{2}} |I - W|^{\frac{m-q}{2} - \frac{p+1}{2}},$$

where

$$\frac{\Gamma\_q(\frac{m}{2})}{\Gamma\_{p+q}(\frac{m}{2})} = \frac{\pi^{\frac{q(q-1)}{4}}}{\pi^{\frac{(p+q)(p+q-1)}{4}}} \frac{\Gamma(\frac{m}{2}) \cdots \Gamma(\frac{m}{2} - \frac{q-1}{2})}{\Gamma(\frac{m}{2}) \cdots \Gamma(\frac{m}{2} - \frac{p+q-1}{2})} = \frac{1}{\pi^{\frac{pq}{2}} \Gamma\_p(\frac{m-q}{2})}.$$

Hence, the density of *W* is

$$f\_2(W) = \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{q}{2})\Gamma\_p(\frac{m-q}{2})} |W|^{\frac{q}{2} - \frac{p+1}{2}} |I - W|^{\frac{m-q}{2} - \frac{p+1}{2}}.\tag{10.4.4}$$

Thus, the following result:

**Theorem 10.4.1.** *Let $Z, S, S\_{11}, S\_{22}, S\_{12}, U$ and $W$ be as defined above. Then, for $p \le q$ and $U$ of full rank $p$, the $p \times p$ canonical correlation matrix $W = UU'$ has the real matrix-variate type-1 beta density with the parameters $(\frac{q}{2}, \frac{m-q}{2})$ that is specified in (10.4.4), with $m \ge p+q$, $m = n-1$.*

When $q \le p$ and $S\_{21}$ is of full rank $q$, the canonical correlation matrix $W\_1 = U'U = S\_{22}^{-\frac{1}{2}} S\_{21} S\_{11}^{-1} S\_{12} S\_{22}^{-\frac{1}{2}}$ will have the density given in (10.4.4) with $p$ and $q$ interchanged. Suppose that $p \le q$ and we would like to consider the density of $W\_1 = U'U$. In this case, $U'U$ is real positive semi-definite since the rank of $U$ is $p \le q$. However, on expanding the following determinant in two different ways:

$$
\begin{vmatrix} I\_p & U \\ U' & I\_q \end{vmatrix} = |I\_p - UU'| = |I\_q - U'U|,
$$

it follows from (10.4.2) that the $q \times q$ matrix $U'U$ has a distribution that is equivalent to that of the $p \times p$ matrix $UU'$, as given in (10.4.4). The distribution of the sample canonical correlation matrix has been derived in Mathai (1981) for a Gaussian population under the assumption that $\Sigma\_{12} = O$.
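
Theorem 10.4.1 lends itself to a quick Monte Carlo check for $p = 1$: $W$ then reduces to the scalar $r^2$, whose type-1 beta$(\frac{q}{2}, \frac{m-q}{2})$ law has mean $\frac{q}{m}$. A simulation sketch, in which the sample size, seed and replication count are arbitrary choices:

```python
import numpy as np

# Monte Carlo sketch: for p = 1 and Sigma = I, the 1x1 canonical correlation
# matrix W = S12 S22^{-1} S21 / s11 should average to q/m, the mean of a
# type-1 beta(q/2, (m-q)/2) random variable.
rng = np.random.default_rng(7)
p, q, n = 1, 3, 20
m = n - 1
reps = 20000
Ws = np.empty(reps)
for k in range(reps):
    Z = rng.standard_normal((p + q, n))
    Zd = Z - Z.mean(axis=1, keepdims=True)
    S = Zd @ Zd.T                       # Wishart, m = n - 1 degrees of freedom
    S11, S12, S22 = S[:1, :1], S[:1, 1:], S[1:, 1:]
    Ws[k] = (S12 @ np.linalg.solve(S22, S12.T) / S11)[0, 0]
print(Ws.mean(), q / m)                 # both approx 0.158
```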

#### **10.4.1. The joint density of the eigenvalues and eigenvectors**

Without any loss of generality, let $p \le q$ and $U$ be of full rank $p$. Let $W$ denote the sample canonical correlation matrix whose density is as given in (10.4.4) for the case in which the population canonical correlation matrix is a null matrix. Let the eigenvalues of $W$ be distinct and such that $1 > \nu\_1 > \nu\_2 > \cdots > \nu\_p > 0$. Observe that $\nu\_j = r\_{(j)}^2$ where $r\_{(j)},\ j = 1,\dots,p$, are the sample canonical correlations. For a unique $p \times p$ orthonormal matrix $Q$, $QQ' = I,\ Q'Q = I$, we have $Q'WQ = \text{diag}(\nu\_1,\dots,\nu\_p) \equiv D$. Consider the transformation from $W$ to $D$ and the normalized eigenvectors of $W$, which constitute the columns of $Q$. Then, as explained in Theorem 8.2.1 or Theorem 4.4 of Mathai (1997),

$$\mathrm{d}W = \Big[\prod_{i<j}(\nu_i-\nu_j)\Big]\mathrm{d}D\wedge h(Q) \tag{10.4.5}$$

where $h(Q)=\wedge[(\mathrm{d}Q)Q']$ is the differential element associated with $Q$, and we have the following result:

**Theorem 10.4.2.** *The joint density of the distinct eigenvalues $1>\nu_1>\nu_2>\cdots>\nu_p>0$, $p\le q$, of $W=UU'$ whose density is specified in (10.4.4), $U$ being assumed to be of full rank $p$, and of the normalized eigenvectors corresponding to $\nu_1,\ldots,\nu_p$, denoted by $f_3(D,Q)$, is the following:*

$$f_3(D,Q)\,\mathrm{d}D\wedge h(Q) = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{m-q}{2})}\Big[\prod_{j=1}^p \nu_j^{\frac{q}{2}-\frac{p+1}{2}}\Big]\Big[\prod_{j=1}^p(1-\nu_j)^{\frac{m-q}{2}-\frac{p+1}{2}}\Big]$$

$$\times\ \Big[\prod_{i<j}(\nu_i-\nu_j)\Big]\mathrm{d}D\wedge h(Q) \tag{10.4.6}$$

*where $h(Q)$ is as defined in (10.4.5). To obtain the joint density of the squares of the sample canonical correlations $r_{(j)}^2$ and the corresponding canonical vectors, it suffices to replace $\nu_j$ by $r_{(j)}^2$, $1>r_{(1)}^2>\cdots>r_{(p)}^2>0$, $-1<r_{(j)}<1$, $j=1,\ldots,p$.*
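As a concrete illustration of these quantities, the matrix $W$ and its eigenvalues $\nu_j=r_{(j)}^2$ can be computed from simulated data. This is a sketch assuming NumPy; the seed, dimensions and the helper `inv_sqrt` are arbitrary choices of ours, not notation from the text:

```python
import numpy as np

def inv_sqrt(M):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(1)
p, q, n = 2, 3, 200
Z = rng.standard_normal((n, p + q))      # illustrative sample; here Sigma_12 = O
Zc = Z - Z.mean(axis=0)
S = Zc.T @ Zc                            # sample sum-of-products matrix, m = n - 1
S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]

W = inv_sqrt(S11) @ S12 @ np.linalg.inv(S22) @ S21 @ inv_sqrt(S11)
nu = np.sort(np.linalg.eigvalsh(W))[::-1]    # nu_j = r_(j)^2, in decreasing order
r = np.sqrt(nu)                              # sample canonical correlations

# equivalently, the r_(j) are the singular values of S11^{-1/2} S12 S22^{-1/2}
sv = np.linalg.svd(inv_sqrt(S11) @ S12 @ inv_sqrt(S22), compute_uv=False)
```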

The joint density of the eigenvalues can be determined by integrating out *h(Q)* from (10.4.6) in this real case. It follows from Theorem 4.2.2 that

$$\int\_{O\_p} h(Q) = \frac{\pi^{\frac{p^2}{2}}}{\Gamma\_p(\frac{p}{2})},\tag{10.4.7}$$

this result being also stated in Mathai (1997). For the complex case, the expression on the right-hand side of (10.4.7) is $\pi^{p(p-1)}/\tilde\Gamma_p(p)$. Hence, the joint density of the eigenvalues or, equivalently, the density of $D$, and the density of $Q$ are the following:

**Theorem 10.4.3.** *When $p\le q$ and $U$ is of full rank $p$, the joint density of the distinct eigenvalues $1>\nu_1>\cdots>\nu_p>0$ of the canonical correlation matrix $W$ in (10.4.4), which is available from (10.4.6) and denoted by $f_4(D)$, is*

$$f_4(D) = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{m-q}{2})}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\Big[\prod_{j=1}^p \nu_j^{\frac{q}{2}-\frac{p+1}{2}}\Big]$$

$$\times\ \Big[\prod_{j=1}^p(1-\nu_j)^{\frac{m-q}{2}-\frac{p+1}{2}}\Big]\Big[\prod_{i<j}(\nu_i-\nu_j)\Big] \tag{10.4.8}$$

*and the joint density of the normalized eigenvectors associated with W, denoted by f*5*(Q), is given by*

$$f_5(Q) = \frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\,h(Q) \tag{10.4.9}$$

*where h(Q) is as defined in (10.4.5).*

To obtain the joint density of the squares of the sample canonical correlations $r_{(j)}^2$, one should replace $\nu_j$ by $r_{(j)}^2$, $j=1,2,\ldots,p$, in (10.4.8).

**Example 10.4.1.** Verify that (10.4.8) is a density for $p=2$ and $m-q=p+1$, with $q$ being a free parameter.

**Solution 10.4.1.** For $m-q=p+1$ and $p=2$, the right-hand side of (10.4.8) becomes

$$\begin{split} \frac{\Gamma_p(\frac{q+p+1}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{p+1}{2})}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\,(\nu_1\nu_2)^{\frac{q-(p+1)}{2}}(\nu_1-\nu_2) \\ = \frac{\Gamma_2(\frac{q+3}{2})}{\Gamma_2(\frac{q}{2})\Gamma_2(\frac{3}{2})}\,\frac{\pi^2}{\Gamma_2(1)}\,(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2). \end{split} \tag{i}$$

The constant part simplifies as follows:

$$\frac{\Gamma\_2(\frac{q+3}{2})}{\Gamma\_2(\frac{q}{2})\Gamma\_2(\frac{p+1}{2})} \frac{\pi^2}{\Gamma\_2(1)} = \frac{\Gamma(\frac{q+3}{2})\Gamma(\frac{q+2}{2})}{\sqrt{\pi}[\Gamma(\frac{q}{2})\Gamma(\frac{q-1}{2})][\Gamma(\frac{3}{2})\Gamma(1)]} \frac{\pi^2}{\sqrt{\pi}\Gamma(1)\Gamma(\frac{1}{2})};\qquad \text{(ii)}$$

now, noting that

$$
\Gamma\left(\frac{q+3}{2}\right) = \left(\frac{q+1}{2}\right)\left(\frac{q-1}{2}\right)\Gamma\left(\frac{q-1}{2}\right) \text{ and } \Gamma\left(\frac{q+2}{2}\right) = \frac{q}{2}\Gamma\left(\frac{q}{2}\right),
$$

and substituting these values in *(ii)*, the constant part becomes

$$\frac{(\frac{q+1}{2})(\frac{q-1}{2})(\frac{q}{2})}{\sqrt{\pi}(\frac{1}{2})\sqrt{\pi}}\frac{\pi^2}{\sqrt{\pi}\sqrt{\pi}}=\frac{(q-1)q(q+1)}{4}.\tag{iii}$$

Let us show that the total integral equals 1. The integral part is the following:

$$\begin{split} \int_{\nu_1=0}^{1}\int_{\nu_2=0}^{\nu_1}(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2)\,\mathrm{d}\nu_1\wedge\mathrm{d}\nu_2 \\ = \int_0^1 \nu_1^{\frac{q-1}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-3}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1 - \int_0^1 \nu_1^{\frac{q-3}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-1}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1 \\ = \int_0^1 \frac{\nu_1^{q-1}}{(\frac{q-1}{2})}\mathrm{d}\nu_1 - \int_0^1 \frac{\nu_1^{q-1}}{(\frac{q+1}{2})}\mathrm{d}\nu_1 = \frac{1}{q(\frac{q-1}{2})} - \frac{1}{q(\frac{q+1}{2})} \\ = \frac{4}{(q-1)q(q+1)}. \end{split} \tag{iv}$$

The product of *(iii)* and *(iv)* being equal to 1, this verifies that (10.4.8) is a density for *m* − *q* = *p* + 1*, p* = 2. This completes the computations.
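The computation above can also be replicated numerically. The following sketch uses SciPy's two-dimensional quadrature, with $q=5$ chosen arbitrarily; any $q>1$ works:

```python
from scipy import integrate

q = 5.0                                  # q is a free parameter; any q > 1 works
const = (q - 1) * q * (q + 1) / 4        # the simplified constant (iii)

# integral over 0 < nu2 < nu1 < 1 of (nu1 nu2)^{(q-3)/2} (nu1 - nu2);
# dblquad integrates its first argument (nu2) over [0, nu1]
val, _ = integrate.dblquad(
    lambda nu2, nu1: (nu1 * nu2) ** ((q - 3) / 2) * (nu1 - nu2),
    0, 1, lambda nu1: 0.0, lambda nu1: nu1)
total = const * val                      # should equal 1
```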

#### **10.4.2. Testing whether the population canonical correlations equal zero**

In its symmetric form, whenever $\Sigma_{12}=O$, the population canonical correlation matrix is a null matrix, that is, $\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}=O$. Thus, when $\Sigma_{12}=O$, the canonical correlations are equal to zero and vice versa. As was explained in Sect. 6.8.2, the statistic $u_4$ is a one-to-one function of the likelihood ratio criterion for testing this hypothesis in the case of a Gaussian population. It was established that

$$u_4 = \frac{|S|}{|S_{11}|\,|S_{22}|} = |I - S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}| = |I - UU'| = |I - W| = \prod_{j=1}^p (1 - r_{(j)}^2) \tag{10.4.10}$$

where $r_{(j)},\ j=1,\ldots,p$, are the sample canonical correlations. It can also be seen from (10.4.2) that, when $U$ is of full rank $p$, $U$ has a rectangular matrix-variate type-1 beta distribution and $W=UU'$ has a real matrix-variate type-1 beta distribution. Since it has been determined in Sect. 6.8.2 that, under $H_o$, the $h$-th moment of $u_4$ for an arbitrary $h$ is given by

$$E[\boldsymbol{u}\_4^h | \boldsymbol{H}\_o] = c \frac{\prod\_{j=p+1}^{p+q} \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}{\prod\_{j=1}^q \Gamma(\frac{m}{2} - \frac{j-1}{2} + h)}, \ m = n - 1,\tag{10.4.11}$$

where $n$ is the sample size and $c$ is such that $E[u_4^0|H_o]=1$, the density of $u_4$ is expressible in terms of a G-function. It was also shown in the same section that $-n\ln u_4$ is asymptotically distributed as a real chisquare random variable having $\frac{(p+q)(p+q-1)}{2}-\frac{p(p-1)}{2}-\frac{q(q-1)}{2}=pq$ degrees of freedom, which corresponds to the number of parameters restricted by the hypothesis $\Sigma_{12}=O$ since there are $pq$ free parameters in $\Sigma_{12}$. Thus, we have the following result:

**Theorem 10.4.4.** *Consider the hypothesis $H_o:\rho_{(1)}=\cdots=\rho_{(p)}=0$, that is, the population canonical correlations $\rho_{(j)},\ j=1,\ldots,p$, are all equal to zero, which is equivalent to the hypothesis $H_o:\Sigma_{12}=O$. Let $u_4$ denote the $(2/n)$-th root of the likelihood ratio criterion for testing this hypothesis. Then, as the sample size $n\to\infty$, under $H_o$,*

$$-n\ln u_4 = -2\ln(\text{the likelihood ratio criterion}) \to \chi^2_{pq},\tag{10.4.12}$$

*$\chi^2_\nu$ denoting a real chisquare random variable having $\nu$ degrees of freedom.*

An illustrative numerical example has already been presented in Chap. 6.
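Theorem 10.4.4 translates directly into a test procedure. The following is a minimal sketch, assuming NumPy and SciPy; the helper name `lrt_canonical_zero` and the simulated data are our own illustrative choices:

```python
import numpy as np
from scipy import stats

def lrt_canonical_zero(X, Y):
    """Asymptotic chi-square test of Ho: Sigma_12 = O (all population
    canonical correlations zero), based on -n ln u4 -> chi^2 with pq d.f."""
    n, p = X.shape
    q = Y.shape[1]
    Z = np.hstack([X, Y])
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc                        # sample sum-of-products matrix
    S11, S22 = S[:p, :p], S[p:, p:]
    u4 = np.linalg.det(S) / (np.linalg.det(S11) * np.linalg.det(S22))
    stat = -n * np.log(u4)               # asymptotically chi^2_{pq} under Ho
    pval = stats.chi2.sf(stat, df=p * q)
    return stat, pval

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 2))
Y = rng.standard_normal((500, 3))        # independent of X, so Ho holds
stat, pval = lrt_canonical_zero(X, Y)
```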

**Note 10.1.** We have initially assumed that $\Sigma>O$, $\Sigma_{11}>O$ and $\Sigma_{22}>O$. However, $\Sigma_{12}=\Sigma_{21}'$ may or may not be of full rank, or some of its elements could be equal to zero. Note that $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is either positive definite or positive semi-definite. Whenever $p\le q$ and $\Sigma_{12}$ is of rank $p$, $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}>O$ and, in this instance, all the $p$ canonical correlations are positive. If $\Sigma_{12}$ is not of full rank, then some of the eigenvalues of $W$ as previously defined, as well as the corresponding canonical correlations, will be equal to zero; in the event that $q\le p$, similar statements would hold with respect to $\Sigma_{21}$, $W_1$ and the resulting canonical correlations. This aspect will not be further investigated from an inferential standpoint.

**Note 10.2.** Consider the regression of $X$ on $Y$, that is, $E[X|Y]$, when $Z=\binom{X}{Y}$ has the following real $(p+q)$-variate normal distribution:

$$\begin{aligned} Z &\sim N_{p+q}(\mu,\Sigma), \quad \Sigma>O,\ \ \Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix},\\ \Sigma_{11} &= \mathrm{Cov}(X)>O \text{ is } p\times p,\ \ \Sigma_{22}=\mathrm{Cov}(Y)>O \text{ is } q\times q. \end{aligned}$$

Then, from equation (3.3.5), we have

$$E[X|Y] = \mu_{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(Y-\mu_{(2)})$$

where $\mu'=(\mu_{(1)}',\ \mu_{(2)}')$ and

$$\mathrm{Cov}(X|Y) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

Regression analysis is performed on the conditional space, where $Y$ either consists of non-random real scalar variables or of given values of real scalar random variables, whereas canonical correlation analysis is carried out in the entire space of $Z$. Clearly, these techniques involve distinct approaches. When $Y$ consists of given values of random variables, $\Sigma_{12}$ and $\Sigma_{22}$ are meaningful. In this instance, the hypothesis $H_o:\Sigma_{12}=O$, in which case the regression coefficient matrix is a null matrix or, equivalently, the hypothesis that $Y$ does not contribute to predicting $X$, implies that the canonical correlation matrix $\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$ is a null matrix as well. Accordingly, in this case, the 'no regression' hypothesis $\Sigma_{12}=O$ (no contribution of $Y$ in predicting $X$) is equivalent to the hypothesis that the canonical correlations are equal to zero, and vice versa.
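The equivalence described in this note is immediate to check numerically: with $\Sigma_{12}=O$, the regression coefficient matrix $\Sigma_{12}\Sigma_{22}^{-1}$ vanishes, $E[X|Y]$ reduces to $\mu_{(1)}$, and the population canonical correlation matrix is null. A small sketch assuming NumPy, with all matrices and values chosen arbitrarily:

```python
import numpy as np

p, q = 2, 3
Sigma11 = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
Sigma22 = 1.5 * np.eye(q)
Sigma12 = np.zeros((p, q))               # Ho: Sigma_12 = O
mu1, mu2 = np.array([1.0, -2.0]), np.zeros(q)

coef = Sigma12 @ np.linalg.inv(Sigma22)  # regression coefficient matrix
Yval = np.array([0.3, -1.2, 2.0])        # an arbitrary given value of Y
EX_given_Y = mu1 + coef @ (Yval - mu2)   # E[X|Y]: reduces to mu_(1) here

# population canonical correlation matrix, also null under Ho
w, V = np.linalg.eigh(Sigma11)
S11_mh = V @ np.diag(w ** -0.5) @ V.T
P = S11_mh @ Sigma12 @ np.linalg.inv(Sigma22) @ Sigma12.T @ S11_mh
```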

#### **10.5. The General Sampling Distribution of the Canonical Correlation Matrix**

Let the $(p+q)\times 1$ real vector random variable $Z=\binom{X}{Y}\sim N_{p+q}(\mu,\Sigma)$, $\Sigma>O$. Consider a simple random sample of size $n$ from this Gaussian population and let the sample sum of products matrix $S$ be partitioned as in the preceding section. Let the sample canonical correlation matrix be denoted by $R$ and the corresponding population canonical correlation matrix, by $P$, that is,

$$R = S\_{11}^{-\frac{1}{2}} S\_{12} S\_{22}^{-1} S\_{21} S\_{11}^{-\frac{1}{2}} \text{ and } \; P = \Sigma\_{11}^{-\frac{1}{2}} \Sigma\_{12} \Sigma\_{22}^{-1} \Sigma\_{21} \Sigma\_{11}^{-\frac{1}{2}}.$$

We now examine the distribution of $R$ without assuming that $P=O$. Letting the determinant of $I-R$ be denoted by $u$, we have

$$u = |I-R| = |S_{11} - S_{12}S_{22}^{-1}S_{21}|\,|S_{11}|^{-1} = \frac{|S|}{|S_{11}|\,|S_{22}|}.$$

Thus, the *h*-th moment of *u* is

$$E[u^h] = E\left[\frac{|S|}{|S_{11}|\,|S_{22}|}\right]^h = E\{|S|^h\,|S_{11}|^{-h}\,|S_{22}|^{-h}\}.$$

Since $S_{11}$ and $S_{22}$ are functions of $S$, we can integrate over the density of $S$, namely the Wishart density with $m=n-1$ degrees of freedom and parameter matrix $\Sigma>O$. Then, for $m\ge p+q$,

$$E[u^h] = \frac{1}{|2\Sigma|^{\frac{m}{2}}\Gamma_{p+q}(\frac{m}{2})}\int_{S>O}\left[\frac{|S|}{|S_{11}|\,|S_{22}|}\right]^h |S|^{\frac{m}{2}-\frac{p+q+1}{2}}\,\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\,\mathrm{d}S.\tag{10.5.1}$$

Let us replace $S$ by $2S$ so that the factors containing 2 vanish, and let us replace $|S_{11}|^{-h}$ and $|S_{22}|^{-h}$ by equivalent integrals:

$$\begin{aligned} |S_{11}|^{-h} &= \frac{1}{\Gamma_p(h)}\int_{Y_1>O}|Y_1|^{h-\frac{p+1}{2}}\mathrm{e}^{-\mathrm{tr}(Y_1S_{11})}\,\mathrm{d}Y_1,\ \Re(h)>\frac{p-1}{2};\\ |S_{22}|^{-h} &= \frac{1}{\Gamma_q(h)}\int_{Y_2>O}|Y_2|^{h-\frac{q+1}{2}}\mathrm{e}^{-\mathrm{tr}(Y_2S_{22})}\,\mathrm{d}Y_2,\ \Re(h)>\frac{q-1}{2}. \end{aligned}$$

Then,

$$E[u^h] = \frac{1}{|\Sigma|^{\frac{m}{2}}\Gamma_{p+q}(\frac{m}{2})}\frac{1}{\Gamma_p(h)\Gamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|^{h-\frac{p+1}{2}}|Y_2|^{h-\frac{q+1}{2}}\int_{S>O}|S|^{\frac{m}{2}+h-\frac{p+q+1}{2}}\\ \times\ \mathrm{e}^{-\mathrm{tr}(\Sigma^{-1}S+YS)}\,\mathrm{d}S\wedge\mathrm{d}Y_1\wedge\mathrm{d}Y_2\qquad(10.5.2)$$

where

$$\operatorname{tr}(YS) = \operatorname{tr}\left\{ \begin{pmatrix} Y\_1 & O \\ O & Y\_2 \end{pmatrix} \begin{pmatrix} S\_{11} & S\_{12} \\ S\_{21} & S\_{22} \end{pmatrix} \right\} = \operatorname{tr}(Y\_1S\_{11}) + \operatorname{tr}(Y\_2S\_{22}), \ Y = \begin{pmatrix} Y\_1 & O \\ O & Y\_2 \end{pmatrix}.$$

Integrating over *S* in (10.5.2) gives

$$\begin{split} E[u^h] = \frac{\Gamma_{p+q}(\frac{m}{2}+h)}{\Gamma_{p+q}(\frac{m}{2})}\frac{1}{|\Sigma|^{\frac{m}{2}}\Gamma_p(h)\Gamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|^{h-\frac{p+1}{2}}|Y_2|^{h-\frac{q+1}{2}}\\ \times\ |\Sigma^{-1}+Y|^{-(\frac{m}{2}+h)}\,\mathrm{d}Y_1\wedge\mathrm{d}Y_2. \end{split}$$

Let

$$
\Sigma^{-1} = \begin{bmatrix}\Sigma^{11}&\Sigma^{12}\\ \Sigma^{21}&\Sigma^{22}\end{bmatrix} \Rightarrow\ \Sigma^{-1}+Y = \begin{bmatrix}\Sigma^{11}+Y_1&\Sigma^{12}\\ \Sigma^{21}&\Sigma^{22}+Y_2\end{bmatrix}.
$$

Then, the determinant can be expanded as follows:

$$\begin{aligned} |\Sigma^{-1} + Y| &= |\Sigma^{22} + Y\_2| \, |\Sigma^{11} + Y\_1 - \Sigma^{12} (\Sigma^{22} + Y\_2)^{-1} \Sigma^{21}| \\ &= |\Sigma^{22} + Y\_2| \, |Y\_1 + B|, \; B = \Sigma^{11} - \Sigma^{12} (\Sigma^{22} + Y\_2)^{-1} \Sigma^{21}, \end{aligned}$$

so that

$$|\Sigma^{-1} + Y|^{-(\frac{m}{2} + h)} = |\Sigma^{22} + Y\_2|^{-(\frac{m}{2} + h)} |I + \mathcal{B}^{-1} Y\_1|^{-(\frac{m}{2} + h)} |\mathcal{B}|^{-(\frac{m}{2} + h)}.$$

Collecting the factors containing *Y*<sup>1</sup> and integrating out, we have

$$\frac{1}{\Gamma\_p(h)} \int\_{Y\_1 > O} |Y\_1|^{h - \frac{p+1}{2}} |I + \mathcal{B}^{-1} Y\_1|^{-(\frac{m}{2} + h)} \mathrm{d}Y\_1 = \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m}{2} + h)} |\mathcal{B}|^h, \ \Re(h) > \frac{p-1}{2},$$

and $|B|^{-(\frac{m}{2}+h)}|B|^h = |B|^{-\frac{m}{2}}$. Noting that

$$\begin{vmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} + Y\_2 \end{vmatrix} = \begin{cases} |\Sigma^{11}| \, |\Sigma^{22} + Y\_2 - \Sigma^{21} (\Sigma^{11})^{-1} \Sigma^{12} | \\ |\Sigma^{22} + Y\_2| \, |\Sigma^{11} - \Sigma^{12} (\Sigma^{22} + Y\_2)^{-1} \Sigma^{21} | = |\Sigma^{22} + Y\_2| \, |\mathcal{B}|, \end{cases}$$


$$|B| = \frac{|\Sigma^{11}| |Y\_2 + C|}{|\Sigma^{22} + Y\_2|}, \ C = \Sigma^{22} - \Sigma^{21} (\Sigma^{11})^{-1} \Sigma^{12} = \Sigma\_{22}^{-1}, \tag{i}$$

so that,

$$|B|^{-\frac{m}{2}} = |\Sigma^{11}|^{-\frac{m}{2}}|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}|Y_2+\Sigma^{22}|^{\frac{m}{2}} \Rightarrow$$

$$|B|^{-\frac{m}{2}}|Y_2+\Sigma^{22}|^{-(\frac{m}{2}+h)} = |\Sigma^{11}|^{-\frac{m}{2}}|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}|Y_2+\Sigma^{22}|^{-h}.\tag{ii}$$

Collecting all the factors containing *Y*<sup>2</sup> and integrating out, we have the following:

$$\begin{split} \frac{1}{\Gamma_q(h)}\int_{Y_2>O}|Y_2|^{h-\frac{q+1}{2}}|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}|Y_2+\Sigma^{22}|^{-h}\,\mathrm{d}Y_2\\ = \frac{|\Sigma_{22}|^{\frac{m}{2}+h}}{\Gamma_q(h)}\int_{Y_2>O}|Y_2|^{h-\frac{q+1}{2}}|I+\Sigma_{22}^{\frac{1}{2}}Y_2\Sigma_{22}^{\frac{1}{2}}|^{-\frac{m}{2}}\\ \times\ |\Sigma_{22}^{\frac{1}{2}}Y_2\Sigma_{22}^{\frac{1}{2}}+\Sigma_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma_{22}^{\frac{1}{2}}|^{-h}\,\mathrm{d}Y_2\\ = \frac{|\Sigma_{22}|^{\frac{m}{2}}}{\Gamma_q(h)}\int_{W>O}|W|^{h-\frac{q+1}{2}}|I+W|^{-\frac{m}{2}}|W+\Sigma_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma_{22}^{\frac{1}{2}}|^{-h}\,\mathrm{d}W,\ \ W=\Sigma_{22}^{\frac{1}{2}}Y_2\Sigma_{22}^{\frac{1}{2}},\\ \tag{iii}\end{split}$$

as $W=\Sigma_{22}^{\frac{1}{2}}Y_2\Sigma_{22}^{\frac{1}{2}}\Rightarrow \mathrm{d}Y_2 = |\Sigma_{22}|^{-\frac{q+1}{2}}\,\mathrm{d}W$. Now, letting $W=U^{-1}-I$, so that $\mathrm{d}W=|U|^{-(q+1)}\,\mathrm{d}U$ with $O<U<I$, the expression in *(iii)*, denoted by $\delta$, becomes

$$\delta = \frac{|\Sigma\_{22}|^{\frac{m}{2}}}{\Gamma\_q(h)} \int\_{O < U < I} |U|^{\frac{m}{2} - \frac{q+1}{2}} |I - U|^{h - \frac{q+1}{2}} |I - AU|^{-h} \mathrm{d}U$$

where $A=I-\Sigma_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma_{22}^{\frac{1}{2}}$. Note that since $|\Sigma|^{\frac{m}{2}} = |\Sigma_{22}|^{\frac{m}{2}}|\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{\frac{m}{2}} = |\Sigma_{22}|^{\frac{m}{2}}|\Sigma^{11}|^{-\frac{m}{2}}$, the factor $|\Sigma|^{\frac{m}{2}}$ in the denominator of the constant part gets canceled out, the remaining constant expression being

$$\frac{\Gamma\_{p+q}(\frac{m}{2}+h)}{\Gamma\_{p+q}(\frac{m}{2})} \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m}{2}+h)}.\tag{i\nu}$$
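The key matrix integral evaluated above (the one over $Y_1$) reduces, for $p=1$, to a scalar identity that can be checked by numerical quadrature. A sketch assuming SciPy, with $h$, $m$ and $B$ chosen arbitrarily:

```python
import math
from scipy import integrate

h, m, B = 2.5, 7.0, 1.8                  # arbitrary scalar (p = 1) values

# (1/Gamma(h)) * int_0^inf y^{h-1} (1 + y/B)^{-(m/2+h)} dy
lhs, _ = integrate.quad(
    lambda y: y ** (h - 1) * (1 + y / B) ** (-(m / 2 + h)), 0, math.inf)
lhs /= math.gamma(h)

# should equal Gamma(m/2)/Gamma(m/2 + h) * B^h
rhs = math.gamma(m / 2) / math.gamma(m / 2 + h) * B ** h
```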

The integral part of $\delta$ can be evaluated by making use of Euler's representation of Gauss' hypergeometric function of matrix argument, which, as given in formula (5.2.15) of Mathai (1997), is

$$\frac{\Gamma_q(a)\Gamma_q(c-a)}{\Gamma_q(c)}\,_2F_1(a,b;c;X) = \int_{O<Z<I}|Z|^{a-\frac{q+1}{2}}|I-Z|^{c-a-\frac{q+1}{2}}|I-XZ|^{-b}\,\mathrm{d}Z$$

where $O<Z<I$ and $O<X<I$ are $q\times q$ real matrices. Thus, $\delta$ can be expressed as follows:

$$\delta = \frac{\Gamma\_q(\frac{m}{2})\Gamma\_q(h)}{\Gamma\_q(\frac{m}{2} + h)\Gamma\_q(h)} \,\_2F\_1(\frac{m}{2}, h; \frac{m}{2} + h; \, I - \Sigma\_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma\_{22}^{\frac{1}{2}}),$$

so that

$$E[u^h] = \frac{\Gamma\_{p+q}(\frac{m}{2} + h)\Gamma\_p(\frac{m}{2})}{\Gamma\_{p+q}(\frac{m}{2})\Gamma\_p(\frac{m}{2} + h)} \frac{\Gamma\_q(\frac{m}{2})}{\Gamma\_q(\frac{m}{2} + h)} \,\_2F\_1(\frac{m}{2}, h; \frac{m}{2} + h; I - \Sigma\_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma\_{22}^{\frac{1}{2}}).\tag{10.5.3}$$
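For $q=1$, the Euler representation used above is the classical integral representation of the ordinary Gauss hypergeometric function, which can be verified numerically. A sketch assuming SciPy, with arbitrary parameter values:

```python
import math
from scipy import integrate, special

a, b, c, x = 1.5, 2.0, 4.0, 0.3          # arbitrary values with c > a > 0, 0 < x < 1

# Gamma(a) Gamma(c-a) / Gamma(c) * 2F1(a, b; c; x)
lhs = math.gamma(a) * math.gamma(c - a) / math.gamma(c) * special.hyp2f1(a, b, c, x)

# Euler integral: int_0^1 z^{a-1} (1-z)^{c-a-1} (1 - x z)^{-b} dz
rhs, _ = integrate.quad(
    lambda z: z ** (a - 1) * (1 - z) ** (c - a - 1) * (1 - x * z) ** (-b), 0, 1)
```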

For $p\le q$, it follows from the definition of the real matrix-variate gamma function that $\Gamma_{p+q}(\alpha) = \pi^{pq/2}\,\Gamma_q(\alpha)\,\Gamma_p(\alpha-q/2)$. Thus, the constant part in (10.5.3) simplifies to

$$\frac{\Gamma\_p(\frac{m}{2} - \frac{q}{2} + h)}{\Gamma\_p(\frac{m}{2} - \frac{q}{2})} \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m}{2} + h)}.$$

Then,

$$E[u^h] = \frac{\Gamma\_p(\frac{m-q}{2} + h)}{\Gamma\_p(\frac{m-q}{2})} \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m}{2} + h)} \,\_2F\_1(\frac{m}{2}, h; \frac{m}{2} + h; I - \Sigma\_{22}^{\frac{1}{2}}\Sigma^{22}\Sigma\_{22}^{\frac{1}{2}}) \tag{10.5.4}$$

for $m\ge p+q$, $\Re(h)>-\frac{m}{2}+\frac{p-1}{2}+\frac{q}{2}$, $p\le q$. Had $Y_2$ been integrated out first instead of $Y_1$, we would have ended up with a hypergeometric function having $I-\Sigma_{11}^{\frac{1}{2}}\Sigma^{11}\Sigma_{11}^{\frac{1}{2}}$ as its argument, that is,

$$E[\boldsymbol{u}^{h}] = \frac{\Gamma\_p(\frac{m-q}{2} + h)}{\Gamma\_p(\frac{m-q}{2})} \frac{\Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m}{2} + h)} \,\_2F\_1(\frac{m}{2}, h; \frac{m}{2} + h; I - \Sigma\_{11}^{\frac{1}{2}}\Sigma^{11}\Sigma\_{11}^{\frac{1}{2}}).\tag{10.5.5}$$
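The matrix-variate gamma factorization invoked above can be checked numerically through SciPy's `multigammaln`, which returns $\ln\Gamma_d(a)$; the parameter values below are arbitrary:

```python
import math
from scipy.special import multigammaln   # multigammaln(a, d) = ln Gamma_d(a)

p, q, alpha = 2, 3, 6.0                  # arbitrary, with alpha > (p + q - 1)/2

# Gamma_{p+q}(alpha) = pi^{pq/2} Gamma_q(alpha) Gamma_p(alpha - q/2), in logs
lhs = multigammaln(alpha, p + q)
rhs = (p * q / 2) * math.log(math.pi) \
    + multigammaln(alpha, q) + multigammaln(alpha - q / 2, p)
```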

#### **10.5.1. The sampling distribution of the multiple correlation coefficient**

When $p=1$ and $q>1$, $r_{1(1\ldots q)}^2$ is the square of the sample multiple correlation coefficient $r_{1(1\ldots q)}$. In this case, the argument in (10.5.5) is a real scalar quantity equal to $1-\sigma_{11}\sigma^{11}$, the real matrix-variate $\Gamma_p(\cdot)$ functions are simply $\Gamma(\cdot)$ functions and $u=1-r_{1(1\ldots q)}^2$. Letting $y=r_{1(1\ldots q)}^2$, $E[1-y]^h$ is available from (10.5.5) for $p=1$, and the argument of the $_2F_1$ hypergeometric function is then

$$1 - \sigma\_{11}\sigma^{11} = 1 - \sigma\_{11}(\sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21})^{-1} = -\frac{\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}{\sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}.\tag{10.5.6}$$

By taking the inverse Mellin transform of (10.5.5) for $h=s-1$ and $p=1$, we can express the density $f(y)$ of the square of the sample multiple correlation coefficient as follows:

$$f(\mathbf{y}) = \frac{(1 - \rho^2)^{\frac{m}{2}} \Gamma(\frac{m}{2})}{\Gamma(\frac{m - q}{2}) \Gamma(\frac{q}{2})} \mathbf{y}^{\frac{q}{2} - 1} (1 - \mathbf{y})^{\frac{m - q}{2} - 1} \,\_2F\_1(\frac{m}{2}, \frac{m}{2}; \frac{q}{2}; \rho^2 \mathbf{y}) \tag{10.5.7}$$

where $\rho^2$ is the population multiple correlation squared, that is, $\rho^2 = [\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}]/\sigma_{11}$. We can verify the result by computing the $h$-th moment of $1-y$ in (10.5.7). The $h$-th moment can be determined as follows by expanding the $_2F_1$ function and then integrating:

$$\begin{aligned} E[1-\mathbf{y}]^h &= \frac{(1-\rho^2)^{\frac{m}{2}}\Gamma(\frac{m}{2})}{\Gamma(\frac{m-q}{2})\Gamma(\frac{q}{2})} \sum\_{k=0}^{\infty} \frac{(\frac{m}{2})\_k(\frac{m}{2})\_k}{(\frac{q}{2})\_k} \frac{(\rho^2)^k}{k!} \\ &\times \int\_0^1 \mathbf{y}^{\frac{q}{2}+k-1} (1-\mathbf{y})^{\frac{m-q}{2}+h-1} \mathbf{d} \mathbf{y}, \end{aligned}$$

the integral part being

$$\frac{\Gamma(\frac{q}{2}+k)\Gamma(\frac{m-q}{2}+h)}{\Gamma(\frac{m}{2}+h+k)} = \frac{\Gamma(\frac{q}{2})\Gamma(\frac{m-q}{2}+h)}{\Gamma(\frac{m}{2}+h)} \frac{(\frac{q}{2})\_k}{(\frac{m}{2}+h)\_k},$$

so that

$$E[1-\mathbf{y}]^h = (1-\rho^2)^{\frac{m}{2}} \frac{\Gamma(\frac{m-q}{2}+h)}{\Gamma(\frac{m-q}{2})} \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)} \,\_2F\_1(\frac{m}{2}, \frac{m}{2}; \frac{m}{2}+h; \rho^2). \tag{10.5.8}$$

On applying the relationship,

$$\,\_2F\_1(a,b;c;z) = (1-z)^{-b} \,\_2F\_1\left(c-a,b;c; \frac{z}{z-1}\right),\tag{10.5.9}$$

we have

$${}\_{2}F\_{1}(\frac{m}{2},\frac{m}{2};\frac{m}{2}+h;\rho^{2})=(1-\rho^{2})^{-\frac{m}{2}}{}\_{2}F\_{1}(h,\frac{m}{2};\frac{m}{2}+h;\frac{\rho^{2}}{\rho^{2}-1}),$$

with

$$\frac{\rho^2}{\rho^2 - 1} = -\frac{\Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}}{\sigma\_{11} - \Sigma\_{12}\Sigma\_{22}^{-1}\Sigma\_{21}},$$

which agrees with (10.5.6). Observe that $(1-\rho^2)^{\frac{m}{2}}$ gets canceled out, so that (10.5.8) agrees with (10.5.5) for $p=1$.
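As a numerical sanity check of (10.5.7), the density can be evaluated with SciPy's `hyp2f1` and integrated over $(0,1)$; the parameter values below are arbitrary illustrative choices:

```python
import math
from scipy import integrate, special

m, q, rho2 = 12, 3, 0.4                  # arbitrary: m = n - 1, q components in Y

def f(y):
    """Density (10.5.7) of the squared sample multiple correlation."""
    const = (1 - rho2) ** (m / 2) * math.gamma(m / 2) / (
        math.gamma((m - q) / 2) * math.gamma(q / 2))
    return (const * y ** (q / 2 - 1) * (1 - y) ** ((m - q) / 2 - 1)
            * special.hyp2f1(m / 2, m / 2, q / 2, rho2 * y))

total, _ = integrate.quad(f, 0, 1)       # should equal 1
```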

We can also obtain a representation of the density of the sample canonical correlation matrix whose M-transform is as given in (10.5.5) for *p* ≤ *q*. This can be achieved by duplicating the steps utilized for the particular case considered in this section, which yields the following density:


$$f(R) = \frac{|I - P|^{\frac{m}{2}} \Gamma\_p(\frac{m}{2})}{\Gamma\_p(\frac{m - q}{2}) \Gamma\_p(\frac{q}{2})} |R|^{\frac{q}{2} - \frac{p + 1}{2}} |I - R|^{\frac{m - q}{2} - \frac{p + 1}{2}}$$

$$\times {}\_2F\_1(\frac{m}{2}, \frac{m}{2}; \frac{q}{2}; P^{\frac{1}{2}} R P^{\frac{1}{2}}) \tag{10.5.10}$$

where $P=\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$ is the population canonical correlation matrix. Note that a function giving rise to a certain M-transform need not be unique. However, by making use of the Laplace transform and its inverse in the real matrix-variate case, Mathai (1981) has shown that the function specified in (10.5.10) is actually the unique density of $R$.

#### **Exercises 10**

**10.1.** In Example 10.3.2, verify that

$$\frac{[\alpha_{(2)}'\,\Sigma_{12}\beta_{(2)}]^2}{\gamma_2\delta_2} = \lambda_2$$

where $\lambda_2$ is the second largest eigenvalue of the canonical correlation matrix $A$.

**10.2.** In Example 10.3.2, use equation (10.1.1) or equation *(ii)* preceding it with $\rho_1=\rho_2=\rho$ and evaluate $\beta_{(1)}$ and $\beta_{(2)}$ from $\alpha_{(1)}$ and $\alpha_{(2)}$. Obtain $\beta$ first, normalize it subject to the constraint $\beta'\Sigma_{22}\beta=1$ and then obtain $\beta_{(1)}$ and $\beta_{(2)}$. Then verify the results

$$\frac{[\alpha'\_{(1)}\Sigma\_{12}\beta\_{(1)}]^2}{\gamma\_1\delta\_1} = \lambda\_1 \text{ and } \frac{[\alpha'\_{(2)}\Sigma\_{12}\beta\_{(2)}]^2}{\gamma\_2\delta\_2} = \lambda\_2$$

where $\lambda_1$ and $\lambda_2$ are the largest and second largest eigenvalues of the canonical correlation matrix $A$.

**10.3.** Let

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \ Y = \begin{bmatrix} y\_1 \\ y\_2 \end{bmatrix}, \ \text{Cov}(X) = \Sigma\_{11} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix},$$

$$\text{Cov}(Y) = \Sigma\_{22} = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}, \ \text{Cov}(X, Y) = \begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 1 & 0 \end{bmatrix},$$

where $x_1,x_2,x_3,y_1,y_2$ are real scalar random variables. Evaluate the following, the notations of this chapter being utilized: (1): The canonical correlations $\rho_{(1)}$ and $\rho_{(2)}$; (2): The first pair of canonical variables $(u_1,v_1)$ by direct evaluation as done in Example 10.3.2; (3): Verify that

$$\frac{[\beta_{(1)}'\Sigma_{21}\alpha_{(1)}]^2}{\gamma_1\delta_1} = \lambda_1\ : \text{ the largest eigenvalue of } B,$$

where $B=\Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}}$; (4): Evaluate the second pair of canonical variables $(u_2,v_2)$ by using equation (10.1.1) for constructing $\alpha_{(1)}$ and $\alpha_{(2)}$ after obtaining $\beta_{(1)}$ and $\beta_{(2)}$; (5): Verify that

$$\frac{[\beta_{(2)}'\Sigma_{21}\alpha_{(2)}]^2}{\gamma_2\delta_2} = \lambda_2\ : \text{ the second largest eigenvalue of } B.$$

**10.4.** Repeat Problem 10.3 with *X, Y* and their associated covariance matrices defined as follows:

$$\begin{aligned} X &= \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \ Y = \begin{bmatrix} y\_1 \\ y\_2 \\ y\_3 \end{bmatrix}, \ \text{Cov}(X) = \Sigma\_{11} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & 3 \end{bmatrix}, \\ \text{Cov}(Y) &= \Sigma\_{22} = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \ \text{Cov}(X, Y) = \Sigma\_{12} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ -1 & 1 & -1 \end{bmatrix} \end{aligned}$$

where *x*1*, x*2*, x*3*, y*1*, y*2*, y*<sup>3</sup> are real scalar random variables. As well, compute the three pairs of canonical variables.

**10.5.** Show that the M-transform in (10.5.5) is available from the density specified in (10.5.10).

#### **References**

A.M. Mathai (1981): Distribution of the canonical correlation matrix, *Annals of the Institute of Statistical Mathematics*, **33**(A), 35–43.

A.M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

A.M. Mathai and H.J. Haubold (2017a): *Linear Algebra for Physicists and Engineers*, De Gruyter, Germany.

A.M. Mathai and H.J. Haubold (2017b): *Probability and Statistics for Physicists and Engineers*, De Gruyter, Germany.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Chapter 11 Factor Analysis**

### **11.1. Introduction**

We will utilize the same notations as in the previous chapters. Lower-case letters $x,y,\ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X,Y,\ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde x,\tilde y,\tilde X,\tilde Y$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A,B,C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. In the real and complex cases, the determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p\times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\mathrm{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p\times q$ matrix $X=(x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\mathrm{d}X=\wedge_{i=1}^p\wedge_{j=1}^q\,\mathrm{d}x_{ij}$. For the complex matrix $\tilde X=X_1+iX_2,\ i=\sqrt{-1}$, where $X_1$ and $X_2$ are real, $\mathrm{d}\tilde X=\mathrm{d}X_1\wedge\mathrm{d}X_2$.

Factor analysis is a statistical method aiming to identify a relatively small number of underlying (unobserved) factors that could explain certain interdependencies among a larger set of observed variables. Factor analysis also proves useful for analyzing causal mechanisms. As a statistical technique, Factor Analysis was originally developed in connection with psychometrics. It has since been utilized in operations research, finance and biology, among other disciplines. For instance, a score available on an intelligence test will often assess several intellectual faculties and cognitive abilities. It is assumed that a certain linear function of the contributions from these various mental factors is producing the final score. Hence, there is a parallel to be made with analysis of variance as well as design of experiments and linear regression models.

#### **11.2. Linear Models from Different Disciplines**

In order to introduce the current topic, we will first examine a linear regression model and an experimental design model.

#### **11.2.1. A linear regression model**

Let $x$ be a real scalar random variable and let $t_1,\ldots,t_r$ be either $r$ real fixed numbers or given values of $r$ real random variables. Let the conditional expectation of $x$, given $t_1,\ldots,t_r$, be of the form

$$E[x\,|\,t\_1, \ldots, t\_r] = a\_0 + a\_1t\_1 + \cdots + a\_rt\_r$$

or the corresponding model be

$$x = a\_0 + a\_1t\_1 + \cdots + a\_rt\_r + e$$

where $a_0, a_1,\ldots,a_r$ are unknown constants, $t_1,\ldots,t_r$ are given values and $e$ is the error component, that is, the sum total of contributions coming from unknown or uncontrolled factors plus the experimental error. For example, $x$ might be an inflation index with respect to a particular base year, say 2010. In this instance, $t_1$ may be the change or deviation in the average price per kilogram of certain staple vegetables from the base year 2010, $t_2$ may be the change or deviation in the average price of a kilogram of rice compared to the base year, $t_3$ may be the change or deviation in the average price of flour per kilogram with respect to the base year, and so on, and $t_r$ may be the change or deviation in the average price of milk per liter compared to the base year 2010. The notation $t_j,\ j = 1,\ldots,r$, is utilized to designate the given values as well as the corresponding random variables. Since we are taking deviations from the base values, we may assume without any loss of generality that the expected value of $t_j$ is zero, that is, $E[t_j] = 0,\ j = 1,\ldots,r$. We may also take the expected value of the error term $e$ to be zero, that is, $E[e] = 0$. Now, let $x_1$ be the inflation index, $x_2$ be the caloric intake index per person, $x_3$ be the general health index, and so on. In all these cases, the same $t_1,\ldots,t_r$ can act as the independent variables in a regression setup. Thus, in such a situation, a multivariate linear regression model will have the following format:

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ \vdots \\ x\_p \end{bmatrix} = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \vdots \\ \mu\_p \end{bmatrix} + \begin{bmatrix} a\_{11} & a\_{12} & \dots & a\_{1r} \\ a\_{21} & a\_{22} & \dots & a\_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ a\_{p1} & a\_{p2} & \dots & a\_{pr} \end{bmatrix} \begin{bmatrix} f\_1 \\ f\_2 \\ \vdots \\ f\_r \end{bmatrix} + \begin{bmatrix} e\_1 \\ e\_2 \\ \vdots \\ e\_p \end{bmatrix},\tag{11.2.1}$$


and we may write this model as

$$X = M + \Lambda F + \epsilon$$

where $\Lambda = (\lambda_{ij})$ is a $p \times r$, $r \le p$, matrix of full rank $r$, $\epsilon$ is $p \times 1$ and $F$ is $r \times 1$. In (11.2.1), $\lambda_{ij} = a_{ij}$ and $f_j = t_j$. Then, $E[X] = M + \Lambda\,E[F] + E[\epsilon] = M$ since we have assumed that $E[F] = O$ (a null matrix) and $E[\epsilon] = O$. When $F$ and $\epsilon$ are uncorrelated, the covariance matrix associated with $X$, denoted by $\text{Cov}(X) = \Sigma$, is the following:

$$\begin{split} \Sigma &= \text{Cov}(X) = E\{(X - M)(X - M)'\} = E\{(\Lambda F + \epsilon)(\Lambda F + \epsilon)'\} \\ &= \Lambda\, \text{Cov}(F)\, \Lambda' + \text{Cov}(\epsilon) + O \\ &= \Lambda\, \Phi\, \Lambda' + \Psi \end{split} \tag{11.2.2}$$

where the covariance matrices of $F$ and $\epsilon$ are respectively denoted by $\Phi > O$ (positive definite) and $\Psi > O$. In the above formulation, $F$ is taken to be a real vector random variable. In a simple linear model, the covariance matrix of $\epsilon$, namely $\Psi$, is usually taken as $\sigma^2 I$ where $\sigma^2 > 0$ is a real scalar quantity and $I$ is the identity matrix. In a more general setting, $\Psi$ can be taken to be a diagonal matrix whose diagonal elements are positive; in such a model, the $e_j$'s are uncorrelated and their variances need not be equal. It will be assumed that the covariance matrix $\Psi$ in (11.2.2) is a diagonal matrix having positive diagonal elements.
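The covariance structure in (11.2.2) can be checked by simulation. The following sketch draws a large sample from the model $X = M + \Lambda F + \epsilon$ with numpy and compares the sample covariance matrix with $\Lambda\Phi\Lambda' + \Psi$; all dimensions and parameter values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Simulate the factor model X = M + Lambda F + eps and verify (11.2.2) empirically.
rng = np.random.default_rng(0)
p, r, n = 4, 2, 200_000

M = np.array([1.0, -0.5, 2.0, 0.0])
Lam = rng.standard_normal((p, r))            # factor loadings, full rank r (illustrative)
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])     # Cov(F), positive definite
Psi = np.diag([0.5, 0.8, 0.3, 1.1])          # Cov(eps), diagonal with positive entries

# draw F ~ N(O, Phi) and eps ~ N(O, Psi), uncorrelated
F = rng.multivariate_normal(np.zeros(r), Phi, size=n)
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
X = M + F @ Lam.T + eps                      # n x p sample from the model

Sigma_model = Lam @ Phi @ Lam.T + Psi        # theoretical Cov(X) from (11.2.2)
Sigma_hat = np.cov(X, rowvar=False)          # sample covariance
print(np.max(np.abs(Sigma_hat - Sigma_model)))   # small for large n
```

The agreement improves at the usual $O(n^{-1/2})$ Monte Carlo rate as the sample size grows.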

#### **11.2.2. A design of experiment model**

Consider a completely randomized experiment in which one set of treatments is applied. In this instance, the experimental plots are assumed to be fully homogeneous with respect to all the known factors of variation that may affect the response. For example, the observed value may be the yield of a particular variety of corn grown in an experimental plot. Let the set of treatments be $r$ different fertilizers $F_1,\ldots,F_r$, the effects of these fertilizers being denoted by $\alpha_1,\ldots,\alpha_r$. If no fertilizer is applied, the yield from a test plot need not be zero. Let $\mu_1$ be a general effect when $F_1$ is applied so that we may regard $\alpha_1$ as a deviation from this effect $\mu_1$ due to $F_1$. Let $e_1$ be the sum total of the contributions coming from all unknown or uncontrolled factors plus the experimental error, if any, when $F_1$ is applied. Then a simple linear one-way classification model for $F_1$ is

$$
x\_1 = \mu\_1 + \alpha\_1 + e\_1,
$$

with $x_1$ representing the yield from the test plot where $F_1$ was applied. Then, corresponding to $F_1,\ldots,F_r,\ r = p$, we have the following system:

$$\begin{aligned} x\_1 &= \mu\_1 + \alpha\_1 + e\_1 \\ &\;\;\vdots \\ x\_p &= \mu\_p + \alpha\_p + e\_p \end{aligned}$$

or, in matrix notation,

$$X = M + \Lambda F + \epsilon \tag{11.2.3}$$

where

$$X = \begin{bmatrix} x\_1 \\ \vdots \\ x\_p \end{bmatrix}, \; \epsilon = \begin{bmatrix} e\_1 \\ \vdots \\ e\_p \end{bmatrix}, \; F = \begin{bmatrix} \alpha\_1 \\ \vdots \\ \alpha\_p \end{bmatrix}, \text{ and } \Lambda = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}.$$

In this case, the elements of $\Lambda$ are dictated by the design itself. If the vector $F$ is fixed, the model specified by (11.2.3) is called a fixed effect model, whereas if $F$ is assumed to be random, it is referred to as a random effect model. With a single observation per cell, as in (11.2.3), we will not be able to estimate the parameters or test hypotheses; thus, the experiment will have to be replicated. So, let the $j$-th replicated observation vector be

$$X\_j = \begin{bmatrix} x\_{1j} \\ \vdots \\ x\_{pj} \end{bmatrix}, \ j = 1, \ldots, n,$$

$\Sigma, \Phi$, and $\Psi$ remaining the same for each replicate within the random effect model. Similarly, for the regression model given in (11.2.1), the $j$-th replication or repetition vector will be $X_j' = (x_{1j},\ldots,x_{pj})$ with $\Sigma, \Phi$ and $\Psi$ therein remaining the same for each sample.

We will consider a general linear model encompassing those specified in (11.2.1) and (11.2.3) and carry out a complete analysis that will involve verifying the existence and uniqueness of such a model, estimating its parameters and testing various types of hypotheses. The resulting technique is referred to as Factor Analysis.

#### **11.3. A General Linear Model for Factor Analysis**

Consider the following general linear model:

$$X = M + \Lambda F + \epsilon \tag{11.3.1}$$


where

$$\begin{aligned} X &= \begin{bmatrix} x\_1 \\ \vdots \\ x\_p \end{bmatrix}, \ M = \begin{bmatrix} \mu\_1 \\ \vdots \\ \mu\_p \end{bmatrix}, \ \epsilon = \begin{bmatrix} e\_1 \\ \vdots \\ e\_p \end{bmatrix}, \\\\ \Lambda &= \begin{bmatrix} \lambda\_{11} & \lambda\_{12} & \dots & \lambda\_{1r} \\ \lambda\_{21} & \lambda\_{22} & \dots & \lambda\_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda\_{p1} & \lambda\_{p2} & \dots & \lambda\_{pr} \end{bmatrix} \text{ and } F = \begin{bmatrix} f\_1 \\ \vdots \\ f\_r \end{bmatrix}, \ r \le p, \end{aligned}$$

with the $\mu_j$'s, $\lambda_{ij}$'s, $f_j$'s being real scalar parameters, the $x_j$'s, $j = 1,\ldots,p$, being real scalar quantities, and $\Lambda$ being of dimension $p \times r$, $r \le p$, and of full rank $r$. When considering expected values, variances, covariances, etc., $X, F$, and $\epsilon$ will be assumed to be random quantities; however, when dealing with estimates, $X$ will represent a vector of observations. This convention will be employed throughout this chapter so as to avoid a multiplicity of symbols for the variables and the corresponding observations.

From a geometrical perspective, the $r$ columns of $\Lambda$, which are linearly independent, span an $r$-dimensional subspace of the $p$-dimensional Euclidean space. In this case, the $r \times 1$ vector $F$ is a point in this $r$-dimensional subspace, which is usually called the *factor space*. Then, right-multiplying the $p \times r$ matrix $\Lambda$ by an $r \times r$ nonsingular matrix corresponds to employing a new set of coordinate axes for the factor space.

Factor Analysis is a subject dealing with the identification or unique determination of a model of the type specified in (11.3.1), as well as the estimation of its parameters and the testing of various related hypotheses. The subject matter was originally developed in connection with intelligence testing. Suppose that a test is administered to an individual to evaluate his/her mathematical skills, spatial perception, language abilities, etc., and that the score obtained is recorded. There will be a component in the model representing the expected score. If the test is administered to 10th graders belonging to a particular school, the grand average of such test scores among all 10th graders across the nation could be taken as the expected score. Then, inputs associated with various intellectual faculties or combinations thereof will come about, and all such factors may contribute to the observed test score. If $f_1,\ldots,f_r$ are the contributions coming from $r$ factors corresponding to specific intellectual abilities, then, when a linear model is assumed, a certain linear combination of these inputs will constitute the final quantity accounting for the observed test score. A test score, $x_1$, may then result from a linear model of the following form:

$$x\_1 = \mu\_1 + \lambda\_{11}f\_1 + \lambda\_{12}f\_2 + \cdots + \lambda\_{1r}f\_r + e\_1$$

with $\lambda_{11},\ldots,\lambda_{1r}$ being the coefficients of $f_1,\ldots,f_r$, where $f_1,\ldots,f_r$ are the contributions from $r$ factors toward $x_1$; these factors may be called the *main intellectual factors* in this case, and the coefficients $\lambda_{11},\ldots,\lambda_{1r}$ may be referred to as the *factor loadings* for these main factors. In this context, $\mu_1$ is the general expected value and $e_1$ is the error component or the sum total of contributions originating from all unknown factors plus the experimental error, if any. Note that the contributions $f_1,\ldots,f_r$ due to the main intellectual factors can vary from individual to individual, and hence it is appropriate to treat $f_1,\ldots,f_r$ as random variables rather than fixed unknown quantities. These $f_1,\ldots,f_r$ are not observable, as in the case of the design model in (11.2.3), whereas in the regression type model specified by (11.2.1), they may take on the recorded values of the observable variables which are called the independent variables. Thus, the model displayed in (11.3.1) may be analyzed either by treating $f_1,\ldots,f_r$ as fixed quantities or as random variables. If they are treated as random variables, we can assume that $f_1,\ldots,f_r$ follow some joint distribution; usually, joint normality is presupposed. Since $f_1,\ldots,f_r$ are deviations from the general effect $\mu_1$ due to the main intellectual faculties under consideration, it may be assumed that their expected value is a null vector, that is, $E[F] = O$. We will denote the covariance matrix associated with $F$ as $\Phi$: $\text{Cov}(F) = \Phi > O$ (real positive definite). Note that the model's error term $e_j$ is always a random variable. Letting $x_1,\ldots,x_p$ be the test scores on $p$ individuals, we have the error vector $\epsilon' = (e_1,\ldots,e_p)$. Without any loss of generality, we may take the expected value of $\epsilon$ to be a null vector, that is, $E[\epsilon] = O$.
In a very simple situation, we may assume the covariance matrix associated with $\epsilon$ to be $\text{Cov}(\epsilon) = \sigma^2 I$ where $\sigma^2 > 0$ is a real positive scalar quantity and $I$ is the identity matrix. In a somewhat more general situation, we may take $\text{Cov}(\epsilon) = \Psi$ where $\Psi$ is a diagonal matrix with positive diagonal elements; in the most general case, we may take $\Psi$ to be a real positive definite matrix. It will be assumed that $\Psi$ is diagonal with positive diagonal elements in the model (11.3.1), and that $F$ and $\epsilon$ are uncorrelated. Thus, letting $\Sigma$ be the covariance matrix of $X$, we have

$$\begin{split} \Sigma &= E[(X - M)(X - M)'] = E[(\Lambda F + \epsilon)(\Lambda F + \epsilon)'] \\ &= \Lambda E(FF')\Lambda' + E(\epsilon \epsilon') + O \\ &= \Lambda \Phi \Lambda' + \Psi \end{split} \tag{11.3.2}$$

where $\Sigma$, $\Phi$ and $\Psi$, with $\Sigma = \Lambda\Phi\Lambda' + \Psi$, are all assumed to be real positive definite matrices.


#### **11.3.1. The identification problem**

Is the model specified by (11.3.1) unique or could it represent different situations? In other words, does it make sense as a model as stated? Given any $r \times r$ nonsingular matrix $A$, let $AF = F^{*}$ and $\Lambda A^{-1} = \Lambda^{*}$. Then, $\Lambda^{*}F^{*} = \Lambda A^{-1}AF = \Lambda F$. In other words,

$$X = M + \Lambda F + \epsilon = M + \Lambda^\* F^\* + \epsilon. \tag{11.3.3}$$

Consequently, the model in (11.3.1) is not identified, that is, it is not uniquely determined.

The identification problem can also be stated as follows: Does there exist a real positive definite $p \times p$ matrix $\Sigma > O$ containing $p(p+1)/2$ distinct elements, which can be uniquely represented as $\Lambda\Phi\Lambda' + \Psi$ where $\Lambda$ has $pr$ distinct elements, $\Phi > O$ has $r(r+1)/2$ distinct elements and $\Psi$ is a diagonal matrix having $p$ distinct elements? Clearly, no such unique representation exists, as can be inferred from (11.3.3). Note that an arbitrary $r \times r$ matrix $A$ comprises $r^2$ distinct elements. It can be observed from (11.3.3) that we can impose $r^2$ conditions on the parameters in $\Lambda, \Phi$ and $\Psi$. The question could also be posed as follows: Can the $p(p+1)/2$ distinct elements in $\Sigma$ plus the $r^2$ elements in $A$ ($r^2$ conditions) uniquely determine all the elements of $\Lambda, \Psi$ and $\Phi$? Let us determine how many elements there are in total. $\Lambda, \Psi$ and $\Phi$ have a total of $pr + p + r(r+1)/2$ elements while $A$ and $\Sigma$ have a total of $r^2 + p(p+1)/2$ elements. Hence, the difference, denoted by $\delta$, is

$$\delta = \frac{p(p+1)}{2} + r^2 - \left[pr + \frac{r(r+1)}{2} + p\right] = \frac{1}{2}[(p-r)^2 - (p+r)].\tag{11.3.4}$$
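The parameter count $\delta$ of (11.3.4) is easy to tabulate. The short sketch below (the helper name `delta` is our own) verifies the algebraic simplification in (11.3.4) and evaluates one case with $\delta > 0$ and one with $\delta < 0$.

```python
# Sketch of the parameter count in (11.3.4):
# delta = p(p+1)/2 + r^2 - [pr + r(r+1)/2 + p] = ((p-r)^2 - (p+r))/2.
def delta(p: int, r: int) -> float:
    lhs = p * (p + 1) / 2 + r ** 2          # elements in Sigma plus conditions from A
    rhs = p * r + r * (r + 1) / 2 + p       # elements in Lambda, Phi and Psi
    return lhs - rhs

# the two algebraic forms in (11.3.4) agree for all 1 <= r <= p
for p in range(1, 10):
    for r in range(1, p + 1):
        assert delta(p, r) == ((p - r) ** 2 - (p + r)) / 2

print(delta(5, 2), delta(5, 3))   # a delta > 0 case and a delta < 0 case
```

For instance, with $p = 5$ factors are counted out as $\delta = 1$ for $r = 2$ but $\delta = -2$ for $r = 3$, where uniqueness may be in question.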

Observe that the right-hand side of (11.3.2) is not a linear function of $\Lambda, \Phi$ and $\Psi$. Thus, if $\delta > 0$, we can anticipate that existence and uniqueness will hold, although these properties cannot be guaranteed, whereas if $\delta < 0$, then existence can be expected but uniqueness may be in question. Given (11.3.2), note that

$$
\Sigma = \Psi + \Lambda \Phi \Lambda' \Rightarrow \Sigma - \Psi = \Lambda \Phi \Lambda'
$$

where $\Lambda\Phi\Lambda'$ is positive semi-definite of rank $r$, since the $p \times r$, $r \le p$, matrix $\Lambda$ has full rank $r$ and $\Phi > O$ (positive definite). Then, the existence question can also be stated as follows: Given a $p \times p$ real positive definite matrix $\Sigma > O$, can we find a diagonal matrix $\Psi$ with positive diagonal elements such that $\Sigma - \Psi$ is a real positive semi-definite matrix of a specified rank $r$, which is expressible in the form $BB'$ for some $p \times r$ matrix $B$ of rank $r$ where $r \le p$? For the most part, the available results on this question of existence can be found in Anderson (2003) and Anderson and Rubin (1956). *If a set of parameters exists and if the model is uniquely determined, then we say that the model is identified or, alternatively, identifiable*. The concept of identification or identifiability within the context of Factor Analysis has been studied by Ihara and Kano (1995), Wegge (1996), Allman et al. (2009) and Chen et al. (2020), among others.

Assuming that $\Phi = I$ will impose $r(r+1)/2$ conditions. However, $r^2 = \frac{r(r+1)}{2} + \frac{r(r-1)}{2}$. Thus, we may impose $r(r-1)/2$ additional conditions after requiring that $\Phi = I$. Observe that when $\Phi = I$, $\Lambda^{*}\Phi\Lambda^{*\prime} = \Lambda^{*}\Lambda^{*\prime} = \Lambda A^{-1}A'^{-1}\Lambda'$, and if this is equal to $\Lambda\Lambda'$, this means that $(A'A)^{-1} = I$ or $A'A = I$, that is, $A$ is an orthonormal matrix. Thus, under the condition $\Phi = I$, the arbitrary $r \times r$ matrix $A$ becomes an orthonormal matrix. In this case, the transformation $Y = \Lambda A$ is an orthonormal transformation or a rotation of the coordinate axes. The following $r \times r$ symmetric matrix of $r(r+1)/2$ distinct elements

$$
\Delta = \Lambda^{\prime} \Psi^{-1} \Lambda \tag{11.3.5}
$$

is needed for solving estimation and hypothesis testing problems; accordingly, we can impose $r(r-1)/2$ conditions by requiring $\Delta$ to be diagonal with distinct diagonal elements, that is, $\Delta = \text{diag}(\eta_1,\ldots,\eta_r),\ \eta_j > 0,\ j = 1,\ldots,r$. This imposes $\frac{r(r+1)}{2} - r = \frac{r(r-1)}{2}$ conditions. Thus, for the model to be identifiable, or for all the parameters in $\Lambda, \Phi, \Psi$ to be uniquely determined, we can impose the condition $\Phi = I$ and require that $\Delta = \Lambda'\Psi^{-1}\Lambda$ be diagonal with positive diagonal elements. These two conditions will provide $\frac{r(r+1)}{2} + \frac{r(r-1)}{2} = r^2$ restrictions on the model, which will then be identified.

When $\Phi = I$, the main factors are orthogonal. If $\Phi$ is a diagonal matrix (including the identity matrix), the covariances are zero, which is an orthogonal situation; in this case, we say that the *main factors are orthogonal*. If $\Phi$ is not diagonal, then we say that the *main factors are oblique*.

One can also impose $r(r-1)/2$ conditions on the $p \times r$ matrix $\Lambda$. Consider the first $r \times r$ block, that is, the leading $r \times r$ submatrix or the upper $r \times r$ block of the $p \times r$ matrix, which we will denote by $B$. Imposing the condition that this $r \times r$ block $B$ be lower triangular will result in $r^2 - \frac{r(r+1)}{2} = \frac{r(r-1)}{2}$ conditions. Hence, $\Phi = I$ and the condition that this leading $r \times r$ block $B$ be lower triangular will guarantee $r^2$ restrictions, and the model will then be identified. One can also take a preselected $r \times r$ matrix $B_1$ and then impose the condition that $B_1B$ be lower triangular. This will, as well, produce $\frac{r(r-1)}{2}$ conditions. Thus, $\Phi = I$ and $B_1B$ being lower triangular will ensure the identification of the model.
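A rotation producing a lower triangular leading block can be constructed explicitly. In the sketch below, with illustrative dimensions, an orthonormal matrix is obtained from a QR factorization of $B'$ so that the leading block of the rotated loadings is lower triangular while $\Lambda\Lambda'$ is left unchanged; this is one standard construction, not the only possible one.

```python
import numpy as np

# With Phi = I, Lambda is determined only up to an orthonormal matrix; choose the
# rotation so that the leading r x r block B of the rotated loadings is lower triangular.
rng = np.random.default_rng(1)
p, r = 6, 3
Lam = rng.standard_normal((p, r))   # illustrative loading matrix of full rank r

B = Lam[:r, :]                      # leading r x r block
Q, _ = np.linalg.qr(B.T)            # B' = Q R  =>  B Q = R' is lower triangular
Lam_star = Lam @ Q                  # rotated loadings

B_star = Lam_star[:r, :]
print(np.allclose(np.triu(B_star, 1), 0.0))             # strictly upper part vanishes
print(np.allclose(Lam_star @ Lam_star.T, Lam @ Lam.T))  # Lambda Lambda' is unchanged
```

Since $Q$ is orthonormal, the rotation changes the coordinate axes of the factor space without altering $\Sigma = \Lambda\Lambda' + \Psi$.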

When we impose conditions on $\Phi$ and $\Psi$, the unknown covariance matrices must assume certain formats. Such conditions can be justified. However, could conditions be put on $\Lambda$, the factor loadings? Letting the first $r \times r$ block $B$ in the $p \times r$ matrix $\Lambda$ be lower triangular is tantamount to assuming that $\lambda_{12} = 0 = \lambda_{13} = \cdots = \lambda_{1r}$ or, equivalently, that


$f_2,\ldots,f_r$ do not contribute to the model for determining $x_1$ in $X' = (x_1, x_2,\ldots,x_p)$. Such restrictions are justified if we can design the experiment in such a way that $x_1$ depends on $f_1$ alone and not on $f_2,\ldots,f_r$. In psychological tests, it is possible to design tests in such a way that only certain main factors affect the scores. Thus, in such instances, we are justified in utilizing a triangular format such that, in general, there are no contributions from $f_{i+1},\ldots,f_r$ toward $x_i$ or, equivalently, the factor loadings $\lambda_{i,i+1},\ldots,\lambda_{ir}$ equal zero for $i = 1,\ldots,r-1$. For example, suppose that the first $r$ tests are designed in such a way that only $f_1,\ldots,f_i$ and no other factors contribute to $x_i$ or, equivalently, $x_i = \mu_i + \lambda_{i1}f_1 + \cdots + \lambda_{ii}f_i + e_i,\ i = 1,\ldots,r$. We can also measure the contribution from $f_i$ in $\lambda_{ii}$ units, or we can take $\lambda_{ii} = 1$. By taking $B = I_r$, we can impose $r^2$ conditions without requiring that $\Phi = I$. This means that the first $r$ tests are specifically designed so that $x_1$ only has a one unit contribution from $f_1$, $x_2$ only has a one unit contribution from $f_2$, and so on, $x_r$ receiving a one unit contribution from $f_r$. When $B$ is taken to be diagonal, the factor loadings are $\lambda_{11}, \lambda_{22},\ldots,\lambda_{rr}$, respectively, so that only $f_i$ contributes to $x_i$ for $i = 1,\ldots,r$. Accordingly, the following are certain model identification conditions:

(1): $\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda$ is diagonal with distinct diagonal elements;

(2): $\Phi = I$ and the leading $r \times r$ submatrix $B$ of the $p \times r$ matrix $\Lambda$ is lower triangular;

(3): $\Phi = I$ and $B_1B$ is lower triangular where $B_1$ is a preselected matrix;

(4): The leading $r \times r$ submatrix $B$ of the $p \times r$ matrix $\Lambda$ is an identity matrix.

Observe that when *r* = *p*, condition (4) corresponds to the design model considered in (11.2.3).

#### **11.3.2. Scaling or units of measurement**

A shortcoming of any analysis based on a covariance matrix $\Sigma$ is that the covariances depend on the units of measurement of the individual variables; thus, modifying the units will affect the covariances. If we let $y_i$ and $y_j$ be two real scalar random variables with variances $\sigma_{ii}$ and $\sigma_{jj}$ and associated covariance $\sigma_{ij}$, the effect of scaling or changes in the measurement units may be eliminated by considering the variables $z_i = y_i/\sqrt{\sigma_{ii}}$ and $z_j = y_j/\sqrt{\sigma_{jj}}$ whose covariance $\text{Cov}(z_i, z_j) \equiv r_{ij}$ is actually the correlation between $y_i$ and $y_j$, which is free of the units of measurement. Letting $Y' = (y_1,\ldots,y_p)$ and $D = \text{diag}(\frac{1}{\sqrt{\sigma_{11}}},\ldots,\frac{1}{\sqrt{\sigma_{pp}}})$, consider $Z = DY$. We note that $\text{Cov}(Y) = \Sigma \Rightarrow \text{Cov}(Z) = D\Sigma D = R$, which is the correlation matrix associated with $Y$.
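The passage from $\Sigma$ to the correlation matrix $R = D\Sigma D$ can be illustrated numerically; the covariance matrix used below is an illustrative assumption.

```python
import numpy as np

# Scaling a covariance matrix to a correlation matrix:
# D = diag(1/sqrt(sigma_11), ..., 1/sqrt(sigma_pp)) and R = D Sigma D.
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 9.0, 3.0],
                  [1.0, 3.0, 2.0]])   # illustrative positive definite covariance matrix
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
R = D @ Sigma @ D

print(np.allclose(np.diag(R), 1.0))   # unit diagonal, as a correlation matrix must have
# r_12 = sigma_12 / sqrt(sigma_11 * sigma_22) = 2 / (2 * 3)
print(np.isclose(R[0, 1], 2.0 / 6.0))
```

Each entry of $R$ is free of the measurement units, since both $y_i$ and $y_j$ have been standardized to unit variance.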

In psychological testing situations, referring to the model (11.3.1), when a test score $x_j$ is multiplied by a scalar quantity $c_j$, the factor loadings $\lambda_{j1},\ldots,\lambda_{jr}$, the error term $e_j$ and the general effect $\mu_j$ are all multiplied by $c_j$, that is, $c_jx_j = c_j\mu_j + c_j(\lambda_{j1}f_1 + \cdots + \lambda_{jr}f_r) + c_je_j$. Let $\text{Cov}(X) = \Sigma = (\sigma_{ij})$, that is, $\text{Cov}(x_i, x_j) = \sigma_{ij}$, $X' = (x_1,\ldots,x_p)$ and $D = \text{diag}(\frac{1}{\sqrt{\sigma_{11}}},\ldots,\frac{1}{\sqrt{\sigma_{pp}}})$. Consider the model

$$DX = DM + D\Lambda F + D\epsilon \Rightarrow \text{Cov}(DX) = D\Sigma D = D\Lambda \Phi \Lambda' D + D\Psi D. \quad (11.3.6)$$

If $X^{*} = DX$, $M^{*} = DM$, $\Lambda^{*} = D\Lambda$, and $\epsilon^{*} = D\epsilon$, then we obtain the following model and the resulting covariance matrix:

$$X^\* = M^\* + \Lambda^\* F + \epsilon^\* \Rightarrow \Sigma^\* = \text{Cov}(X^\*) = \Lambda^\* \text{Cov}(F) \Lambda^{\*\prime} + \Psi^\*$$

$$\Rightarrow D\Sigma D = D\Lambda \Phi \Lambda^{\prime} D + D\Psi D$$

$$\Rightarrow R = \Lambda^\* \Phi \Lambda^{\*\prime} + \Psi^\* \tag{11.3.7}$$

where $R = (r_{ij})$ is the correlation matrix associated with $X$. An interesting point to be noted is that the identification conditions $\Phi = I$ and $\Lambda^{*\prime}\Psi^{*-1}\Lambda^{*}$ being diagonal become the following: $\Phi = I$ and $\Lambda^{*\prime}\Psi^{*-1}\Lambda^{*} = \Lambda'DD^{-1}\Psi^{-1}D^{-1}D\Lambda = \Lambda'\Psi^{-1}\Lambda$, which is diagonal. That is, $\Lambda'\Psi^{-1}\Lambda$ is invariant under scaling transformations on the model, that is, under $X^{*} = DX$ and $\Psi^{*} = D\Psi D$.
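The invariance of $\Lambda'\Psi^{-1}\Lambda$ under scaling can be verified numerically. In the sketch below, with illustrative values, $\Lambda^{*} = D\Lambda$ and $\Psi^{*} = D\Psi D$ for an arbitrary positive diagonal scaling matrix $D$:

```python
import numpy as np

# Check that Lambda*' Psi*^{-1} Lambda* = Lambda' Psi^{-1} Lambda under scaling.
rng = np.random.default_rng(2)
p, r = 5, 2
Lam = rng.standard_normal((p, r))               # illustrative loadings
Psi = np.diag(rng.uniform(0.5, 2.0, size=p))    # diagonal, positive entries
D = np.diag(rng.uniform(0.2, 3.0, size=p))      # any positive diagonal scaling

Lam_s, Psi_s = D @ Lam, D @ Psi @ D             # scaled loadings and error covariance
lhs = Lam_s.T @ np.linalg.inv(Psi_s) @ Lam_s
rhs = Lam.T @ np.linalg.inv(Psi) @ Lam
print(np.allclose(lhs, rhs))
```

The diagonal factors cancel exactly, which is why the identification condition survives the passage from $\Sigma$ to the correlation matrix $R$.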

#### **11.4. Maximum Likelihood Estimators of the Parameters**

A simple random sample of size $n$ from the model $X = M + \Lambda F + \epsilon$ specified in (11.3.1) consists of independently and identically distributed (iid) $X_j$'s, $j = 1,\ldots,n$, where

$$X\_j = M + \Lambda F\_j + E\_j, \; j = 1, \dots, n, \; X\_j = \begin{bmatrix} x\_{1j} \\ x\_{2j} \\ \vdots \\ x\_{pj} \end{bmatrix}, \; E\_j = \begin{bmatrix} e\_{1j} \\ e\_{2j} \\ \vdots \\ e\_{pj} \end{bmatrix}, \tag{11.4.1}$$

and the $X_j$'s are iid. Let $F_j$ and $E_j$ be independently normally distributed, with $E_j \sim N_p(O, \Psi)$ and $X_j \sim N_p(M, \Sigma)$, $\Sigma = \Lambda\Phi\Lambda' + \Psi$, where $\Phi = \text{Cov}(F) > O$, $\Psi = \text{Cov}(\epsilon) > O$ and $\Sigma > O$, $\Psi$ being a diagonal matrix with positive diagonal elements. Then, the likelihood function is the following:

$$L = \prod\_{j=1}^{n} \frac{1}{(2\pi)^{\frac{p}{2}} |\boldsymbol{\Sigma}|^{\frac{1}{2}}} \mathbf{e}^{-\frac{1}{2}(\boldsymbol{X}\_{j} - \boldsymbol{M})'\boldsymbol{\Sigma}^{-1}(\boldsymbol{X}\_{j} - \boldsymbol{M})}$$

$$= \frac{1}{(2\pi)^{\frac{np}{2}} |\boldsymbol{\Sigma}|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2}\sum\_{j=1}^{n}(\boldsymbol{X}\_{j} - \boldsymbol{M})'\boldsymbol{\Sigma}^{-1}(\boldsymbol{X}\_{j} - \boldsymbol{M})}.\tag{11.4.2}$$


The sample matrix will be denoted by a boldface $\mathbf{X} = (X_1,\ldots,X_n)$. Let $J$ be the $n \times 1$ vector of unities, that is, $J = (1, 1,\ldots,1)'$. Then,

$$\begin{aligned} \mathbf{X} &= (X\_1, \dots, X\_n) = \begin{bmatrix} x\_{11} & x\_{12} & \dots & x\_{1n} \\ x\_{21} & x\_{22} & \dots & x\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{p1} & x\_{p2} & \dots & x\_{pn} \end{bmatrix} \\\\ \Rightarrow \frac{1}{n} \mathbf{X} J &= \begin{bmatrix} \frac{1}{n} (\sum\_{j=1}^{n} x\_{1j}) \\ \vdots \\ \frac{1}{n} (\sum\_{j=1}^{n} x\_{pj}) \end{bmatrix} = \begin{bmatrix} \bar{x}\_1 \\ \bar{x}\_2 \\ \vdots \\ \bar{x}\_p \end{bmatrix} = \bar{X} \end{aligned}$$

where $\bar{X}$ is the sample average vector or the sample mean vector. Let the boldface $\bar{\mathbf{X}}$ be the $p \times n$ matrix $\bar{\mathbf{X}} = (\bar{X}, \bar{X},\ldots,\bar{X})$. Then,

$$(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = S = (s\_{ij}), \ s\_{ij} = \sum\_{k=1}^{n} (\mathbf{x}\_{ik} - \bar{\mathbf{x}}\_{i})(\mathbf{x}\_{jk} - \bar{\mathbf{x}}\_{j}), \tag{11.4.3}$$

where *S* is the sample sum of products matrix or the "corrected" sample sum of squares and cross products matrix. Note that

$$\begin{aligned} \bar{X} &= \frac{1}{n} \mathbf{X}J \\ \Rightarrow \bar{\mathbf{X}} &= (\bar{X}, \dots, \bar{X}) = \mathbf{X} \left(\frac{1}{n} J J'\right), \\ \Rightarrow \mathbf{X} - \bar{\mathbf{X}} &= \mathbf{X} \left(I - \frac{1}{n} J J'\right). \end{aligned}$$

Thus,

$$S = \mathbf{X} \left( I - \frac{1}{n} J J' \right) \left( I - \frac{1}{n} J J' \right)' \mathbf{X}' = \mathbf{X} \left( I - \frac{1}{n} J J' \right) \mathbf{X}'. \tag{11.4.4}$$
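Equations (11.4.3) and (11.4.4) give two routes to the same matrix $S$. The sketch below, with illustrative data, confirms that the centering-matrix form agrees with the entrywise definition and that $I - \frac{1}{n}JJ'$ is idempotent, which is what collapses the product of the two centering factors in (11.4.4).

```python
import numpy as np

# Two equivalent computations of the corrected sum of products matrix S.
rng = np.random.default_rng(3)
p, n = 3, 10
X = rng.standard_normal((p, n))            # columns are the observations X_1,...,X_n

J = np.ones((n, 1))
C = np.eye(n) - (1.0 / n) * (J @ J.T)      # centering matrix, C = C' = C^2
S_centering = X @ C @ X.T                  # form (11.4.4)

Xbar = X.mean(axis=1, keepdims=True)       # sample mean vector
S_direct = (X - Xbar) @ (X - Xbar).T       # entrywise form (11.4.3)

print(np.allclose(S_centering, S_direct))
print(np.allclose(C @ C, C))               # idempotency of the centering matrix
```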

Since $(X_j - M)'\Sigma^{-1}(X_j - M)$ is a real scalar quantity, we have the following:

$$\begin{split} \sum\_{j=1}^{n} (X\_j - M)' \Sigma^{-1} (X\_j - M) &= \sum\_{j=1}^{n} \text{tr}[(X\_j - M)' \Sigma^{-1} (X\_j - M)] \\ &= \sum\_{j=1}^{n} \text{tr}[\Sigma^{-1} (X\_j - M)(X\_j - M)'] \\ &= \text{tr}[\Sigma^{-1} \sum\_{j=1}^{n} (X\_j - \bar{X} + \bar{X} - M)(X\_j - \bar{X} + \bar{X} - M)'] \\ &= \text{tr}[\Sigma^{-1} \sum\_{j=1}^{n} (X\_j - \bar{X})(X\_j - \bar{X})'] \\ &\quad + n\, \text{tr}[\Sigma^{-1} (\bar{X} - M)(\bar{X} - M)'] \\ &= \text{tr}(\Sigma^{-1} S) + n (\bar{X} - M)' \Sigma^{-1} (\bar{X} - M). \qquad (11.4.5) \end{split}$$

Hence,

$$L = (2\pi)^{-\frac{np}{2}} |\boldsymbol{\Sigma}|^{-\frac{n}{2}} \mathbf{e}^{-\frac{1}{2} \{ \text{tr}(\boldsymbol{\Sigma}^{-1} \mathbf{S}) + n(\bar{X} - M)' \boldsymbol{\Sigma}^{-1} (\bar{X} - M) \}}. \tag{11.4.6}$$

Differentiating (11.4.6) with respect to $M$, equating the result to a null vector and solving, we obtain an estimator of $M$, denoted by $\hat{M}$, as $\hat{M} = \bar{X}$. Then, $\ln L$ evaluated at $M = \bar{X}$ is

$$\begin{split} \ln L &= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\text{tr}(\Sigma^{-1}S) \\ &= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Lambda\Phi\Lambda' + \Psi| - \frac{1}{2}\text{tr}[(\Lambda\Phi\Lambda' + \Psi)^{-1}S]. \end{split} \tag{11.4.7}$$

#### **11.4.1. Maximum likelihood estimators under identification conditions**

The derivations in the following sections parallel those found in Mathai (2021). One of the conditions for identification of the model is that $\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda$ be a diagonal matrix with positive diagonal elements. We will examine the maximum likelihood estimators/estimates (MLE) under this identification condition. In this case, it follows from (11.4.7) that

$$\ln L = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Lambda\Lambda' + \Psi| - \frac{1}{2}\text{tr}[(\Lambda\Lambda' + \Psi)^{-1}S].\tag{11.4.8}$$

By expanding the following determinant in two different ways by applying properties of partitioned determinants that are stated in Sect. 1.3, we have the following identities:

$$
\begin{vmatrix}
\Psi & -\Lambda \\
\Lambda' & I\_r
\end{vmatrix} = |\Psi| \left| I\_r + \Lambda' \Psi^{-1} \Lambda \right| = |\Psi + \Lambda \Lambda'|.\tag{11.4.9}
$$
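The determinant identity (11.4.9) can be spot-checked numerically for a random $\Lambda$ and a diagonal $\Psi$ (the dimensions and values below are illustrative):

```python
import numpy as np

# Numerical check of |Psi + Lambda Lambda'| = |Psi| |I_r + Lambda' Psi^{-1} Lambda|.
rng = np.random.default_rng(4)
p, r = 6, 2
Lam = rng.standard_normal((p, r))               # illustrative p x r loadings
Psi = np.diag(rng.uniform(0.5, 2.0, size=p))    # diagonal with positive entries

lhs = np.linalg.det(Psi + Lam @ Lam.T)          # p x p determinant
rhs = np.linalg.det(Psi) * np.linalg.det(
    np.eye(r) + Lam.T @ np.linalg.inv(Psi) @ Lam)   # r x r determinant times |Psi|
print(np.isclose(lhs, rhs))
```

The identity is computationally useful as well: it replaces a $p \times p$ determinant by an $r \times r$ one when $r \ll p$.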

Hence, letting

$$
\Delta = \Lambda^{\prime} \Psi^{-1} \Lambda = \text{diag}(\delta\_1^{\prime} \delta\_1, \delta\_2^{\prime} \delta\_2, \dots, \delta\_r^{\prime} \delta\_r),
$$

we have

$$\begin{split} \ln|\Lambda\Lambda' + \Psi| &= \ln|\Psi| + \ln|I + \Lambda'\Psi^{-1}\Lambda| \\ &= \sum\_{j=1}^{p} \ln \psi\_{jj} + \sum\_{j=1}^{r} \ln(1 + \delta\_j' \delta\_j), \end{split} \tag{11.4.10}$$

where $\delta_j' = \Lambda_j'\Psi^{-\frac{1}{2}}$, $\Lambda_j$ is the $j$-th column of $\Lambda$ and $\psi_{jj},\ j = 1,\ldots,p$, is the $j$-th diagonal element of the diagonal matrix $\Psi$, the identification condition being that $\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda = \Delta = \text{diag}(\delta_1'\delta_1,\ldots,\delta_r'\delta_r)$. Accordingly, if we can write $\text{tr}(\Sigma^{-1}S) = \text{tr}[(\Lambda\Lambda' + \Psi)^{-1}S]$ in terms of $\psi_{jj},\ j = 1,\ldots,p$, and $\delta_j'\delta_j,\ j = 1,\ldots,r$, then the likelihood equation can be directly evaluated from (11.4.8) and (11.4.10), and the estimators can be determined. The following result will be helpful in this connection.

**Theorem 11.4.1.** *Whenever $\Lambda\Lambda' + \Psi$ is nonsingular, which in this case means real positive definite, its inverse is given by*

$$\left(\Lambda\Lambda^{\prime} + \Psi\right)^{-1} = \Psi^{-1} - \Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda^{\prime}\Psi^{-1} \tag{11.4.11}$$

*where $\Delta$ is as defined in (11.4.10).*

It can be readily verified that pre- or post-multiplication of $\Psi^{-1} - \Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda'\Psi^{-1}$ by $\Lambda\Lambda' + \Psi$ yields the identity matrix $I_p$.
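This verification can also be carried out numerically. The sketch below, with illustrative values, forms the right-hand side of (11.4.11) and multiplies it by $\Lambda\Lambda' + \Psi$:

```python
import numpy as np

# Numerical check of Theorem 11.4.1 (a Sherman-Morrison-Woodbury type identity):
# (Lambda Lambda' + Psi)^{-1} = Psi^{-1} - Psi^{-1} Lambda (Delta + I)^{-1} Lambda' Psi^{-1},
# with Delta = Lambda' Psi^{-1} Lambda.
rng = np.random.default_rng(5)
p, r = 5, 2
Lam = rng.standard_normal((p, r))               # illustrative loadings
Psi = np.diag(rng.uniform(0.5, 2.0, size=p))    # diagonal with positive entries

Psi_inv = np.linalg.inv(Psi)
Delta = Lam.T @ Psi_inv @ Lam
inv_formula = Psi_inv - Psi_inv @ Lam @ np.linalg.inv(Delta + np.eye(r)) @ Lam.T @ Psi_inv

# multiplying by Lambda Lambda' + Psi must return the identity matrix I_p
print(np.allclose(inv_formula @ (Lam @ Lam.T + Psi), np.eye(p)))
```

As with (11.4.9), the practical gain is that only $r \times r$ and diagonal matrices need to be inverted.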

#### **11.4.2. Simplifications of $|\Sigma|$ and $\text{tr}(\Sigma^{-1}S)$**

In light of (11.4.9) and (11.4.10), we have

$$\begin{aligned} |\Sigma| &= |\Lambda\Lambda' + \Psi| = |\Psi|\, |\Lambda'\Psi^{-1}\Lambda + I| \\ &= |\Psi|\, |I + \Delta| = \Big\{ \prod\_{j=1}^p \psi\_{jj} \Big\} \Big\{ \prod\_{j=1}^r (1 + \delta\_j' \delta\_j) \Big\}. \end{aligned}$$

Now, observe the following: in $\Lambda(\Delta + I)^{-1} = \Lambda \, \text{diag}\big(\frac{1}{1+\delta\_1'\delta\_1},\ldots,\frac{1}{1+\delta\_r'\delta\_r}\big)$, the $j$-th column of $\Lambda$ is multiplied by $\frac{1}{1+\delta\_j'\delta\_j}$, $j = 1,\ldots,r,$ and

$$\Lambda(\Delta + I)^{-1}\Lambda' = \sum\_{j=1}^{r} \frac{1}{1 + \delta\_j' \delta\_j} \Lambda\_j \Lambda\_j'$$

where $\Lambda\_j$ is the $j$-th column of $\Lambda$ and the $\delta\_j$'s are specified in (11.4.10). Thus,

$$\ln|\Sigma| = \sum\_{j=1}^{p} \ln \psi\_{jj} + \sum\_{j=1}^{r} \ln(1 + \delta\_j' \delta\_j) \tag{11.4.12}$$

and, on applying Theorem 11.4.1,

$$\begin{split} \text{tr}(\Sigma^{-1}S) &= \text{tr}[(\Lambda\Lambda' + \Psi)^{-1}S] = \text{tr}(\Psi^{-1}S) - \text{tr}[\Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda'\Psi^{-1}S] \\ &= \text{tr}(\Psi^{-1}S) - \sum\_{j=1}^{r} \frac{1}{1 + \delta\_j'\delta\_j} \text{tr}(\Lambda\_j\Lambda\_j'(\Psi^{-1}S\Psi^{-1})) \\ &= \text{tr}(\Psi^{-1}S) - \sum\_{j=1}^{r} \frac{1}{1 + \delta\_j'\delta\_j} \text{tr}(\Lambda\_j'(\Psi^{-1}S\Psi^{-1})\Lambda\_j) \\ &= \text{tr}(\Psi^{-1}S) - \sum\_{j=1}^{r} \frac{1}{1 + \delta\_j'\delta\_j} \Lambda\_j'(\Psi^{-1}S\Psi^{-1})\Lambda\_j \end{split} \tag{11.4.13}$$

where $\Lambda\_j$ is the $j$-th column of $\Lambda$; this follows by making use of the property $\text{tr}(AB) = \text{tr}(BA)$ and observing that $\Lambda\_j'(\Psi^{-1}S\Psi^{-1})\Lambda\_j$ is a quadratic form.
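The decomposition (11.4.13) can likewise be verified numerically. In the sketch below, $\Lambda$, $\Psi$ and $S$ are arbitrary illustrative values (not from the text); the columns of $\Lambda$ are $\Psi^{-1}$-orthogonal, so the identification condition that $\Lambda'\Psi^{-1}\Lambda$ be diagonal holds.

```python
# Numerical check of the trace decomposition (11.4.13) (a sketch;
# Lambda, Psi and S are arbitrary illustrative choices).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def inv3(A):
    # adjugate-based inverse of a 3 x 3 matrix
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]

L = [[1.0, 1.0], [1.0, -1.0], [0.0, 2.0]]   # Lambda' Psi^{-1} Lambda is diagonal
psi = [2.0, 2.0, 2.0]
S = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]

Sigma = matmul(L, transpose(L))
for i in range(3):
    Sigma[i][i] += psi[i]

lhs = trace(matmul(inv3(Sigma), S))         # tr(Sigma^{-1} S), computed directly

# right-hand side of (11.4.13)
B = [[S[i][j] / (psi[i] * psi[j]) for j in range(3)] for i in range(3)]  # Psi^{-1} S Psi^{-1}
rhs = sum(S[i][i] / psi[i] for i in range(3))                            # tr(Psi^{-1} S)
for j in range(2):
    col = [L[i][j] for i in range(3)]
    dd = sum(col[i] * col[i] / psi[i] for i in range(3))                 # delta_j' delta_j
    quad = sum(col[i] * B[i][k] * col[k] for i in range(3) for k in range(3))
    rhs -= quad / (1.0 + dd)

ok = abs(lhs - rhs) < 1e-10
print(ok)
```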

# **11.4.3. Special case $\Psi = \sigma^2 I\_p$**

Letting $\Psi = \sigma^2 I$ where $\sigma^2$ is a real scalar, $\Psi^{-1} = \sigma^{-2}I\_p = \theta I\_p$ where $\theta = \sigma^{-2}$, and the log-likelihood function can be simplified as

$$\ln L = -\frac{np}{2}\ln(2\pi) + \frac{np}{2}\ln\theta - \frac{n}{2}\sum\_{j=1}^{r} \ln(1 + \delta\_j'\delta\_j)$$

$$-\frac{\theta}{2}\text{tr}(S) + \frac{\theta^2}{2}\sum\_{j=1}^{r} \frac{1}{1 + \delta\_j'\delta\_j}\Lambda\_j'S\Lambda\_j$$

where $1 + \delta\_j'\delta\_j = 1 + \theta\Lambda\_j'\Lambda\_j$ with $\Lambda\_j$ being the $j$-th column of $\Lambda$. Consider the equation

$$\begin{split} \frac{\partial}{\partial \theta} \ln L &= 0 \Rightarrow \\ \frac{np}{\theta} &- n \sum\_{j=1}^{r} \frac{\Lambda\_{j}' \Lambda\_{j}}{1 + \theta \Lambda\_{j}' \Lambda\_{j}} - \text{tr}(S) \\ &+ 2\theta \sum\_{j=1}^{r} \frac{\Lambda\_{j}' S \Lambda\_{j}}{1 + \theta \Lambda\_{j}' \Lambda\_{j}} - \theta^{2} \sum\_{j=1}^{r} \frac{\Lambda\_{j}' \Lambda\_{j}}{(1 + \theta \Lambda\_{j}' \Lambda\_{j})^{2}} \Lambda\_{j}' S \Lambda\_{j} = 0. \end{split} \tag{11.4.14}$$

For a specific *j* , we have

$$\frac{\partial}{\partial \Lambda\_j} \ln L = O \Rightarrow$$

$$-\frac{n}{2} \frac{2\theta \,\Lambda\_j}{1 + \theta \,\Lambda\_j' \Lambda\_j} + \frac{\theta^2}{2} \frac{2S\Lambda\_j}{1 + \theta \,\Lambda\_j' \Lambda\_j} - \frac{\theta^2}{2} \frac{\Lambda\_j' S\Lambda\_j}{[1 + \theta \,\Lambda\_j' \Lambda\_j]^2} (2\theta) \Lambda\_j = O. \tag{11.4.15}$$

Pre-multiplying (11.4.15) by $\Lambda\_j'$ yields

$$-\frac{n\theta\,\Lambda\_j'\Lambda\_j}{1+\theta\,\Lambda\_j'\Lambda\_j} + \frac{\theta^2\,\Lambda\_j' S \Lambda\_j}{1+\theta\,\Lambda\_j'\Lambda\_j} - \frac{\theta^3(\Lambda\_j' S \Lambda\_j)(\Lambda\_j'\Lambda\_j)}{(1+\theta\,\Lambda\_j'\Lambda\_j)^2} = 0.\tag{11.4.16}$$

Now, on comparing (11.4.16) with (11.4.14) after multiplying (11.4.14) by *θ*, we have

$$np - \theta \, \text{tr}(S) + \theta^2 \sum\_{j=1}^r \frac{\Lambda\_j' S \Lambda\_j}{1 + \theta \Lambda\_j' \Lambda\_j} = 0. \tag{11.4.17}$$

Multiplying the left-hand side of (11.4.15) by $[1 + \theta\Lambda\_j'\Lambda\_j]^2/\theta$ gives

$$-n(1+\theta \,\Lambda\_j'\Lambda\_j)\Lambda\_j + \theta(1+\theta \,\Lambda\_j'\Lambda\_j)S\Lambda\_j - \theta^2(\Lambda\_j'S\Lambda\_j)\Lambda\_j = O.\tag{11.4.18}$$

Then, by pre-multiplying (11.4.18) by $\Lambda\_j'$, we obtain

$$\begin{split}-n(1+\theta\Lambda\_j'\Lambda\_j)\Lambda\_j'\Lambda\_j + \theta(1+\theta\Lambda\_j'\Lambda\_j)\Lambda\_j'S\Lambda\_j - \theta^2(\Lambda\_j'S\Lambda\_j)\Lambda\_j'\Lambda\_j = 0 \Rightarrow\\ \theta(\Lambda\_j'S\Lambda\_j) = n(1+\theta\Lambda\_j'\Lambda\_j)(\Lambda\_j'\Lambda\_j),\end{split} \tag{11.4.19}$$

which provides the following representation of *θ*:

$$\theta = \frac{n\Lambda\_j'\Lambda\_j}{\Lambda\_j'S\Lambda\_j - n(\Lambda\_j'\Lambda\_j)^2} \ \text{ for } \ \Lambda\_j'S\Lambda\_j - n(\Lambda\_j'\Lambda\_j)^2 > 0\tag{i}$$

since $\theta$ must be positive. Further, on substituting $\theta\Lambda\_j'S\Lambda\_j = n(1 + \theta\Lambda\_j'\Lambda\_j)(\Lambda\_j'\Lambda\_j)$ from (11.4.19) in (11.4.18), we have

$$-n\Lambda\_j + \theta [S - n(\Lambda\_j'\Lambda\_j)I]\Lambda\_j = O,\tag{ii}$$

which, on replacing $\theta$ by the right-hand side of *(i)*, yields

$$-(\Lambda\_j'S\Lambda\_j)\Lambda\_j + (\Lambda\_j'\Lambda\_j)S\Lambda\_j = O$$

or, equivalently,

$$\left[S - \frac{\Lambda\_j' S \Lambda\_j}{\Lambda\_j' \Lambda\_j} I\right] \Lambda\_j = O \Rightarrow\tag{11.4.20}$$

$$[S - \lambda\_j I] \Lambda\_j = O \tag{11.4.21}$$

where $\lambda\_j = \frac{\Lambda\_j'S\Lambda\_j}{\Lambda\_j'\Lambda\_j}$. Observe that $\Lambda\_j$ in (11.4.20) is an eigenvector of $S$ for $j = 1,\ldots,p$. Substituting the value of $\Lambda\_j'S\Lambda\_j$ from (11.4.19) into (11.4.17) gives the following estimate of $\theta$:

$$\hat{\theta} = \frac{np}{\text{tr}(S) - n\sum\_{j=1}^{p} \hat{\Lambda}\_j'\hat{\Lambda}\_j} \tag{11.4.22}$$

whenever the denominator is positive as *θ* is by definition positive. Now, in light of (11.4.18) and (11.4.19), we can also obtain the following result for each *j* :

$$\hat{\theta} = \frac{n\hat{\Lambda}\_j'\hat{\Lambda}\_j}{\hat{\Lambda}\_j'S\hat{\Lambda}\_j - n(\hat{\Lambda}\_j'\hat{\Lambda}\_j)^2}, \ j = 1, \ldots, p,\tag{11.4.23}$$

requiring again that the denominator be positive. In this case, $\hat{\Lambda}\_j$ is an eigenvector of $S$ for $j = 1,\ldots,p$. Let us conveniently normalize the $\hat{\Lambda}\_j$'s so that the denominators in (11.4.22) and (11.4.23) remain positive.

Thus, $\hat{\Lambda}\_j$ is an eigenvector of $S$ with the corresponding eigenvalue $\lambda\_j$ for each $j$, $j = 1,\ldots,p$. Out of these, the first $r$ of them, corresponding to the $r$ largest eigenvalues, will also be estimates for the factor loadings $\Lambda\_j,\ j = 1,\ldots,r$. Observe that we can multiply $\hat{\Lambda}\_j$ by any constant $c\_1$ without affecting equations (11.4.20) or (11.4.21). This constant $c\_1$ may become necessary to keep the denominators in (11.4.22) and (11.4.23) positive. Hence we have the following result:

Factor Analysis

**Theorem 11.4.2.** *The sum of all the eigenvalues of $S$ from Eq. (11.4.20), including the estimates of the $r$ factor loadings $\hat{\Lambda}\_1,\ldots,\hat{\Lambda}\_p$, is given by*

$$\sum\_{j=1}^{p} \frac{\hat{\Lambda}\_j' S \hat{\Lambda}\_j}{\hat{\Lambda}\_j' \hat{\Lambda}\_j} = \text{tr}(S). \tag{11.4.24}$$

It can be established that the representations of *θ*ˆ given by (11.4.22) and (11.4.23) are one and the same. The equation giving rise to (11.4.23) is

$$\theta[\Lambda\_j' S \Lambda\_j - n(\Lambda\_j' \Lambda\_j)^2] = n\Lambda\_j' \Lambda\_j \ \text{ for each } j. \tag{iii}$$

Let us divide both sides of *(iii)* by $\Lambda\_j'\Lambda\_j$. Observe that $\frac{\Lambda\_j'S\Lambda\_j}{\Lambda\_j'\Lambda\_j} = \lambda\_j$ is an eigenvalue of $S$ for $j = 1,\ldots,p,$ treating $\Lambda\_j$ as an eigenvector of $S$. Now, taking the sum over $j = 1,\ldots,p,$ on both sides of *(iii)* after dividing by $\Lambda\_j'\Lambda\_j$, we have

$$\begin{aligned} \theta \left[ \sum\_{j=1}^p \lambda\_j - n \sum\_{j=1}^p (\Lambda\_j' \Lambda\_j) \right] &= np \Rightarrow \\ \theta \left[ \text{tr}(S) - n \sum\_{j=1}^p \Lambda\_j' \Lambda\_j \right] &= np, \end{aligned} \tag{iv}$$

which is Eq. (11.4.22). This proves the claim.

Hence the procedure is the following: compute the eigenvalues and the corresponding eigenvectors of the sample sum of products matrix $S$. The estimates for the factor loadings, denoted by $\hat{\Lambda}\_j$, are available from the eigenvectors of $S$ after appropriate normalization to make the denominators in (11.4.22) and (11.4.23) positive. Take the $r$ largest eigenvalues of $S$ and then compute the corresponding eigenvectors to obtain estimates for all the factor loadings. This methodology is clearly related to that utilized in Principal Component Analysis, the estimates of the variances of the principal components being $\hat{\Lambda}\_j'S\hat{\Lambda}\_j/\hat{\Lambda}\_j'\hat{\Lambda}\_j$ for $j = 1,\ldots,r$.
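The eigenvalue step of this procedure can be sketched in a few lines. The following pure-Python illustration (in practice a library eigensolver would be used) applies power iteration with deflation to the matrix $S$ that appears in Example 11.4.1 below; the starting vector is an arbitrary choice.

```python
# A sketch of the eigenvalue step: power iteration with deflation on the
# 3 x 3 matrix S of Example 11.4.1 (illustrative only).

S = [[6.0, 0.0, -2.0], [0.0, 6.0, -2.0], [-2.0, -2.0, 6.0]]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def power_eig(A, iters=500):
    # iterate x <- Ax / ||Ax||; converges to the dominant eigenvector
    x = [1.0, 0.7, 0.3]                  # arbitrary start vector
    for _ in range(iters):
        y = matvec(A, x)
        nrm = sum(v * v for v in y) ** 0.5
        x = [v / nrm for v in y]
    lam = sum(x[i] * matvec(A, x)[i] for i in range(len(x)))  # Rayleigh quotient
    return lam, x

eigs = []
A = [row[:] for row in S]
for _ in range(3):
    lam, u = power_eig(A)
    eigs.append(lam)
    # deflate: A <- A - lam u u' removes the eigenpair just found
    A = [[A[i][j] - lam * u[i] * u[j] for j in range(3)] for i in range(3)]

print([round(v, 6) for v in eigs])   # eigenvalues 6 + sqrt(8), 6, 6 - sqrt(8)
```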

#### **Verification**

Does the representation of *θ* given in (11.4.22) and (11.4.23) satisfy the likelihood Eq. (11.4.14)? Since *θ* is estimated through *Λj* for each *j* = 1*,... , p,* we may replace *θ* in (11.4.14) by *θj* and insert the summation symbol. Equation (11.4.14) will then be

$$\begin{split} n \sum\_{j} \frac{1}{\theta\_{j}} - n \sum\_{j} \frac{\Lambda\_{j}' \Lambda\_{j}}{1 + \theta\_{j} \Lambda\_{j}' \Lambda\_{j}} - \text{tr}(S) \\ + 2 \sum\_{j} \theta\_{j} \frac{\Lambda\_{j}' S \Lambda\_{j}}{1 + \theta\_{j} \Lambda\_{j}' \Lambda\_{j}} - \sum\_{j} \theta\_{j}^{2} \frac{\Lambda\_{j}' \Lambda\_{j} (\Lambda\_{j}' S \Lambda\_{j})}{(1 + \theta\_{j} \Lambda\_{j}' \Lambda\_{j})^{2}} = 0. \end{split} \tag{11.4.25}$$

Now, substituting the value of *θj* specified in (11.4.23) into (11.4.14), the left-hand side of (11.4.14) reduces to the following:

$$\begin{split} & n \sum\_{j} \frac{[\Lambda\_{j}' S \Lambda\_{j} - n(\Lambda\_{j}' \Lambda\_{j})^{2}]}{n \Lambda\_{j}' \Lambda\_{j}} - n \sum\_{j} \frac{\Lambda\_{j}' \Lambda\_{j}}{\Lambda\_{j}' S \Lambda\_{j}} [\Lambda\_{j}' S \Lambda\_{j} - n(\Lambda\_{j}' \Lambda\_{j})^{2}] - \text{tr}(S) \\ & + 2 \sum\_{j} n \Lambda\_{j}' \Lambda\_{j} - \sum\_{j} \frac{\Lambda\_{j}' \Lambda\_{j}}{\Lambda\_{j}' S \Lambda\_{j}} (n \Lambda\_{j}' \Lambda\_{j})^{2} \\ & = \sum\_{j} \frac{\Lambda\_{j}' S \Lambda\_{j}}{\Lambda\_{j}' \Lambda\_{j}} - \text{tr}(S) = 0, \end{split}$$

owing to Theorem 11.4.2. Hence, Eq. (11.4.14) holds for the value of *θ* given in (11.4.23) and the value of *Λj* specified in (11.4.20).

Since the basic estimating equation for *θ*ˆ arises from (11.4.23) as

$$
\theta \left[ \Lambda\_j' S \Lambda\_j - n \left( \Lambda\_j' \Lambda\_j \right)^2 \right] = n \Lambda\_j' \Lambda\_j \, , \tag{v}
$$

a combined estimate for $\theta$ can be secured. On dividing both sides of *(v)* by $\Lambda\_j'\Lambda\_j$ and summing over $j$, $j = 1,\ldots,p$, it is seen that the resulting estimate of $\theta$ agrees with that given in (11.4.22).

#### **11.4.4. Maximum value of the exponent**

We have the estimate $\hat{\theta}$ of $\theta$ provided in (11.4.23) at the estimated value $\hat{\Lambda}\_j$ of $\Lambda\_j$ for each $j$, where $\hat{\Lambda}\_j$ is an eigenvector of $S$ resulting from (11.4.20). The exponent of the likelihood function is $-\frac{1}{2}\text{tr}(\Sigma^{-1}S)$ and, in the current context, $\Sigma = \Lambda\Phi\Lambda' + \Psi$, the identification conditions being that $\Phi = I\_p$ and $\Lambda'\Psi^{-1}\Lambda$ be a diagonal matrix with positive diagonal elements. Under these conditions and for the special case $\Psi = \sigma^2 I\_p$ with $\sigma^{-2} = \theta$, we have shown that the exponent in the log-likelihood function reduces to $-\frac{1}{2}\theta\,\text{tr}(S) + \frac{1}{2}\theta^2 \sum\_{j=1}^r \frac{\Lambda\_j'S\Lambda\_j}{1+\theta\Lambda\_j'\Lambda\_j}$. Now, consider $\theta\,\text{tr}(S) - \sum\_{j=1}^r \theta^2 \frac{\Lambda\_j'S\Lambda\_j}{1+\theta\Lambda\_j'\Lambda\_j} \equiv \delta$. Then,


$$\begin{split} \delta &= \theta\, \text{tr}(S) - \theta \sum\_{j=1}^{r} \theta \frac{\Lambda\_{j}' S \Lambda\_{j}}{1 + \theta \Lambda\_{j}' \Lambda\_{j}} \\ &= \theta \left[ \text{tr}(S) - \sum\_{j=1}^{p} \theta \frac{\Lambda\_{j}' S \Lambda\_{j}}{1 + \theta \Lambda\_{j}' \Lambda\_{j}} \right] \\ &= \theta \left[ \text{tr}(S) - \sum\_{j=1}^{p} n \Lambda\_{j}' \Lambda\_{j} \right] \ \text{from (11.4.19)} \\ &= np \ \text{from (11.4.22)}. \end{split}$$

Hence the result.

**Example 11.4.1.** Tests are conducted to evaluate $x\_1$: verbal-linguistic skills, $x\_2$: spatial visualization ability, and $x\_3$: mathematical abilities. Test scores on $x\_1, x\_2,$ and $x\_3$ are available. It is known that these abilities are governed by two intellectual faculties that will be identified as $f\_1$ and $f\_2$, and that linear functions of $f\_1$ and $f\_2$ contribute to $x\_1, x\_2, x\_3$. The coefficients in these linear functions, known as factor loadings, are unknown. Let $\Lambda = (\lambda\_{ij})$ be the matrix of factor loadings. Then, we have the model

$$\begin{aligned} x\_1 &= \lambda\_{11} f\_1 + \lambda\_{12} f\_2 + \mu\_1 + e\_1 \\ x\_2 &= \lambda\_{21} f\_1 + \lambda\_{22} f\_2 + \mu\_2 + e\_2 \\ x\_3 &= \lambda\_{31} f\_1 + \lambda\_{32} f\_2 + \mu\_3 + e\_3 \end{aligned}$$

Let

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \ M = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \mu\_3 \end{bmatrix}, \ \epsilon = \begin{bmatrix} e\_1 \\ e\_2 \\ e\_3 \end{bmatrix} \text{ and } \ F = \begin{bmatrix} f\_1 \\ f\_2 \end{bmatrix}.$$

where $M$ is some general effect, $\epsilon$ is the error vector or the sum total of the contributions from unknown factors, $F$ represents the vector of contributing factors and $\Lambda$, the levels of the contributions. Let $\text{Cov}(X) = \Sigma$, $\text{Cov}(F) = \Phi$ and $\text{Cov}(\epsilon) = \Psi$. Under the assumptions $\Phi = I$, $\Psi = \sigma^2 I$ where $\sigma^2$ is a real scalar quantity and $I$ is the identity matrix, and $\Lambda'\Psi^{-1}\Lambda$ is diagonal, estimate the factor loadings $\lambda\_{ij}$'s and $\sigma^2$ in $\Psi$. A battery of tests is conducted on a random sample of six individuals and the following are the data, where our notations are $\mathbf{X}$: the matrix of sample values, $\bar{X}$: the sample average, $\bar{\mathbf{X}}$: the matrix of sample averages, and $S$: the sample sum of products matrix. So, letting

$$\mathbf{X} = [X\_1, X\_2, \dots, X\_6] = \begin{bmatrix} 4 & 2 & 4 & 2 & 4 & 2 \\ 4 & 2 & 1 & 2 & 1 & 2 \\ 2 & 5 & 3 & 3 & 3 & 2 \end{bmatrix},$$

estimate the factor loadings $\lambda\_{ij}$'s and the variance $\sigma^2$.

**Solution 11.4.1.** We begin with the computations of the various quantities required to arrive at a solution. Observe that since the matrix of factor loadings *Λ* is 3 × 2 and a random sample of 6 observation vectors is available, *n* = 6*, p* = 3 and *r* = 2 in our notation.

In this case,

$$\begin{aligned} \bar{X} &= \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix}, \quad \bar{\mathbf{X}} = [\bar{X}, \bar{X}, \dots, \bar{X}], \\ \mathbf{X} - \bar{\mathbf{X}} &= \begin{bmatrix} 1 & -1 & 1 & -1 & 1 & -1 \\ 2 & 0 & -1 & 0 & -1 & 0 \\ -1 & 2 & 0 & 0 & 0 & -1 \end{bmatrix}, \\ [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' &= S = \begin{bmatrix} 6 & 0 & -2 \\ 0 & 6 & -2 \\ -2 & -2 & 6 \end{bmatrix}. \end{aligned}$$
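These computations can be reproduced mechanically; the following pure-Python sketch forms $\bar{X}$, $\mathbf{X} - \bar{\mathbf{X}}$ and $S$ from the data matrix.

```python
# A quick mechanical check of X_bar and S for the data matrix above
# (pure Python, n = 6 observations on p = 3 variables).

X = [[4, 2, 4, 2, 4, 2],
     [4, 2, 1, 2, 1, 2],
     [2, 5, 3, 3, 3, 2]]
n = 6
xbar = [sum(row) / n for row in X]                             # sample averages
D = [[X[i][k] - xbar[i] for k in range(n)] for i in range(3)]  # X - X_bar

# S = (X - X_bar)(X - X_bar)'
S = [[sum(D[i][k] * D[j][k] for k in range(n)) for j in range(3)]
     for i in range(3)]
print(xbar, S)
```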

An estimator/estimate of $\Sigma$ is $\hat{\Sigma} = \frac{S}{n}$ whereas an unbiased estimator/estimate of $\Sigma$ is $\frac{S}{n-1}$. An eigenvalue of $\frac{S}{\alpha}$ is $\frac{1}{\alpha}$ times the corresponding eigenvalue of $S$. Moreover, constant multiples of eigenvectors are also eigenvectors for a given eigenvalue. Accordingly, we will work with $S$ instead of $\hat{\Sigma}$ or an unbiased estimate of $\Sigma$. Since

$$
\begin{vmatrix}
6 - \lambda & 0 & -2 \\
0 & 6 - \lambda & -2 \\
-2 & -2 & 6 - \lambda
\end{vmatrix} = 0
\Rightarrow (6 - \lambda)(\lambda^2 - 12\lambda + 28) = 0,
$$

the eigenvalues are $\lambda\_1 = 6 + \sqrt{8},\ \lambda\_2 = 6,\ \lambda\_3 = 6 - \sqrt{8}$, the two largest ones being $\lambda\_1 = 6 + \sqrt{8}$ and $\lambda\_2 = 6$. Let us evaluate the eigenvectors $U\_1, U\_2$ and $U\_3$ associated with these three eigenvalues. An eigenvector $U\_1$ corresponding to $\lambda\_1 = 6 + \sqrt{8}$ will be a solution of the equation

$$
\begin{bmatrix} -\sqrt{8} & 0 & -2 \\ 0 & -\sqrt{8} & -2 \\ -2 & -2 & -\sqrt{8} \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \Rightarrow U\_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 1 \end{bmatrix}.
$$

For *λ*<sup>2</sup> = 6, the equation to be solved is

$$
\begin{bmatrix} 0 & 0 & -2 \\ 0 & 0 & -2 \\ -2 & -2 & 0 \end{bmatrix} \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \Rightarrow U\_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.
$$


As for the eigenvalue $\lambda\_3 = 6 - \sqrt{8}$, it is seen from the derivation of $U\_1$ that $U\_3' = [\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1]$. Let us now examine the denominator of $\hat{\theta}$ in (11.4.22). Observe that $U\_1'U\_1 = 2,\ U\_2'U\_2 = 2,\ U\_3'U\_3 = 2$ and $n\sum\_{j=1}^{p} U\_j'U\_j = 6(2+2+2) = 36$. However, $\text{tr}(S) = (6+\sqrt{8}) + 6 + (6-\sqrt{8}) = 18$. So, let us multiply each vector by $\frac{1}{\sqrt{3}}$ so that $n\sum\_{j=1}^{p} U\_j'U\_j = 6(\frac{2}{3}+\frac{2}{3}+\frac{2}{3}) = 12$ and $\text{tr}(S) - n\sum\_{j=1}^{p} U\_j'U\_j = 18 - 12 > 0$. Thus, the estimate of $\theta$ is given by

$$\hat{\theta} = \frac{np}{\text{tr}(S) - n\sum\_{j=1}^{p} U\_j'U\_j} = \frac{(6)(3)}{18 - 12} = 3 \Rightarrow \hat{\sigma}^2 = \frac{1}{3}.$$

In light of (11.4.20), the factor loadings are estimated by $U\_1$ and $U\_2$ scaled by $\frac{1}{\sqrt{3}}$. Hence, the estimates of the factor loadings, denoted with a hat, are the following: $\hat{\lambda}\_{11} = (\frac{1}{\sqrt{3}})(-\frac{1}{\sqrt{2}}) = -\frac{1}{\sqrt{6}}$, $\hat{\lambda}\_{21} = (\frac{1}{\sqrt{3}})(-\frac{1}{\sqrt{2}}) = -\frac{1}{\sqrt{6}}$, $\hat{\lambda}\_{31} = (\frac{1}{\sqrt{3}})(1) = \frac{1}{\sqrt{3}}$, $\hat{\lambda}\_{12} = (\frac{1}{\sqrt{3}})(1) = \frac{1}{\sqrt{3}}$, $\hat{\lambda}\_{22} = (\frac{1}{\sqrt{3}})(-1) = -\frac{1}{\sqrt{3}}$, $\hat{\lambda}\_{32} = 0$.
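The arithmetic of this solution can be checked in a few lines; the sketch below rescales the eigenvectors by $1/\sqrt{3}$ and evaluates (11.4.22).

```python
# Check of theta_hat in Solution 11.4.1: after scaling the eigenvectors
# by 1/sqrt(3), the denominator of (11.4.22) is 18 - 12 = 6.

import math

n, p = 6, 3
tr_S = 18.0
U = [[-1 / math.sqrt(2), -1 / math.sqrt(2), 1.0],   # U_1
     [1.0, -1.0, 0.0],                              # U_2
     [1 / math.sqrt(2), 1 / math.sqrt(2), 1.0]]     # U_3
scale = 1 / math.sqrt(3)
U = [[scale * v for v in u] for u in U]

ssq = sum(sum(v * v for v in u) for u in U)   # sum of U_j'U_j, now 2
theta_hat = n * p / (tr_S - n * ssq)          # 18 / 6 = 3
sigma2_hat = 1 / theta_hat                    # 1/3
print(theta_hat, sigma2_hat)
```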

#### **11.5. General Case**

Let

$$\delta\_j = \begin{bmatrix} \theta\_1\lambda\_{1j} \\ \theta\_2\lambda\_{2j} \\ \vdots \\ \theta\_p\lambda\_{pj} \end{bmatrix}, \quad \delta\_j'\delta\_j = \Lambda\_j'\Theta^2\Lambda\_j, \quad \Lambda\_j = \begin{bmatrix} \lambda\_{1j} \\ \lambda\_{2j} \\ \vdots \\ \lambda\_{pj} \end{bmatrix},$$

$$\Psi^{-1} = \Theta^2 = \begin{bmatrix} \theta\_1^2 & 0 & \dots & 0 \\ 0 & \theta\_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \theta\_p^2 \end{bmatrix} \ \text{ and } \ |I + \Lambda'\Theta^2\Lambda| = \prod\_{j=1}^r (1 + \delta\_j'\delta\_j).$$

We will take *δj , j* = 1*, . . . , r,* and *Θ* = diag*(θ*1*, θ*2*,...,θp)* as the parameters. Expressed in terms of the *δj* 's and *Θ*, the log-likelihood function is the following:

$$\begin{split} \ln L &= -\frac{np}{2}\ln(2\pi) + n\sum\_{j=1}^{p} \ln \theta\_{j} - \frac{n}{2} \sum\_{j=1}^{r} \ln(1 + \delta\_{j}'\delta\_{j}) \\ &- \frac{1}{2}\text{tr}(\boldsymbol{\Theta}^{2}\boldsymbol{\mathcal{S}}) + \frac{1}{2} \sum\_{j=1}^{r} \frac{1}{1 + \delta\_{j}'\delta\_{j}} (\delta\_{j}'\boldsymbol{\Theta}\boldsymbol{\mathcal{S}}\boldsymbol{\Theta}\boldsymbol{\delta}\_{j}).\end{split} \tag{11.5.1}$$

Let us take *δj , j* = 1*, . . . , r,* and *Θ* as the parameters. Differentiating ln*L* partially with respect to the vector *δj* , for a specific *j* , and equating the result to a null vector, we have the following (referring to Chap. 1 for vector/matrix derivatives):

$$-\frac{n}{2}\frac{2\delta\_j}{1+\delta\_j'\delta\_j} - \frac{1}{2}(\delta\_j'\Theta S\Theta\,\delta\_j)\frac{2\delta\_j}{(1+\delta\_j'\delta\_j)^2} + \frac{1}{2}\frac{2\,\Theta S\Theta\,\delta\_j}{1+\delta\_j'\delta\_j} = O,\tag{i}$$

which, when multiplied by $1 + \delta\_j'\delta\_j > 0$, yields

$$-n\delta\_j - \frac{(\delta\_j^{\prime}\Theta \, S\Theta \, \delta\_j)}{1 + \delta\_j^{\prime}\delta\_j}\delta\_j + (\Theta \, S\Theta)\delta\_j = O. \tag{11.5.2}$$

On premultiplying (11.5.2) by $\delta\_j'$ and then by $1 + \delta\_j'\delta\_j$, and simplifying, we then have

$$-n(1+\delta\_j'\delta\_j)(\delta\_j'\delta\_j) + (\delta\_j'\Theta\,S\Theta\,\delta\_j) = 0 \Rightarrow$$

$$\frac{\delta\_j'\Theta\,\mathcal{S}\Theta\,\delta\_j}{1+\delta\_j'\delta\_j} = n\delta\_j'\delta\_j.\tag{11.5.3}$$

Let us differentiate ln*L* as given in (11.5.1) partially with respect to *θj* for a specific *j* such as *j* = 1. Then,

$$
\frac{n}{\theta\_j} - \frac{1}{2} 2\theta\_j s\_{jj} + \frac{1}{2} \sum\_{j=1}^r \frac{1}{1 + \delta\_j' \delta\_j} \frac{\partial}{\partial \theta\_j} (\delta\_j' \Theta \, S \Theta \, \delta\_j),
$$

where

$$\begin{aligned} \frac{\partial}{\partial \theta\_j} (\delta\_j' \Theta S \Theta \delta\_j) &= \delta\_j' \Big[ \frac{\partial}{\partial \theta\_j} \Theta \Big] S \Theta \delta\_j + \delta\_j' \Theta S \Big[ \frac{\partial}{\partial \theta\_j} \Theta \Big] \delta\_j, \\\frac{\partial}{\partial \theta\_1} \Theta &= \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{bmatrix} \Rightarrow \theta\_1 \frac{\partial}{\partial \theta\_1} \Theta = \begin{bmatrix} \theta\_1 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{bmatrix}, \\\ \end{aligned}$$

so that

$$\left[\theta\_1 \frac{\partial}{\partial \theta\_1} + \dots + \theta\_p \frac{\partial}{\partial \theta\_p}\right] \Theta = \Theta. \tag{ii}$$

Hence,

$$
\left[\sum\_{j=1}^p \theta\_j \frac{\partial}{\partial \theta\_j}\right] \sum\_{j=1}^r \frac{\delta\_j' \Theta \, S \Theta \, \delta\_j}{1 + \delta\_j' \delta\_j} = 2 \sum\_{j=1}^r \frac{\delta\_j' \Theta \, S \Theta \, \delta\_j}{1 + \delta\_j' \delta\_j},
$$

Factor Analysis

and then,

$$\begin{aligned} \left[\sum\_{j=1}^p \theta\_j \frac{\partial}{\partial \theta\_j}\right] \ln L &= 0 \Rightarrow \\ np - \sum\_{j=1}^p \theta\_j^2 s\_{jj} + \sum\_{j=1}^r \frac{\delta\_j' \Theta \, S \Theta \, \delta\_j}{1 + \delta\_j' \delta\_j} &= 0. \end{aligned} \tag{11.5.4}$$

However, given (11.5.3), we have $n\delta\_j'\delta\_j = \delta\_j'\Theta S\Theta\,\delta\_j/(1 + \delta\_j'\delta\_j),\ j = 1,\ldots,r,$ and therefore (11.5.4) can be expressed as

$$np - \sum\_{j=1}^{p} \theta\_j^2 s\_{jj} + \sum\_{j=1}^{r} n \delta\_j' \delta\_j = 0. \tag{iii}$$

Letting

$$c = \frac{1}{p} \sum\_{j=1}^{r} \delta\_j' \delta\_j,\tag{11.5.5}$$

equation *(iii)* can be written as

$$\sum\_{j=1}^{p} [n(1+c) - \theta\_j^2 s\_{jj}] = 0, \ c = 0, \ j = r+1, \dots, p,$$

so that a solution for *θj* is

$$
\hat{\theta}\_j^2 = \frac{n(1+c)}{s\_{jj}} \text{ or } \hat{\sigma}\_j^2 = \frac{s\_{jj}}{n(1+c)},\tag{11.5.6}
$$

with the proviso that $c = 0$ for $\hat{\theta}\_j^2$ and $\hat{\sigma}\_j^2$, $j = r+1,\ldots,p$. Then, an estimate of $\Theta S\Theta$ is given by

$$
\hat{\Theta}S\hat{\Theta} = n(1+c) \begin{bmatrix}
\frac{1}{\sqrt{s\_{11}}} & 0 & \dots & 0 \\
0 & \frac{1}{\sqrt{s\_{22}}} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \frac{1}{\sqrt{s\_{pp}}}
\end{bmatrix} S \begin{bmatrix}
\frac{1}{\sqrt{s\_{11}}} & 0 & \dots & 0 \\
0 & \frac{1}{\sqrt{s\_{22}}} & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \frac{1}{\sqrt{s\_{pp}}}
\end{bmatrix} \\
$$

$$
= n(1+c)R
\tag{11.5.7}
$$

where *R* is the sample correlation matrix, and on applying the identities (11.5.3) and (11.5.7), (11.5.2) becomes

$$\begin{aligned} -n\delta\_j - n(\delta\_j'\delta\_j)\delta\_j + n(1+c)R\,\delta\_j &= O \Rightarrow\\ \left[R - \frac{1+\hat{\delta}\_j'\hat{\delta}\_j}{1+c}I\right]\delta\_j &= O. \end{aligned} \tag{11.5.8}$$

This shows that *δj* is an eigenvector of *R*. If *νj* is an eigenvalue of *R*, then the largest *r* eigenvalues are of the form

$$\nu\_{j} = \frac{1 + \hat{\delta}\_{j}^{\prime}\hat{\delta}\_{j}}{1 + c}, \ j = 1, \ldots, r,\tag{11.5.9}$$

and the remaining ones are $\nu\_{r+1},\ldots,\nu\_p,$ where $c$ is as specified in (11.5.5). Thus, the procedure is the following: compute the eigenvalues $\nu\_j,\ j = 1,\ldots,p,$ of $R$ and determine the corresponding eigenvectors, denoted by $\delta\_j,\ j = 1,\ldots,p$. The first $r$ of them, which correspond to the $r$ largest $\nu\_j$'s, satisfy $\hat{\delta}\_j = \hat{\Theta}\hat{\Lambda}\_j \Rightarrow \hat{\Lambda}\_j = \hat{\Theta}^{-1}\hat{\delta}\_j,\ j = 1,\ldots,r$. Let

$$
\hat{\delta}\_{j} = \begin{bmatrix} \hat{\delta}\_{1j} \\ \hat{\delta}\_{2j} \\ \vdots \\ \hat{\delta}\_{pj} \end{bmatrix}, \ \hat{\Lambda}\_{j} = \begin{bmatrix} \hat{\lambda}\_{1j} \\ \hat{\lambda}\_{2j} \\ \vdots \\ \hat{\lambda}\_{pj} \end{bmatrix} \text{ and } \ \hat{\Theta}^{2} = \begin{bmatrix} \frac{n(1+c)}{s\_{11}} & 0 & \dots & 0 \\ 0 & \frac{n(1+c)}{s\_{22}} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \frac{n(1+c)}{s\_{pp}} \end{bmatrix}. \tag{11.5.10}
$$

Then,

$$
\hat{\lambda}\_{ij} = \frac{\sqrt{s\_{jj}}}{\sqrt{n(1+c)}} \hat{\delta}\_{ij}, \; i = 1, \ldots, p, \; j = 1, \ldots, r,\tag{11.5.11}
$$

and *Θ*ˆ is available from (11.5.10). All the model parameters have now been estimated.

#### **11.5.1. The exponent in the likelihood function**

Given the MLE's of the parameters which are available from (11.5.6), (11.5.8), (11.5.10) and (11.5.11), what will be the maximum value of the likelihood function? Let us examine its exponent:

$$\begin{aligned} -\frac{1}{2}\text{tr}(\hat{\Theta}^2 \mathcal{S}) + \frac{n}{2} \sum\_{j=1}^r \hat{\delta}\_j' \hat{\delta}\_j &= -\frac{1}{2}n(1+c)\text{tr}(\mathcal{R}) + \frac{n}{2}pc \\ &= -\frac{1}{2}n(1+c)p + \frac{1}{2}npc = -\frac{np}{2}, \end{aligned}$$


that is, the same value of the exponent that is obtained under a general $\Sigma$. The estimates were derived under the assumptions that $\Sigma = \Psi + \Lambda\Lambda'$, $\Lambda'\Psi^{-1}\Lambda$ is diagonal with the diagonal elements $\delta\_j'\delta\_j,\ j = 1,\ldots,r,$ and $\Psi = \Theta^{-2}$.

**Example 11.5.1.** Using the data set provided in Example 11.4.1, estimate the factor loadings and the diagonal elements of $\text{Cov}(\epsilon) = \Psi = \text{diag}(\psi\_{11},\ldots,\psi\_{pp})$. In this example, $p = 3,\ n = 6,\ r = 2$.

**Solution 11.5.1.** We will adopt the same notations and make use of some of the computational results already obtained in the previous solution. First, we need to compute the eigenvalues of the sample correlation matrix *R*. The sample sum of products matrix *S* is given by

$$S = \begin{bmatrix} 6 & 0 & -2 \\ 0 & 6 & -2 \\ -2 & -2 & 6 \end{bmatrix} \Rightarrow R = \begin{bmatrix} 1 & 0 & -\frac{2}{6} \\ 0 & 1 & -\frac{2}{6} \\ -\frac{2}{6} & -\frac{2}{6} & 1 \end{bmatrix} = \frac{1}{6}S.$$

Hence, the eigenvalues of $R$ are $\frac{1}{6}$ times the eigenvalues of $S$, that is, $\nu\_1 = \frac{1}{6}(6+\sqrt{8}) = 1 + \frac{\sqrt{2}}{3}$, $\nu\_2 = \frac{1}{6}(6) = 1$ and $\nu\_3 = 1 - \frac{\sqrt{2}}{3}$. Since the factor $\frac{1}{6}$ cancels out when determining the eigenvectors, the eigenvectors of $S$ coincide with those of $R$. They are the following, denoted again by $\delta\_j,\ j = 1, 2, 3$:

$$\delta\_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 1 \end{bmatrix}, \quad \delta\_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \delta\_3 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 1 \end{bmatrix}.$$

Therefore, $\delta\_1'\delta\_1 = 2,\ \delta\_2'\delta\_2 = 2,\ \delta\_3'\delta\_3 = 2$ and $c$ as defined in (11.5.5) is $c = \frac{1}{3}(2+2) = \frac{4}{3} \Rightarrow n(1+c) = 6(1+\frac{4}{3}) = 14$. Then, in light of (11.5.10), the estimates of $\psi\_{jj},\ j = 1, 2, 3,$ are available as $\hat{\psi}\_{jj} = \hat{\theta}\_j^{-2} = \frac{s\_{jj}}{n(1+c)}$, that is, $\hat{\psi}\_{11} = \hat{\theta}\_1^{-2} = \frac{6}{14} = \frac{3}{7} = \hat{\psi}\_{22} = \hat{\psi}\_{33}$, denoting the estimates with a hat. Therefore, the diagonal matrix $\hat{\Psi} = \text{diag}(\frac{3}{7}, \frac{3}{7}, \frac{3}{7})$. Hence, the matrix $\Theta^{-1} = \Psi^{\frac{1}{2}}$ is estimated by $\hat{\Theta}^{-1} = \text{diag}(\sqrt{\frac{3}{7}}, \sqrt{\frac{3}{7}}, \sqrt{\frac{3}{7}})$. From (11.5.11), $\hat{\Lambda}\_j = \hat{\Theta}^{-1}\delta\_j$, that is, $\delta\_j$ is pre-multiplied by $\sqrt{\frac{3}{7}}$. Therefore, the estimates of the factor loadings are: $\hat{\lambda}\_{11} = -(\sqrt{\frac{3}{7}})(\frac{1}{\sqrt{2}}) = \hat{\lambda}\_{21} = -\hat{\lambda}\_{13} = -\hat{\lambda}\_{23}$, $\hat{\lambda}\_{31} = \sqrt{\frac{3}{7}} = \hat{\lambda}\_{33} = \hat{\lambda}\_{12} = -\hat{\lambda}\_{22}$, and $\hat{\lambda}\_{32} = 0$.
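The numbers in this solution can be checked directly; the sketch below recomputes $c$, $\hat{\psi}\_{jj}$ and the scale $\sqrt{3/7}$ applied to the leading eigenvectors.

```python
# Check of the quantities in Solution 11.5.1: c, n(1+c), psi_hat and the
# loading magnitudes obtained from the leading eigenvectors of R.

import math

n, p, r = 6, 3, 2
delta = [[-1 / math.sqrt(2), -1 / math.sqrt(2), 1.0],   # delta_1
         [1.0, -1.0, 0.0]]                              # delta_2
c = sum(sum(v * v for v in d) for d in delta) / p       # (2 + 2)/3 = 4/3
s_jj = 6.0                                              # each diagonal entry of S
psi_hat = s_jj / (n * (1 + c))                          # 6/14 = 3/7
scale = math.sqrt(psi_hat)                              # multiplies each delta_j
loadings = [[scale * v for v in d] for d in delta]      # Lambda_hat columns
print(round(c, 6), round(psi_hat, 6))
```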

#### **11.6. Tests of Hypotheses**

The usual test in connection with the current topic consists of assessing identifiability, that is, testing the hypothesis $H\_0$ that the population covariance matrix $\Sigma > O$ can be represented as $\Sigma = \Lambda\Phi\Lambda' + \Psi$ when $\Phi = I$, $\Lambda'\Psi^{-1}\Lambda$ is a diagonal matrix with positive diagonal elements, $\Psi > O$ is a diagonal matrix and $\Lambda = (\lambda\_{ij})$ is a $p \times r,\ r \le p,$ matrix of full rank $r$, whose elements are the factor loadings. That is,

$$H\_0: \ \Sigma = \Lambda \Lambda' + \Psi. \tag{11.6.1}$$

In this instance, a crucial aspect of the hypothesis $H_0$, which consists of determining whether "the model fits", is that the number $r$ be designated, since the other quantities $n$, the sample size, and $p$, the order of the observation vector, are preassigned. Thus, the phrase "model fits" means that for a given $r$, $\Sigma$ can be expressed in the form $\Sigma = \Psi + \Lambda\Lambda'$, in addition to satisfying the identification conditions. The assumed model has the representation $X = M + \Lambda F + \epsilon$ where $X' = (x_1, \ldots, x_p)$ stands for the $p \times 1$ vector of observed scores on $p$ tests or $p$ batteries of tests, $M$ is a $p \times 1$ vector of general effect, $F$ is an $r \times 1$ vector of unknown factors, $\Lambda = (\lambda_{ij})$ is the unknown $p \times r$ matrix of factor loadings and $\epsilon$ is the $p \times 1$ error vector. When $\epsilon$ and $F$ are uncorrelated, the covariance matrix of $X$ is given by

$$
\Sigma = \Lambda\Phi\Lambda' + \Psi
$$

where $\Phi = \mathrm{Cov}(F) > O$ and $\Psi = \mathrm{Cov}(\epsilon) > O$, with $\Phi$ being $r \times r$ and $\Psi$ being $p \times p$ and diagonal. A simple random sample from $X$ will be taken to mean a sample of independently and identically distributed (iid) $p \times 1$ vectors $X_j' = (x_{1j}, x_{2j}, \ldots, x_{pj})$, $j = 1, \ldots, n$, with $n$ denoting the sample size. The sample sum of products matrix or "corrected" sample sum of squares and cross products matrix is $S = (s_{ij})$, $s_{ij} = \sum_{k=1}^{n}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$, where, for example, $\bar{x}_i$, the average of the $x_i$'s comprising the $i$-th row of $\mathbf{X} = [X_1, \ldots, X_n]$, is $\bar{x}_i = \sum_{k=1}^{n} x_{ik}/n$. If $\epsilon$ and $F$ are independently normally distributed, then the likelihood ratio criterion or $\lambda$-criterion is

$$\lambda = \frac{\max_{H_0} L}{\max L} = \frac{|\hat{\Sigma}|^{\frac{n}{2}}}{|\hat{\Lambda}\hat{\Lambda}' + \hat{\Psi}|^{\frac{n}{2}}} \implies w = \lambda^{\frac{2}{n}} = \frac{|\hat{\Sigma}|}{|\hat{\Lambda}\hat{\Lambda}' + \hat{\Psi}|} \tag{11.6.2}$$

where $\hat{\Sigma} = \frac{1}{n}S$ and, under $H_0$, the covariance matrix is $\Sigma = \Lambda\Lambda' + \Psi$, with $\Phi = \mathrm{Cov}(F)$ assumed to be an identity matrix and the $r \times r$ matrix $\Lambda'\Psi^{-1}\Lambda = \mathrm{diag}(\delta_1'\delta_1, \ldots, \delta_r'\delta_r)$ having positive diagonal elements $\delta_j'\delta_j$, $j = 1, \ldots, r$. Referring to Sect. 11.4.2, we have

$$|\Lambda\Lambda'+\Psi|=|\Psi|\,|\Lambda'\Psi^{-1}\Lambda+I|\,\tag{11.6.3}$$


and $1 + \delta_j'\delta_j = 1 + \Lambda_j'\Psi^{-1}\Lambda_j = 1 + \Lambda_j'\Theta^2\Lambda_j$ where $\delta_j = \Psi^{-\frac{1}{2}}\Lambda_j = \Theta\Lambda_j$ and $\Lambda_j$ is the $j$-th column of $\Lambda$ for $j = 1, \ldots, r$. It was shown in (11.5.8) that $\delta_j$ is an eigenvector of the sample correlation matrix $R$ and

$$\prod\_{j=1}^r (1 + \delta\_j' \delta\_j) = |\Lambda' \Psi^{-1} \Lambda + I|.$$

However, in view of the discussion following (11.5.8), an eigenvalue of $R$ is of the form $\nu_j = \frac{1+\delta_j'\delta_j}{1+c}$, $j = 1, \ldots, r$. Let $\nu_1, \ldots, \nu_p$ be the eigenvalues of $R$ and let the largest $r$ of them be $\nu_1, \ldots, \nu_r$. It also follows from (11.5.8) that $\hat{\Theta}^2 = n(1+c)\,\mathrm{diag}(\frac{1}{s_{11}}, \ldots, \frac{1}{s_{pp}})$ with $\Theta = \Psi^{-\frac{1}{2}}$. Thus,

$$\begin{split} \frac{|\frac{1}{n}S|}{|\hat{\Psi}|} &= \Big|\hat{\Theta}\Big(\frac{S}{n}\Big)\hat{\Theta}\Big| = |(1+c)R| = \Big\{\prod_{j=1}^{r}(1+c)\nu_j\Big\}(1+0)^{p-r}\,\nu_{r+1}\cdots\nu_p \\ &= \Big\{\prod_{j=1}^{r}(1+\hat{\delta}_j'\hat{\delta}_j)\Big\}\,\nu_{r+1}\cdots\nu_p \\ &\Rightarrow \frac{|\frac{1}{n}S|}{|\hat{\Psi}|\,|\hat{\Lambda}'\hat{\Psi}^{-1}\hat{\Lambda} + I|} = \frac{\{\prod_{j=1}^{r}(1+\hat{\delta}_j'\hat{\delta}_j)\}\,\nu_{r+1}\cdots\nu_p}{\{\prod_{j=1}^{r}(1+\hat{\delta}_j'\hat{\delta}_j)\}} \\ &= \nu_{r+1}\cdots\nu_p = w = \lambda^{\frac{2}{n}}. \end{split} \tag{11.6.4}$$

Hence, we reject the null hypothesis for small values of the product $\nu_{r+1}\cdots\nu_p$, that is, the product of the smallest $p - r$ eigenvalues of the sample correlation matrix $R$. In order to evaluate critical points, one would require the null distribution of the product of the eigenvalues $\nu_{r+1}\cdots\nu_p$, which is difficult to determine for a general $p$. How can rejecting the null hypothesis that the "model fits" be interpreted? Since the decisive quantity in the whole structure is $r$, we are actually rejecting the hypothesis that a given $r$ is the number of main factors contributing to the observations. Hence, we may seek a larger or smaller $r$, keeping the structure unchanged and testing the same hypothesis again until the hypothesis is not rejected. We may then assume that the $r$ specified at the last stage is the number of main factors contributing to the observations, or we may assert that, with that particular $r$, there is evidence that the model fits.
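Although the null distribution of $\nu_{r+1}\cdots\nu_p$ is difficult to obtain, the statistic itself is straightforward to compute. The following is a minimal sketch, assuming the data are arranged as a $p \times n$ matrix whose columns are the observation vectors; the random data and the function name are purely illustrative:

```python
import numpy as np

def lrt_statistic(X, r):
    """Compute w = nu_{r+1} * ... * nu_p, the product of the smallest
    p - r eigenvalues of the sample correlation matrix R, as in (11.6.4).
    X is a p x n matrix whose columns are the observation vectors."""
    R = np.corrcoef(X)                          # p x p sample correlation matrix
    nu = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues in descending order
    return float(np.prod(nu[r:]))               # product of the p - r smallest

# Illustrative data: p = 3 variables, n = 6 observations (hypothetical)
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))
w = lrt_statistic(X, r=1)
print(w)   # a small w would lead to rejecting H0 for the given r
```

With $r = p$, the product is empty and $w = 1$, consistent with the model trivially fitting when as many factors as variables are allowed.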

We will now determine conditions ensuring that the likelihood ratio criterion $\lambda$ be less than or equal to one. While, assuming that $\Lambda'\Psi^{-1}\Lambda$ is diagonal, the left-hand side of the deciding equation $\Sigma = \Lambda\Lambda' + \Psi$ has $p(p+1)/2$ parameters, there are $pr + p - r(r-1)/2$ free parameters on the right-hand side, where $r(r-1)/2$ arises from the diagonality condition. The difference is then

$$\frac{p(p+1)}{2} - \left[p\,r + p - \frac{r(r-1)}{2}\right] = \frac{1}{2}[(p-r)^2 - (p+r)] \equiv \rho. \tag{11.6.5}$$

This $\rho$ depends upon the parameters $p$ and $r$, whereas $\lambda$ depends upon $p$, $r$ and $c$. Thus, $\lambda$ may not be $\le 1$. In order to make $\lambda \le 1$, we can make $c$ close to 0 by multiplying the $\hat{\delta}_j$'s by a constant, observing that this is always possible because the $\hat{\delta}_j$'s are the eigenvectors of $R$. By selecting a constant $m$ and taking the new $\hat{\delta}_j$ as $\frac{1}{\sqrt{m}}\hat{\delta}_j$, $c$ can be made close to 0 and $\lambda$ will be $\le 1$, so that rejecting the null hypothesis for small values of $\lambda$ will make sense. It may so happen that no parameter is left to be restricted by the hypothesis $H_0$ that the "model fits". The quantity $\rho$ appearing in (11.6.5) could then be $\le 0$, and in such an instance, the hypothesis would not make sense and could not be tested.

The density of the sample correlation matrix $R$ is provided in Example 1.25 of Mathai (1997, p. 58). Denoting this density by $f(R)$, it is the following when the population covariance matrix $\Sigma$ of the parent $N_p(\mu, \Sigma)$ population is a positive definite diagonal matrix, as was assumed in Sect. 11.6:

$$f(R) = \frac{[\varGamma(\frac{m}{2})]^p}{\varGamma\_p(\frac{m}{2})} |R|^{\frac{m}{2} - \frac{p+1}{2}}, \ R > O, \ m = n - 1, \ n > p,\tag{11.6.6}$$

and zero elsewhere, where *n* is the sample size.

#### **11.6.1. Asymptotic distribution of the likelihood ratio statistic**

For a large sample size *n*, −2 ln *λ* is approximately distributed as a chisquare random variable having *k* degrees of freedom where *λ* is the likelihood ratio criterion and *k* is the number of parameters restricted by the hypothesis *H*0. This approximation holds whenever the sample size *n* is large and *k* ≥ 1. With *ρ* as defined in (11.6.5), we have

$$k = \rho = \frac{1}{2}[(p - r)^2 - (p + r)].\tag{11.6.7}$$

However, *p*−*r* = 1 and *p*+*r* = 5 in the illustrative example, so that *k* = −2. Accordingly, even if the sample size *n* were large, this asymptotic result would not be applicable.
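The count in (11.6.5) and (11.6.7) can be packaged as a one-line check of whether the asymptotic chisquare approximation is usable; the function name here is ours:

```python
def lrt_degrees_of_freedom(p, r):
    """k = rho = ((p - r)**2 - (p + r)) / 2, as in (11.6.5) and (11.6.7).
    The asymptotic chisquare approximation requires k >= 1."""
    return ((p - r) ** 2 - (p + r)) / 2

# The illustrative example in the text: p = 3, r = 2 gives k = -2,
# so the chisquare approximation is not applicable there.
print(lrt_degrees_of_freedom(3, 2))   # -2.0
print(lrt_degrees_of_freedom(10, 3))  # 18.0
```

The second call shows that for larger $p$ relative to $r$, the hypothesis does restrict parameters and the approximation can be used.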

#### **11.6.2. How to decide on the number** *r* **of main factors?**

The structure of the population covariance matrix *Σ* under the model *Xj* = *M* +*ΛF* + *Ej , j* = 1*, . . . , n,* is

$$
\Sigma = \Lambda \Phi \Lambda' + \Psi \implies \Sigma = \Lambda \Lambda' + \Psi \text{ for } \Phi = I,\tag{11.6.8}
$$


where it is assumed that $E_j$ and $F$ are uncorrelated, $\Sigma = \mathrm{Cov}(X_j) > O$ is $p \times p$, $\Phi = \mathrm{Cov}(F) = I$, the $r \times r$ identity matrix, $\Psi = \mathrm{Cov}(E_j)$ is a $p \times p$ diagonal matrix and $\Lambda = (\lambda_{ij})$ is a full rank $p \times r$, $r \le p$, matrix whose elements are the factor loadings. Under the orthogonal factor model, $\Phi = I$. Moreover, to ensure the identification of the model, we assume that $\Lambda'\Psi^{-1}\Lambda$ is a diagonal matrix. Before initiating any data analysis, we have to assign a value to $r$ on the basis of the data set at hand in order to set up the model. Thus, the matter of initially setting the number of main factors has to be addressed. Given

$$R = \Lambda\Lambda' + \Psi \tag{11.6.9}$$

where $\Lambda = (\lambda_{ij})$ is a $p \times r$ matrix and $\Psi$ is a $p \times p$ diagonal matrix, does a solution that is expressible in terms of the elements of $R$ (or those of $S$ if $S$ is used) exist for all the $\lambda_{ij}$'s and $\psi_{jj}$'s? In general,

$$R = \lambda\_1 U\_1 U\_1' + \dots + \lambda\_p U\_p U\_p' \tag{11.6.10}$$

where the $\lambda_j$'s are the eigenvalues of $R$ and the $U_j$'s are the corresponding normalized eigenvectors. Observe that $U_jU_j'$ is $p \times p$ whereas $U_j'U_j = 1$, $j = 1, \ldots, p$. If $r = p$, then a solution always exists for (11.6.9): taking $\Psi = O$, we can always let $R = BB'$ for some $p \times p$ matrix $B$, which can be achieved for example via a triangular decomposition. Accordingly, the relevant aspects are $r < p$ and the diagonal elements of $\Psi$, namely the $\psi_{jj}$'s, being positive. Can we then solve for all the $\lambda_{ij}$'s and $\psi_{jj}$'s involved in (11.6.9) in terms of the elements of $R$? The answer is that a solution exists, but only when certain conditions are satisfied. Our objective is to select a value of $r$ that is as small as possible and then to obtain a solution to (11.6.9) in terms of the elements of $R$.

The analysis is to be carried out by utilizing either the sample sum of products matrix *S* or the sample correlation matrix *R*. The following are some of the guidelines for selecting *r* in order to set up the model.

(i): Compute all the eigenvalues of *R* (or *S*). Let *r* be the number of eigenvalues ≥ 1 if the sample correlation matrix *R* is used. If *S* is used, then determine all the eigenvalues, calculate the average of these eigenvalues, and count the number of eigenvalues that are greater than or equal to this average. Take that number to be *r*.

(ii): Carry out a Principal Component Analysis on *R* (or *S*). If *S* is used, ensure that the units of measurement are not creating discrepancies. Compute the variances of these Principal Components, which are the eigenvalues of *R* (or *S*). Let $\lambda_j$, $j = 1, \ldots, p$, denote these eigenvalues. Compute the ratios

$$\frac{\lambda_1 + \cdots + \lambda_m}{\lambda_1 + \cdots + \lambda_p}, \ m = 1, 2, \ldots,$$

and stop with that *m* for which the desired fraction of the total variation in the data is accounted for. Take that *m* as *r*. When implementing the principal component approach, the factor loadings *λij* 's and the *ψjj* 's can be estimated as follows: From (11.6.10), write

$$R = A + B \text{ with } A = \sum\_{j=1}^{r} \lambda\_j U\_j U\_j' \text{ and } B = \sum\_{j=r+1}^{p} \lambda\_j U\_j U\_j',$$

where $A$ can be expressed as $VV'$ with $V = [\sqrt{\lambda_1}\,U_1, \ldots, \sqrt{\lambda_r}\,U_r]$. Then, $V$ is taken as an approximate estimate of $\Lambda$, that is, as $\hat{\Lambda}$. Observe that $\lambda_j > 0$, $j = 1, \ldots, p$. The sum over $j$ of the $i$-th diagonal elements of $\lambda_jU_jU_j'$, $j = r+1, \ldots, p$, will provide an estimate of $\psi_{ii}$, $i = 1, \ldots, p$. These estimates can also be obtained as follows: consider the estimate of $\sigma_{ii}$, denoted by $\hat{\sigma}_{ii}$, which is equal to the sum of the $i$-th diagonal elements of $A$ and $B$; it will be 1 if $R$ is used and $\hat{\sigma}_{ii}$ if $S$ is utilized in the analysis; then, $\hat{\psi}_{ii} = \hat{\sigma}_{ii} - \sum_{j=1}^{r}\hat{\lambda}_{ij}^2$, and $\hat{\psi}_{ii}$ is now the sum of the $i$-th diagonal elements in $B$.
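The principal-component estimation scheme just described can be sketched as follows, assuming $R$ is a valid correlation matrix and $r$ is preselected; the numerical matrix and the function name are purely illustrative:

```python
import numpy as np

def pca_factor_estimates(R, r):
    """Principal-component estimates for R ~ Lambda Lambda' + Psi:
    Lambda_hat = [sqrt(l_1) U_1, ..., sqrt(l_r) U_r] from the r largest
    eigenvalues l_j and eigenvectors U_j of R; psi_hat_ii is the i-th
    diagonal element of R minus sum_j lambda_hat_ij**2."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]                  # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    Lambda_hat = vecs[:, :r] * np.sqrt(vals[:r])    # p x r loading matrix
    psi_hat = np.diag(R) - np.sum(Lambda_hat**2, axis=1)
    return Lambda_hat, psi_hat

# Illustrative 3 x 3 correlation matrix (hypothetical values)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
L, psi = pca_factor_estimates(R, r=1)
print(L.ravel(), psi)   # psi entries are the unexplained variances
```

With $r = p$, the decomposition reproduces $R$ exactly and every $\hat{\psi}_{ii}$ is zero, mirroring the remark that a solution always exists when $r = p$.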

(iii): Consider the individual correlations in the sample correlation matrix *R*. Identify the largest ones in absolute value. If the largest ones occur at the (1,3)-th and (2,3)-th positions, then the factor *f*<sup>3</sup> will be deemed influential. Start with *r* = 1 (factor *f*3) and carry out the analysis. Then, assess the proportion of the total variation accounted for by *σ*ˆ33. Should the proportion not be satisfactory, we may continue with *r* = 2. If the (2,3)-th position value is larger in absolute value than the value at the (1,3)-th position, then *f*<sup>2</sup> may be the next significant factor. Compute *σ*ˆ<sup>33</sup> + ˆ*σ*<sup>22</sup> and determine the proportion to the total variation. If the resulting model is rejected, then take *r* = 3, and continue in this fashion until an acceptable proportion of the total variation is accounted for.

(iv): The maximum likelihood method. With this approach, we begin with a preselected *r* and test the hypothesis that, when comprising *r* factors, the model fits. If the hypothesis is rejected, then we let the number of influential factors be *r* − 1 or *r* + 1 and continue the process of testing and deciding until the hypothesis is not rejected. That final *r* is to be taken as the number of main factors contributing towards the observations. The initial value of *r* may be determined by employing one of the methods described in *(i)*, *(ii)* or *(iii)*.

#### **Exercises**

**11.1.** For the following data, where the 6 columns of the matrix represent the 6 observation vectors, verify whether *r* = 2 provides a good fit to the data. The proposed model is the Factor Analysis model *X* = *M* + *ΛF* + *, F*- = *(f*1*, f*2*), Λ* is 3 × 2 and of rank 2, Cov*()* = *Ψ* is diagonal, Cov*(F )* = *Φ* = *I,* and Cov*(X)* = *Σ>O*. The data set is


**11.2.** For the model *X* = *M* + *ΛF* + with the conditions as specified in Exercise 11.1, verify whether the model with *r* = 2 or *r* = 3 gives a good fit on the basis of the following data, where the columns in the matrix represent five observation vectors:


**11.3.** Do a Principal Component Analysis in Exercise 11.1 to assess what percentage of the total variation in the data is accounted for by *r* = 2.

**11.4.** Do a Principal Component Analysis in Exercise 11.2 to determine what percentages of the total variation in the data are accounted for by *r* = 2 and *r* = 3.

**11.5.** Even though the sample sizes are not large, perform the tests based on the asymptotic chisquare distribution and assess whether they agree with the findings in Exercises 11.1 and 11.2.

**11.6.** Four model identification conditions are stated at the end of Sect. 11.3.1. Develop *λ*-criteria under the conditions stated in *(i)*: case (2); *(ii)*: case (3), selecting your own *B*1; *(iii)*: case (4).

#### **References**

E. S. Allman, C. Matias and J. A. Rhodes (2009): Identifiability of parameters in latent structure models with many observed variables, The Annals of Statistics, **37**, 3099–3132.

T. W. Anderson (2003): *An Introduction to Multivariate Statistical Analysis*, 3rd ed., Wiley-Interscience, New Jersey.

T. W. Anderson and H. Rubin (1956): Statistical inference in factor analysis, Proc. 3rd Berkeley Symposium on Mathematical Statistics and Probability, Vol. **5**, 111–150.

Y. Chen, X. Li, and S. Zhang (2020): Structured latent factor analysis for large-scale data: identifiability, estimability, and their implications, Journal of the American Statistical Association, **115**, 1756–1770.

M. Ihara and Y. Kano (1995): Identifiability of full, marginal, and conditional factor analysis models, Statistics & Probability Letters, **23**, 343–350.

A. M. Mathai (1997): *Jacobians of Matrix Transformations and Functions of Matrix Argument*, World Scientific Publishing, New York.

A.M. Mathai (2021): Factor analysis revisited, *C.R. Rao, 100th Birth Anniversary Volume, Statistical Theory and Practice*, https://doi.org/10.1007/s42519-020-00160-1

L. L. Wegge (1996): Local identifiability of the factor analysis and measurement error model parameter, Journal of Econometrics, **70**, 351–382.


#### **Chapter 12 Classification Problems**

#### **12.1. Introduction**

We will use the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\mathrm{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\mathrm{d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\mathrm{d}x_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, $\mathrm{d}\tilde{X} = \mathrm{d}X_1 \wedge \mathrm{d}X_2$.

Historically, classification problems arose in anthropological studies. By taking a set of measurements on skeletal remains, anthropologists wanted to classify them as belonging to a certain racial group such as being of African or European origin. The measurements might have been of the following type: $x_1$ = width of the skull, $x_2$ = volume of the skull, $x_3$ = length of the thigh bone, $x_4$ = width of the pelvis, and so on. Let the measurements be represented by a $p \times 1$ vector $X$, with $X' = (x_1, \ldots, x_p)$ where a prime denotes the transpose. Nowadays, classification procedures are employed in all types of problems occurring in various contexts. For example, consider the situation of a battery of tests in an entrance examination to admit students into a professional program such as medical sciences, law studies, engineering science or management studies. Based on the $p \times 1$ vector of test scores, a statistician would like to classify an applicant as to whether or not he/she belongs to the group of applicants who will successfully complete a given program. This is a 2-group situation. If a third category is added such as those who are expected to complete the program with flying colors, this will become a 3-group situation. In general, one will have a $k$-group situation when an individual is classified into one of $k$ classes.

Let us begin with the 2-group situation. The problem consists of classifying the $p \times 1$ vector $X$ into one of two groups, classes or categories. Let the categories be denoted by population $\pi_1$ and population $\pi_2$. This means that $X$ will belong either to $\pi_1$ or to $\pi_2$, no other options being considered. The $p \times 1$ vector $X$ may be taken as a point in the $p$-space $R_p$, that is, the $p$-dimensional Euclidean space. In a two-group situation, when it is to be decided whether the candidate belongs to the population $\pi_1$ or the population $\pi_2$, two subspaces $A_1$ and $A_2$ within the $p$-space $R_p$ are determined, $A_1 \subset R_p$ and $A_2 \subset R_p$ with $A_1 \cap A_2 = O$ (the empty set), and the decision rule can be symbolically written as $A = (A_1, A_2)$. If $X$ falls in $A_1$, the candidate is classified into $\pi_1$ and if $X$ falls in $A_2$, then the candidate is classified into $\pi_2$. In other words, $X \in A_1$ means that the individual is classified into population $\pi_1$ and $X \in A_2$ means that the individual is classified into population $\pi_2$. The regions $A_1$ and $A_2$, or the rule $A = (A_1, A_2)$, are not known beforehand; they are to be determined by employing certain decision rules. Criteria for determining $A_1$ and $A_2$ will be subsequently put forward. Let us now consider the consequences. When a decision is made to classify $X$ as coming from $\pi_1$, the decision is either correct or erroneous. If the population is actually $\pi_1$ and the decision rule classifies $X$ into $\pi_1$, then the decision is correct. If $X$ is classified into $\pi_2$ when in reality the population is $\pi_1$, then a mistake has been committed or a misclassification has occurred. Misclassification will involve penalties, costs or losses.
Let such a penalty, cost or loss of classifying an individual into group *i* when he/she actually belongs to group *j,* be denoted by *C(i*|*j )*. In a 2-group situation, *i* and *j* can only equal 1 or 2. That is, *C(*1|2*) >* 0 and *C(*2|1*) >* 0 are the costs of misclassifying, whereas *C(*1|1*)* = 0 and *C(*2|2*)* = 0 since there is no cost or penalty associated with correct decisions. The following table summarizes this discussion:


**Table 12.1:** Cost of misclassification *C(i*|*j )*

#### **12.2. Probabilities of Classification**

The vector random variable corresponding to the observation vector $X$ may have its own probability/density function. The real scalar variables as well as the observations on them will be denoted by the lower-case letters $x_1, \ldots, x_p$. When dealing with the probability/density function of $X$, $X$ is taken as a vector random variable, whereas when looked upon as a point in the $p$-space $R_p$, $X$ is deemed to be an observation vector. The $p \times 1$ vector $X$ may have a probability/density function $P(X)$. In a 2-group or two-class situation, $P(X)$ is either $P_1(X)$, the population density of $\pi_1$, or $P_2(X)$, the population density of $\pi_2$. For convenience, it will be assumed that $X$ is of the continuous type, the derivations in the discrete case being analogous. In the 2-group situation, $P(X)$ can only be $P_1(X)$ or $P_2(X)$. What is then the probability of achieving a correct classification under the rule $A = (A_1, A_2)$? If the sample point $X$ falls in $A_1$, we classify the candidate as belonging to $\pi_1$, and if the true population is also $\pi_1$, then a correct decision is made. In that instance, the corresponding probability is

$$\Pr\{1|1, A\} = \int\_{A\_1} P\_1(X)dX \tag{12.2.1}$$

where d*X* = d*x*<sup>1</sup> ∧ d*x*<sup>2</sup> ∧ *...* ∧ d*xp*, *A* = *(A*1*, A*2*)* denoting one decision rule or one given set of subspaces of the *p*-space *Rp*. The probability of misclassification in this case is

$$\Pr\{2|1,A\} = \int\_{A\_2} P\_1(X)dX. \tag{12.2.2}$$

Similarly, the probabilities of correctly selecting and misclassifying *P*2*(X)* are respectively given by

$$\Pr\{2|2,A\} = \int\_{A\_2} P\_2(X)dX\tag{12.2.3}$$

and

$$\Pr\{1|2, A\} = \int\_{A\_1} P\_2(X)dX. \tag{12.2.4}$$

In a Bayesian setting, there is a prior probability $q_1$ of selecting the population $\pi_1$ and $q_2$ of selecting the population $\pi_2$, with $q_1 + q_2 = 1$. Then, what will be the probability of drawing an observation from $\pi_1$ and misclassifying it as belonging to $\pi_2$? It is $q_1 \times Pr\{2|1, A\} = q_1\int_{A_2} P_1(X)\mathrm{d}X$ and, similarly, the probability of drawing an observation from $\pi_2$ and misclassifying it as coming from $\pi_1$ is $q_2 \times Pr\{1|2, A\} = q_2\int_{A_1} P_2(X)\mathrm{d}X$, with the respective costs of misclassification being $C(2|1) = C(2|1, A)$ and $C(1|2) = C(1|2, A)$. What is then the expected cost of misclassification? It is the sum of the costs multiplied by the corresponding probabilities. Thus,

$$\text{the expected cost } = q\_1 C(2|1)Pr\{2|1, A\} + q\_2 C(1|2)Pr\{1|2, A\}. \tag{12.2.5}$$

So, an advantageous criterion to rely on when setting up $A_1$ and $A_2$ would consist in minimizing the expected cost as given in (12.2.5), and a rule for determining $A_1$ and $A_2$ could be devised accordingly; this actually corresponds to Bayes' rule. How can one interpret this expected cost? For example, in the case of admitting students to a particular program of study based on a vector $X$ of test scores, it is the cost of admitting potentially incompetent students, that is, students who would not have successfully completed the program of study, and training them, plus the projected cost of losing good students who would have successfully completed the program of study.
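As a concrete illustration of (12.2.5), the expected cost is a weighted sum of the two misclassification probabilities; all numbers below are hypothetical:

```python
# Hypothetical illustration of the expected cost in (12.2.5):
# expected cost = q1*C(2|1)*Pr{2|1,A} + q2*C(1|2)*Pr{1|2,A}
q1, q2 = 0.7, 0.3            # prior probabilities, q1 + q2 = 1
C21, C12 = 5.0, 2.0          # costs of misclassification (assumed units)
p21, p12 = 0.10, 0.15        # misclassification probabilities under rule A

expected_cost = q1 * C21 * p21 + q2 * C12 * p12
print(expected_cost)         # 0.44
```

A competing rule $A^{(j)}$ would simply produce different values of $Pr\{2|1, A^{(j)}\}$ and $Pr\{1|2, A^{(j)}\}$, and the rule with the smaller expected cost is preferred.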

If prior probabilities *q*<sup>1</sup> and *q*<sup>2</sup> are not involved, then the expected cost of misclassifying an observation from *π*<sup>1</sup> as coming from *π*<sup>2</sup> is

$$C(\mathcal{Q}|1)Pr\{\mathcal{Q}|1,A\} \equiv E\_1(A),\tag{12.2.6}$$

and the expected cost of misclassifying an observation from *π*<sup>2</sup> as coming from *π*<sup>1</sup> is

$$C(1|2)Pr\{1|2,A\} \equiv E\_2(A). \tag{12.2.7}$$

We would like $E_1(A)$ and $E_2(A)$ to be as small as possible. In this case, a procedure, rule or criterion $A = (A_1, A_2)$ corresponds to determining suitable subspaces $A_1$ and $A_2$ of the $p$-space $R_p$. If there is another procedure $A^{(j)} = (A_1^{(j)}, A_2^{(j)})$ such that $E_1(A) \le E_1(A^{(j)})$ and $E_2(A) \le E_2(A^{(j)})$, then procedure $A$ is said to be as good as $A^{(j)}$, and if at least one of these inequalities is strict, then $A$ is preferable to $A^{(j)}$. If procedure $A$ is preferable to all other available procedures $A^{(j)}$, $j = 1, 2, \ldots$, then $A$ is said to be *admissible*. We are seeking an admissible class $\{A\}$ of procedures.

#### **12.3. Two Populations with Known Distributions**

Let *π*<sup>1</sup> and *π*<sup>2</sup> be the two populations. Let *P*1*(X)* and *P*2*(X)* be the known *p*-variate probability/density functions associated with *π*<sup>1</sup> and *π*2, respectively. That is, *P*1*(X)* and *P*2*(X)* are two *p*-variate probability/density functions which are fully known in the sense that all their parameters are known in addition to their functional forms. Consider the Bayesian situation where it is assumed that the prior probabilities *q*<sup>1</sup> and *q*<sup>2</sup> of selecting *π*<sup>1</sup> and *π*2, respectively, are known. Suppose that a particular *p*-vector *X* is at hand. What is the probability that this given *X* is an observation from *π*1? This probability is *q*<sup>1</sup> *P*1*(X)* if *X* is discrete or *q*1*P*1*(X)*d*X* if *X* is continuous. What is the probability that the given vector *X* is an observation vector either from *π*<sup>1</sup> or from *π*2? This probability is *q*1*P*1*(X)*+ *q*2*P*2*(X)* or [*q*1*P*1*(X)*+*q*2*P*2*(X)*]d*X*. What is then the probability that the vector *X* at hand is from *P*1*(X)*, given that it is an observation vector from *π*<sup>1</sup> or *π*2? As this is a conditional statement, it is given by the following in the discrete or continuous case:

$$\begin{split} \frac{q\_1 P\_1(X)}{q\_1 P\_1(X) + q\_2 P\_2(X)} &\quad \text{or} \quad \frac{q\_1 P\_1(X) \text{dX}}{[q\_1 P\_1(X) + q\_2 P\_2(X)] \text{dX}}\\ &= \frac{q\_1 P\_1(X)}{q\_1 P\_1(X) + q\_2 P\_2(X)} \end{split} \tag{12.3.1}$$

where d*X*, which is the wedge product of differentials and positive in this case, cancels out. If the conditional probability that a given *X* is an observation from *π*<sup>1</sup> is larger than or equal to the conditional probability that the given vector *X* is an observation from *π*<sup>2</sup> and if we assign *X* to *π*1, then the chance of misclassification is reduced. Our main objective is to minimize the probability of misclassification and then come up with a decision rule. This statement is equivalent to the following: If

$$\frac{q_1 P_1(X)}{q_1 P_1(X) + q_2 P_2(X)} \ge \frac{q_2 P_2(X)}{q_1 P_1(X) + q_2 P_2(X)} \Rightarrow q_1 P_1(X) \ge q_2 P_2(X) \tag{12.3.2}$$

then we assign *X* to *π*1, meaning that our subspace *A*<sup>1</sup> is specified by the following rule:

$$A\_1: q\_1 P\_1(X) \ge q\_2 P\_2(X) \Rightarrow \frac{P\_1(X)}{P\_2(X)} \ge \frac{q\_2}{q\_1}$$

$$A\_2: q\_1 P\_1(X) < q\_2 P\_2(X) \Rightarrow \frac{P\_1(X)}{P\_2(X)} < \frac{q\_2}{q\_1}.\tag{12.3.3}$$

Note that if $q_1P_1(X) = q_2P_2(X)$, then $X$ can be assigned to either $\pi_1$ or $\pi_2$; however, we have assigned it to $\pi_1$ for convenience. Observe that it is assumed in (12.3.2) that $q_1P_1(X) + q_2P_2(X) \ne 0$, $q_1 > 0$, $q_2 > 0$ and $q_1 + q_2 = 1$. The conditional statement made in (12.3.2), which can also be written as

$$\frac{q_i P_i(X)}{q_1 P_1(X) + q_2 P_2(X)} = \frac{\eta_i P_i(X)}{\eta_1 P_1(X) + \eta_2 P_2(X)}, \ \eta_i > 0, \ \eta_1 + \eta_2 = \eta > 0, \ \frac{\eta_i}{\eta} = q_i, \ i = 1, 2,$$

holds for some weight functions *ηi, i* = 1*,* 2.

If the observation is from $\pi_1 : P_1(X)$, then the expected cost of misclassification is $q_1P_1(X)C(2|1) + q_2P_2(X)C(2|2) = q_1P_1(X)C(2|1)$ since $C(i|i) = 0$, $i = 1, 2$. Similarly, the expected cost of misclassifying the observation $X$ from $\pi_2 : P_2(X)$ is $q_2P_2(X)C(1|2)$. If $P_1(X)$ is our preferred distribution, then we would like the associated expected cost of misclassification to be the lesser one, that is,

$$q_1 P_1(X)C(2|1) < q_2 P_2(X)C(1|2) \text{ in } A_2 \Rightarrow$$

$$\frac{P_1(X)}{P_2(X)} < \frac{q_2\,C(1|2)}{q_1\,C(2|1)} \text{ in } A_2 \ \text{ or } \ \frac{P_1(X)}{P_2(X)} \ge \frac{q_2\,C(1|2)}{q_1\,C(2|1)} \text{ in } A_1, \tag{12.3.4}$$

which is the same rule as in (12.3.3) with $q\_1$ replaced by $q\_1 C(2|1)$ and $q\_2$ by $q\_2 C(1|2)$.

#### **12.3.1. Best procedure**

It can be established that the procedure $A = (A\_1, A\_2)$ in (12.3.3) is the best one for minimizing the probability of misclassification. To this end, consider any other procedure $A^{(j)} = (A\_1^{(j)}, A\_2^{(j)}), \ j = 1, 2, \ldots$ The probability of misclassification under the procedure $A^{(j)}$ is the following:

$$\begin{split} &q\_1 \int\_{A\_2^{(j)}} P\_1(X) \mathrm{d}X + q\_2 \int\_{A\_1^{(j)}} P\_2(X) \mathrm{d}X \\ &= \int\_{A\_2^{(j)}} [q\_1 P\_1(X) - q\_2 P\_2(X)] \mathrm{d}X + q\_2 \int\_{A\_1^{(j)} \cup A\_2^{(j)}} P\_2(X) \mathrm{d}X. \end{split} \tag{12.3.5}$$

If $A\_1^{(j)} \cup A\_2^{(j)} = R^p$, then $\int\_{A\_1^{(j)} \cup A\_2^{(j)}} P\_2(X)\mathrm{d}X = 1$; it is otherwise a given positive constant. However, $q\_1 P\_1(X) - q\_2 P\_2(X)$ can be negative, zero or positive, whereas the left-hand side of (12.3.5) is a positive probability. Accordingly, the left-hand side is minimized if

$$q\_1 P\_1(X) - q\_2 P\_2(X) < 0 \Rightarrow \frac{P\_1(X)}{P\_2(X)} < \frac{q\_2}{q\_1},\tag{i}$$

which actually is the rejection region *A*<sup>2</sup> of the procedure *A* = *(A*1*, A*2*)*. Hence, the procedure *A* = *(A*1*, A*2*)* minimizes the probabilities of misclassification; in other words, it is the best procedure. If cost functions are also involved, then *(i)* becomes the following:

$$\frac{P\_1(X)}{P\_2(X)} < \frac{C(1|2)\,q\_2}{C(2|1)\,q\_1}.\tag{ii}$$

The region where *q*1*P*1*(X)*−*q*2*P*2*(X)* = 0 or *q*1*C(*2|1*)P*1*(X)*−*q*2*C(*1|2*)P*2*(X)* = 0 need not be empty and the probability over this set need not be zero. If

$$\Pr\left\{\frac{P\_1(X)}{P\_2(X)} = \frac{q\_2}{q\_1} \frac{C(1|2)}{C(2|1)} \Big| \pi\_i \right\} = 0, \ i = 1, 2,\tag{12.3.6}$$

it can also be shown that the above Bayes procedure *A* = *(A*1*, A*2*)* is unique. This is stated as a theorem:

**Theorem 12.3.1.** *Let $q\_1$ be the prior probability of drawing an observation $X$ from the population $\pi\_1$ with probability/density function $P\_1(X)$ and let $q\_2$ be the prior probability of selecting an observation $X$ from the population $\pi\_2$ with probability/density function $P\_2(X)$. Let the cost or loss associated with misclassifying an observation from $\pi\_1$ as coming from $\pi\_2$ be $C(2|1)$ and the cost of misclassifying an observation from $\pi\_2$ as originating from $\pi\_1$ be $C(1|2)$. Letting*

$$\Pr\left\{\frac{P\_1(X)}{P\_2(X)} = \frac{C(1|2)}{C(2|1)}\frac{q\_2}{q\_1}\Big|\pi\_i\right\} = 0, \ i = 1, 2,$$

*the classification rule given by A* = *(A*1*, A*2*) of (12.3.4) is unique and best in the sense that it minimizes the probabilities of misclassification.*

**Example 12.3.1.** Let $\pi\_1$ and $\pi\_2$ be two univariate exponential populations whose parameters are $\theta\_1$ and $\theta\_2$ with $\theta\_1 \ne \theta\_2$. Let the prior probability of drawing an observation from $\pi\_1$ be $q\_1 = \frac{1}{2}$ and that of selecting an observation from $\pi\_2$ be $q\_2 = \frac{1}{2}$. Let the costs or losses associated with misclassification be $C(2|1) = C(1|2)$. Compute the regions and probabilities of misclassification if (1): a single observation $x$ is drawn; (2): iid observations $x\_1, \ldots, x\_n$ are drawn.

**Solution 12.3.1.**(1). In this case, one observation is drawn and the populations are

$$P\_i(\mathbf{x}) = \frac{1}{\theta\_i} \mathbf{e}^{-\frac{\mathbf{x}}{\theta\_i}}, \ \mathbf{x} \ge 0, \ \theta\_i > 0, \ i = 1, 2.$$

Consider the following inequality on the support of the density:

$$\frac{P\_1(\boldsymbol{x})}{P\_2(\boldsymbol{x})} \ge \frac{C(1|2)}{C(2|1)} \frac{q\_2}{q\_1} = 1,$$

or equivalently,

$$\frac{\theta\_2}{\theta\_1} \mathbf{e}^{-\mathbf{x}(\frac{1}{\theta\_1} - \frac{1}{\theta\_2})} \ge 1 \Rightarrow \mathbf{e}^{-\mathbf{x}(\frac{1}{\theta\_1} - \frac{1}{\theta\_2})} \ge \frac{\theta\_1}{\theta\_2}.$$

On taking logarithms, we have

$$\begin{aligned} -x\left(\frac{1}{\theta\_1} - \frac{1}{\theta\_2}\right) \ge \ln \frac{\theta\_1}{\theta\_2} \Rightarrow x\left(\frac{1}{\theta\_2} - \frac{1}{\theta\_1}\right) \ge \ln \frac{\theta\_1}{\theta\_2} \\ \Rightarrow x \ge \frac{\theta\_1 \theta\_2}{\theta\_1 - \theta\_2} \ln \frac{\theta\_1}{\theta\_2} \quad \text{for } \theta\_1 > \theta\_2. \end{aligned}$$

Assuming $\theta\_1 > \theta\_2$ (the steps in the case $\theta\_1 < \theta\_2$ being parallel), we have

$$x \ge k, \ k = \frac{\theta\_1 \theta\_2}{\theta\_1 - \theta\_2} \ln \frac{\theta\_1}{\theta\_2}.$$

Accordingly,

$$A\_1: x \ge k \text{ and } A\_2: x < k.$$

The probabilities of misclassification are:

$$\begin{split} P(2|1) &= \int\_{A\_2} P\_1(\mathbf{x}) \mathbf{dx} = \int\_{x=0}^{k} \frac{1}{\theta\_1} \mathbf{e}^{-\frac{x}{\theta\_1}} \mathbf{dx} = 1 - \mathbf{e}^{-\frac{k}{\theta\_1}}, \\ P(1|2) &= \int\_{x=k}^{\infty} \frac{1}{\theta\_2} \mathbf{e}^{-\frac{x}{\theta\_2}} \mathbf{dx} = \mathbf{e}^{-\frac{k}{\theta\_2}} \,. \end{split}$$
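As a sanity check, the cutoff $k$ and both probabilities can be evaluated directly; the parameter values $\theta\_1 = 10, \ \theta\_2 = 5$ below are illustrative assumptions (they anticipate Example 12.3.2):

```python
import math

theta1, theta2 = 10.0, 5.0   # illustrative values with theta1 > theta2
# cutoff k = theta1*theta2/(theta1 - theta2) * ln(theta1/theta2)
k = (theta1 * theta2 / (theta1 - theta2)) * math.log(theta1 / theta2)

P21 = 1 - math.exp(-k / theta1)   # P(2|1): misclassify an observation from pi_1
P12 = math.exp(-k / theta2)       # P(1|2): misclassify an observation from pi_2
```

For these values, $k = 10\ln 2 \approx 6.93$, $P(2|1) = 0.5$ and $P(1|2) = 0.25$.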

**Solution 12.3.1.**(2). In this case, $X' = (x\_1, \ldots, x\_n)$ and

$$P\_i(X) = \prod\_{j=1}^n \frac{1}{\theta\_i} \mathbf{e}^{-\frac{x\_j}{\theta\_i}} = \frac{1}{\theta\_i^n} \mathbf{e}^{-\frac{u}{\theta\_i}}, \ i = 1, 2,$$

where $u = \sum\_{j=1}^n x\_j$ is gamma distributed with the parameters $(n, \theta\_i), \ i = 1, 2$. The density of $u$ is then given by

$$g\_i(u) = \frac{1}{\theta\_i^n \Gamma(n)} u^{n-1} \mathbf{e}^{-\frac{u}{\theta\_i}}, \ i = 1, 2.$$

Proceeding as above, for $\theta\_1 > \theta\_2$, $A\_1 : u \ge k\_1$ and $A\_2 : u < k\_1$, where $k\_1 = \frac{\theta\_1\theta\_2}{\theta\_1 - \theta\_2}\ln\big[\frac{\theta\_1}{\theta\_2}\big]^n = nk$ and $k$ is as given in Solution 12.3.1(1). Consequently, the probabilities of misclassification are as follows:

$$\begin{split} P(2|1) &= \int\_{u=0}^{k\_1} \frac{u^{n-1}}{\theta\_1^n \Gamma(n)} \mathbf{e}^{-\frac{u}{\theta\_1}} \mathrm{d}u = \int\_0^{\frac{k\_1}{\theta\_1}} \frac{v^{n-1}}{\Gamma(n)} \mathbf{e}^{-v} \mathrm{d}v, \\ P(1|2) &= \int\_{k\_1}^{\infty} \frac{u^{n-1}}{\theta\_2^n \Gamma(n)} \mathbf{e}^{-\frac{u}{\theta\_2}} \mathrm{d}u = \int\_{\frac{k\_1}{\theta\_2}}^{\infty} \frac{v^{n-1}}{\Gamma(n)} \mathbf{e}^{-v} \mathrm{d}v, \end{split}$$

where the integrals can be expressed in terms of incomplete gamma functions or determined by using integration by parts.
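For integer $n$, the integration-by-parts route mentioned above yields a closed form, $\frac{1}{\Gamma(n)}\int\_0^a v^{n-1}\mathbf{e}^{-v}\mathrm{d}v = 1 - \mathbf{e}^{-a}\sum\_{j=0}^{n-1} a^j/j!$, which is easy to code. A small sketch, with the function names being our own:

```python
import math

def gamma_cdf(a, n):
    """Pr{V <= a} for V ~ Gamma(shape=n, scale=1), n a positive integer,
    via the repeated-integration-by-parts closed form."""
    return 1 - math.exp(-a) * sum(a**j / math.factorial(j) for j in range(n))

def misclass_probs(theta1, theta2, n):
    """P(2|1) and P(1|2) for n iid exponential observations, theta1 > theta2."""
    k1 = n * (theta1 * theta2 / (theta1 - theta2)) * math.log(theta1 / theta2)
    return gamma_cdf(k1 / theta1, n), 1 - gamma_cdf(k1 / theta2, n)
```

With $n = 1$ this reduces to the probabilities of Solution 12.3.1(1).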

**Example 12.3.2.** Assume that no prior probabilities or costs are involved. Suppose that in a certain clinic, the waiting time before a customer is attended to depends upon the manager on duty. If manager $M\_1$ is on duty, the expected waiting time is 10 minutes, and if manager $M\_2$ is on duty, the expected waiting time is 5 minutes. Assume that the waiting times are exponentially distributed with expected waiting time $\theta\_i, \ i = 1, 2$. On a particular day (1): a customer had to wait 6 minutes before she was attended to; (2): three customers had to wait 6, 6 and 8 minutes, respectively. Who between $M\_1$ and $M\_2$ was likely to be on duty on that day?

**Solution 12.3.2.**(1). In this case, $\theta\_1 = 10, \ \theta\_2 = 5$ and the populations are exponential with parameters $\theta\_1$ and $\theta\_2$, respectively. Thus, $k = \frac{\theta\_1\theta\_2}{\theta\_1 - \theta\_2}\ln\frac{\theta\_1}{\theta\_2} = \frac{(10)(5)}{10-5}\ln\frac{10}{5} = 10\ln 2$, $\frac{k}{\theta\_1} = \frac{10\ln 2}{10} = \ln 2$, $\frac{k}{\theta\_2} = 2\ln 2 = \ln 4$, $\mathbf{e}^{-k/\theta\_1} = \mathbf{e}^{-\ln 2} = \frac{1}{2} = 0.5$, and $\mathbf{e}^{-k/\theta\_2} = \mathbf{e}^{-\ln 4} = \frac{1}{4} = 0.25$. In (1), the observed value of $x = 6 < 10\ln 2 \approx 6.9315$. Accordingly, we classify $x$ to $M\_2$, that is, manager $M\_2$ was likely to be on duty. Thus,

*P (*2|2*, A)* = The probability of making a correct decision

$$= \Pr\{x < k | P\_2(x)\} = \int\_0^{k} \frac{1}{5} \mathbf{e}^{-\frac{x}{5}} \mathrm{d}x = 1 - \mathbf{e}^{-\frac{k}{5}} = 1 - \mathbf{e}^{-\ln 4} = 1 - \frac{1}{4} = 0.75,$$

*P (*2|1*, A)* = Probability of misclassification or making an incorrect decision

$$=\int\_{0}^{k} P\_1(x)\mathrm{d}x = \int\_{0}^{k} \frac{1}{10} \mathbf{e}^{-\frac{x}{10}} \mathrm{d}x = 1 - \mathbf{e}^{-\frac{k}{10}} = 1 - \mathbf{e}^{-\ln 2} = \frac{1}{2} = 0.5.$$

**Solution 12.3.2.**(2). Here, $u = 6 + 6 + 8 = 20, \ n = 3$ and $k\_1 = \frac{\theta\_1\theta\_2}{\theta\_1 - \theta\_2}\, n\ln\frac{\theta\_1}{\theta\_2} = \frac{(10)(5)}{10-5}\, 3\ln\frac{10}{5} = 30\ln 2$. Since $30\ln 2 \approx 20.795$ and the observed value of $u$ is 20, $u < k\_1$, and we assign the sample to $\pi\_2$ or to $P\_2(X)$ or $M\_2$, with $\frac{k\_1}{\theta\_2} = \frac{30\ln 2}{5} = 6\ln 2$ and $\frac{k\_1}{\theta\_1} = \frac{30\ln 2}{10} = 3\ln 2$. Thus,

*P (*2|2*, A)* = Probability of making a correct classification decision

$$\begin{aligned} &= \Pr\{u < k\_1 | P\_2(X)\} = \int\_0^{k\_1} \frac{u^{n-1}}{\theta\_2^n \Gamma(n)} \mathbf{e}^{-\frac{u}{\theta\_2}} \mathrm{d}u \\ &= \int\_0^{6\ln 2} \frac{v^2 \mathbf{e}^{-v}}{\Gamma(3)} \mathrm{d}v, \quad \text{with } \Gamma(3) = 2! = 2. \end{aligned}$$

Integrating by parts,

$$\int v^2 \mathbf{e}^{-v} \mathrm{d}v = -[v^2 + 2v + 2] \mathrm{e}^{-v}.$$

Then,

$$\frac{1}{2} \int\_0^{6\ln 2} v^2 \mathbf{e}^{-v} \mathrm{d}v = -\Big[\Big(\frac{v^2}{2} + v + 1\Big) \mathbf{e}^{-v}\Big]\_0^{6\ln 2} = 1 - \frac{1}{64}\Big[\frac{(6\ln 2)^2}{2} + 6\ln 2 + 1\Big]$$

$$\approx 1 - \frac{1}{64} [13.807] \approx 0.784, \text{ and }$$

*P (*2|1*, A)* = Probability of misclassification

$$\begin{aligned} &= \int\_0^{k\_1} P\_1(X) \mathrm{d}X = \frac{1}{2} \int\_0^{3\ln 2} v^2 \mathbf{e}^{-v} \mathrm{d}v = -\Big[\Big(\frac{v^2}{2} + v + 1\Big) \mathbf{e}^{-v}\Big]\_0^{3\ln 2} \\ &= 1 - \frac{1}{2^3}\Big[\frac{(3\ln 2)^2}{2} + 3\ln 2 + 1\Big] \approx 0.345. \end{aligned}$$
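These two values are easy to verify numerically; the closed form below is the same antiderivative obtained above by integration by parts:

```python
import math

def erlang3_cdf(a):
    # (1/2) * int_0^a v^2 e^{-v} dv = 1 - e^{-a} * (a^2/2 + a + 1)
    return 1 - math.exp(-a) * (a**2 / 2 + a + 1)

k1 = 30 * math.log(2)
P_correct  = erlang3_cdf(k1 / 5)    # Pr{u < k1 | pi_2}, a = 6 ln 2
P_misclass = erlang3_cdf(k1 / 10)   # Pr{u < k1 | pi_1}, a = 3 ln 2
```

These evaluate to approximately 0.784 and 0.345, matching the results above.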

**Example 12.3.3.** Let the two populations $\pi\_1$ and $\pi\_2$ be univariate normal with mean values $\mu\_1$ and $\mu\_2$, respectively, and the same variance $\sigma^2$, that is, $P\_1(x) : N\_1(\mu\_1, \sigma^2)$ and $P\_2(x) : N\_1(\mu\_2, \sigma^2)$. Let the prior probabilities of drawing an observation from these populations be $q\_1 = \frac{1}{2}$ and $q\_2 = \frac{1}{2}$, respectively, and the costs or losses involved with misclassification be $C(1|2) = C(2|1)$. Determine the regions of misclassification and the corresponding probabilities of misclassification if (1): a single observation $x$ is available; (2): iid observations $x\_1, \ldots, x\_n$ are available, from $\pi\_1$ or $\pi\_2$.

**Solution 12.3.3.**(1). If one observation is available,

$$P\_i(x) = \frac{1}{\sigma\sqrt{2\pi}}\mathbf{e}^{-\frac{(x-\mu\_i)^2}{2\sigma^2}}, \ -\infty < x < \infty, \ -\infty < \mu\_i < \infty, \ \sigma > 0, \ i = 1, 2.$$

Consider regions

$$A\_1: \frac{P\_1(x)}{P\_2(x)} \ge \frac{C(1|2)}{C(2|1)} \frac{q\_2}{q\_1} = 1 \Rightarrow \mathbf{e}^{-\frac{1}{2\sigma^2} [ (x - \mu\_1)^2 - (x - \mu\_2)^2 ]} \ge 1$$

$$\Rightarrow -\frac{1}{2\sigma^2} [ (x - \mu\_1)^2 - (x - \mu\_2)^2 ] \ge 0.$$

Now, note that

$$\begin{aligned} -[(\mathbf{x} - \boldsymbol{\mu\_1})^2 - (\mathbf{x} - \boldsymbol{\mu\_2})^2] &= 2\mathbf{x}(\mu\_1 - \mu\_2) - (\mu\_1^2 - \mu\_2^2) \ge 0 \Rightarrow \\ \mathbf{x} &\ge \frac{1}{2} \frac{(\mu\_1^2 - \mu\_2^2)}{(\mu\_1 - \mu\_2)} = \frac{1}{2}(\mu\_1 + \mu\_2) \text{ for } \mu\_1 > \mu\_2 \Rightarrow \\ A\_1: \mathbf{x} &\ge \frac{1}{2}(\mu\_1 + \mu\_2) \text{ and } A\_2: \mathbf{x} < \frac{1}{2}(\mu\_1 + \mu\_2). \end{aligned}$$

The probabilities of misclassification are the following for $k = \frac{1}{2}(\mu\_1 + \mu\_2)$:

$$\begin{split}P(2|1) &= \int\_{-\infty}^{k} \frac{1}{\sigma\sqrt{2\pi}} \mathbf{e}^{-\frac{(\mathbf{x}-\mu\_{1})^{2}}{2\sigma^{2}}} \mathbf{dx} = \Phi\left(\frac{k-\mu\_{1}}{\sigma}\right) \\ P(1|2) &= \int\_{k}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \mathbf{e}^{-\frac{(\mathbf{x}-\mu\_{2})^{2}}{2\sigma^{2}}} \mathbf{dx} = 1 - \Phi\left(\frac{k-\mu\_{2}}{\sigma}\right) \end{split}$$

where $\Phi(\cdot)$ is the distribution function of a univariate standard normal density and $k = \frac{1}{2}(\mu\_1 + \mu\_2)$.
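These $\Phi$-values are readily computed via the error function, using $\Phi(z) = \frac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$; the parameter values below are illustrative assumptions:

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu1, mu2, sigma = 5.0, 3.0, 1.0     # assumed values, with mu1 > mu2
k = 0.5 * (mu1 + mu2)               # the midpoint cutoff

P21 = Phi((k - mu1) / sigma)        # misclassify an observation from pi_1
P12 = 1 - Phi((k - mu2) / sigma)    # misclassify an observation from pi_2
```

Since $k$ is the midpoint, the two standardized cutoffs are $\pm(\mu\_1 - \mu\_2)/(2\sigma)$ and the two error probabilities coincide.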

**Solution 12.3.3.**(2). In this case, $x\_1, \ldots, x\_n$ are iid and $X' = (x\_1, \ldots, x\_n)$. The multivariate densities are

$$P\_i(X) = \frac{1}{\sigma^n(\sqrt{2\pi})^n} \mathbf{e}^{-\frac{1}{2\sigma^2} \sum\_{j=1}^n (x\_j - \mu\_i)^2} = \frac{\mathbf{e}^{-\frac{1}{2\sigma^2} (\sum\_{j=1}^n (x\_j - \bar{x})^2 + n(\bar{x} - \mu\_i)^2)}}{\sigma^n(\sqrt{2\pi})^n}, \; i = 1, 2,$$

where $\bar{x} = \frac{1}{n}\sum\_{j=1}^n x\_j$. Hence, for $\mu\_1 > \mu\_2$,

$$A\_1 : \frac{P\_1(X)}{P\_2(X)} \ge 1 \Rightarrow \mathbf{e}^{-\frac{n}{2\sigma^2} [ (\bar{x} - \mu\_1)^2 - (\bar{x} - \mu\_2)^2 ]} \ge 1.$$

Taking logarithms and simplifying, we have

$$\begin{aligned} -\frac{n}{2\sigma^2} [\left(\bar{\boldsymbol{x}} - \boldsymbol{\mu\_1}\right)^2 - \left(\bar{\boldsymbol{x}} - \boldsymbol{\mu\_2}\right)^2] &\geq 0 \Rightarrow \\ \bar{\boldsymbol{x}} &\geq \frac{\mu\_1^2 - \mu\_2^2}{2(\mu\_1 - \mu\_2)} = \frac{1}{2}(\mu\_1 + \mu\_2) \text{ for } \mu\_1 > \mu\_2 \end{aligned}$$

where

$$\bar{\boldsymbol{x}} \sim N\_1(\mu\_i, \frac{\sigma^2}{n}), \ i = 1, 2.$$

Therefore the probabilities of misclassification are the following:

$$\begin{split} P(2|1) &= \int\_{-\infty}^{k} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \mathbf{e}^{-\frac{n}{2\sigma^{2}}(\bar{\mathbf{x}}-\mu\_{1})^{2}} \mathbf{d}\bar{\mathbf{x}} = \Phi\left(\frac{\sqrt{n}(\mu\_{2}-\mu\_{1})}{2\sigma}\right) \\ P(1|2) &= \int\_{k}^{\infty} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \mathbf{e}^{-\frac{n}{2\sigma^{2}}(\bar{\mathbf{x}}-\mu\_{2})^{2}} \mathbf{d}\bar{\mathbf{x}} = 1 - \Phi\left(\frac{\sqrt{n}(\mu\_{1}-\mu\_{2})}{2\sigma}\right) \end{split}$$

where $k = \frac{1}{2}(\mu\_1 + \mu\_2)$ and $\Phi(\cdot)$ is the distribution function of a univariate standard normal random variable.

**Example 12.3.4.** Assume that no prior probabilities or costs are involved. A tuber crop called tapioca is planted by farmers. While farmer $F\_1$ applies a standard fertilizer to the soil to enhance the growth of the tapioca plants, farmer $F\_2$ does not apply any fertilizer and lets the plants grow naturally. At harvest time, a tapioca plant is pulled up with all its tubers attached to the bottom of the stem. The upper part of the stem is cut off and the lower part with its tubers is put out for sale. Tuber yield per plant, $x$, is measured by weighing the lower part of the stem with the tubers attached. It is known from past experience that $x$ is normally distributed with mean value $\mu\_1 = 5$ and variance $\sigma^2 = 1$ for $F\_1$ type farms, that is, $x \sim N\_1(\mu\_1 = 5, \sigma^2 = 1)|F\_1$, and that for $F\_2$ type farms, $x \sim N\_1(\mu\_2 = 3, \sigma^2 = 1)|F\_2$, the weights being measured in kilograms. A road-side vendor is selling tapioca and his collection is either from $F\_1$ type farms or $F\_2$ type farms, but not both. A customer picked (1): one stem with its tubers attached weighing 4.2 kg; (2): a random sample of four stems respectively weighing 6, 4, 3 and 5 kg. To which type of farm will you classify the observations in (1) and (2)?

**Solution 12.3.4.** (1). The decision is based on $k = \frac{1}{2}(\mu\_1 + \mu\_2) = \frac{1}{2}(5 + 3) = 4$. In this case, the decision rule $A = (A\_1, A\_2)$ is such that $A\_1 : x \ge k$ and $A\_2 : x < k$ for $\mu\_1 > \mu\_2$. Note that $\frac{k - \mu\_1}{\sigma} = k - \mu\_1 = 4 - 5 = -1$ and $\frac{k - \mu\_2}{\sigma} = 4 - 3 = 1$. As the observed $x$ is $4.2 > 4 = k$, we classify $x$ into $P\_1(x) : N\_1(\mu\_1, 1)$. Moreover,

*P (*1|1*, A)* = Probability of making a correct classification decision

$$\begin{aligned} &= \Pr\{x \ge k | P\_1(x)\} = \int\_k^\infty \frac{\mathbf{e}^{-\frac{1}{2}(x-\mu\_1)^2}}{\sqrt{2\pi}} \mathrm{d}x \\ &= \int\_{-1}^\infty \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u = 0.5 + \int\_0^1 \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u \approx 0.84,\end{aligned}$$

and

*P (*1|2*, A)* = Probability of misclassification

$$= \Pr\{x \ge k | P\_2(x)\} = \int\_k^\infty \frac{\mathbf{e}^{-\frac{1}{2}(x-\mu\_2)^2}}{\sqrt{2\pi}} \mathrm{d}x = \int\_1^\infty \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u \approx 0.16.$$

**Solution 12.3.4.** (2). In this case, $\bar{x} = \frac{1}{4}(6+4+3+5) = 4.5, \ n = 4$, $\bar{x} \sim N\_1(\mu\_i, \frac{1}{n}), \ i = 1, 2$, $\frac{k - \mu\_1}{\sigma/\sqrt{n}} = 2(4 - 5) = -2$ and $\frac{k - \mu\_2}{\sigma/\sqrt{n}} = 2(4 - 3) = 2$. Since the observed $\bar{x}$ is $4.5 > 4 = k$, we assign the sample to $P\_1(X) : N\_1(\mu\_1, 1)$, the criterion being $A\_1 : \bar{x} \ge k$ and $A\_2 : \bar{x} < k$. Additionally,

*P (*1|1*, A)* = Probability of a correct classification

$$\begin{split} &= \Pr\{\bar{x} \ge k | P\_1(X)\} = \int\_k^\infty \frac{\sqrt{n}\,\mathbf{e}^{-\frac{n}{2}(\bar{x} - \mu\_1)^2}}{\sqrt{2\pi}} \mathrm{d}\bar{x} = \int\_{-2}^\infty \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u \\ &= 0.5 + \int\_0^2 \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u \approx 0.98, \end{split}$$

and

*P (*1|2*, A)* = Probability of misclassification

$$= \Pr\{\bar{x} \ge k | P\_2(X)\} = \int\_k^\infty \frac{\sqrt{n}\,\mathbf{e}^{-\frac{n}{2}(\bar{x} - \mu\_2)^2}}{\sqrt{2\pi}} \mathrm{d}\bar{x} = \int\_2^\infty \frac{\mathbf{e}^{-\frac{1}{2}u^2}}{\sqrt{2\pi}} \mathrm{d}u \approx 0.023.$$
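A quick numerical check of Solution 12.3.4, using $\mu\_1 = 5, \ \mu\_2 = 3, \ \sigma = 1, \ k = 4$ and $n = 4$:

```python
import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu1, mu2, sigma, k, n = 5.0, 3.0, 1.0, 4.0, 4

P11_single = 1 - Phi((k - mu1) / sigma)                 # Pr{x >= k | P1}, ~ 0.84
P12_single = 1 - Phi((k - mu2) / sigma)                 # Pr{x >= k | P2}, ~ 0.16
P11_sample = 1 - Phi(math.sqrt(n) * (k - mu1) / sigma)  # Pr{x_bar >= k | P1}, ~ 0.98
P12_sample = 1 - Phi(math.sqrt(n) * (k - mu2) / sigma)  # Pr{x_bar >= k | P2}, ~ 0.023
```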

**Example 12.3.5.** Let $\pi\_1$ and $\pi\_2$ be two $p$-variate real nonsingular normal populations sharing the same covariance matrix, $\pi\_1 : N\_p(\mu^{(1)}, \Sigma), \ \Sigma > O$, and $\pi\_2 : N\_p(\mu^{(2)}, \Sigma), \ \Sigma > O$, whose mean values are such that $\mu^{(1)} \ne \mu^{(2)}$. Let the prior probabilities be $q\_1 = q\_2$ and the cost functions be $C(1|2) = C(2|1)$. Consider a single $p$-vector $X$ to be classified into $\pi\_1$ or $\pi\_2$. Determine the regions of misclassification and the corresponding probabilities.

**Solution 12.3.5.** The *p*-variate real normal densities are the following:

$$P\_i(X) = \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \mathbf{e}^{-\frac{1}{2}(X - \mu^{(i)})' \Sigma^{-1} (X - \mu^{(i)})} \tag{i}$$

for $i = 1, 2, \ \Sigma > O, \ \mu^{(1)} \ne \mu^{(2)}$. Consider the inequality

$$\begin{aligned} \frac{P\_1(X)}{P\_2(X)} &\geq \frac{C(1|2)}{C(2|1)} \frac{q\_2}{q\_1} = 1 \Rightarrow\\ &\mathbf{e}^{-\frac{1}{2}[(X-\mu^{(1)})'\Sigma^{-1}(X-\mu^{(1)}) - (X-\mu^{(2)})'\Sigma^{-1}(X-\mu^{(2)})]} \geq 1. \end{aligned}$$

Taking logarithms, we have

$$\begin{aligned} & -\frac{1}{2} [(X - \mu^{(1)})' \Sigma^{-1} (X - \mu^{(1)}) - (X - \mu^{(2)})' \Sigma^{-1} (X - \mu^{(2)})] \geq 0 \Rightarrow \\ & (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) \geq 0. \end{aligned}$$

Let

$$u = (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}).\tag{12.3.7}$$

Then, *u* has a univariate normal distribution since it is a linear function of the components of *X*, which is a *p*-variate normal. Thus,

$$\begin{split} \text{Var}(u) &= \text{Var}[(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X] \\ &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} \text{Cov}(X) \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \\ &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) = \Delta^2 \end{split} \tag{12.3.8}$$

where *Δ*<sup>2</sup> is Mahalanobis' distance. The mean values of *u* under *π*<sup>1</sup> and *π*<sup>2</sup> are respectively,

$$\begin{split} E(u)|\pi\_{1} &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} \mu^{(1)} - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) \\ &= \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) = \frac{1}{2} \Delta^{2}, \end{split} \tag{12.3.9}$$

$$\begin{split} E(u)|\pi\_{2} &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} \mu^{(2)} - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) \\ &= \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(2)} - \mu^{(1)}) = -\frac{1}{2} \Delta^{2}, \end{split} \tag{12.3.10}$$

so that

$$u \sim N\_1(\tfrac{1}{2}\Delta^2, \Delta^2) \text{ under } \pi\_1, \qquad u \sim N\_1(-\tfrac{1}{2}\Delta^2, \Delta^2) \text{ under } \pi\_2.\tag{12.3.11}$$

Accordingly, the regions of misclassification are

$$A\_2: u < 0 | \pi\_1: u \sim N\_1(\frac{1}{2}\Delta^2, \Delta^2) \text{ and } A\_1: u \ge 0 | \pi\_2: u \sim N\_1(-\frac{1}{2}\Delta^2, \Delta^2), \text{ (12.3.12)}$$

and the probabilities of misclassification are as follows:

$$\begin{split} P(2|1) &= \int\_{-\infty}^{0} \frac{1}{\Delta \sqrt{2\pi}} \mathbf{e}^{-\frac{1}{2\Delta^2} (u - \frac{1}{2}\Delta^2)^2} \mathrm{d}u \\ &= \int\_{-\infty}^{-\frac{\Delta}{2}} \frac{1}{\sqrt{2\pi}} \mathbf{e}^{-\frac{t^2}{2}} \mathrm{d}t = \Phi(-\tfrac{1}{2}\Delta), \end{split} \tag{ii}$$

$$\begin{split} P(1|2) &= \int\_0^\infty \frac{1}{\Delta \sqrt{2\pi}} \mathbf{e}^{-\frac{1}{2\Delta^2} (u + \frac{1}{2}\Delta^2)^2} \mathrm{d}u \\ &= \int\_{\frac{\Delta}{2}}^\infty \frac{1}{\sqrt{2\pi}} \mathbf{e}^{-\frac{t^2}{2}} \mathrm{d}t = 1 - \Phi(\tfrac{1}{2}\Delta), \end{split} \tag{iii}$$

where $\Phi(\cdot)$ denotes the distribution function of a univariate standard normal variable.
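The whole computation — forming $u$ from (12.3.7), obtaining $\Delta^2$ from (12.3.8), and then evaluating (ii) and (iii) — can be sketched in a few lines. The bivariate mean vectors and inverse covariance below are illustrative assumptions (they anticipate Example 12.3.6):

```python
import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu1, mu2 = [4.0, 1.0], [2.0, 1.0]        # assumed mean vectors mu^(1), mu^(2)
Sinv = [[1.0, -1.0], [-1.0, 2.0]]        # assumed Sigma^{-1} (symmetric)

d = [mu1[i] - mu2[i] for i in range(2)]                           # mu^(1) - mu^(2)
a = [sum(Sinv[i][j] * d[j] for j in range(2)) for i in range(2)]  # Sigma^{-1} d
s = [mu1[i] + mu2[i] for i in range(2)]                           # mu^(1) + mu^(2)

Delta2 = sum(d[i] * a[i] for i in range(2))   # Mahalanobis distance squared

def u(x):
    """The classifier (12.3.7): assign x to pi_1 when u(x) >= 0."""
    return sum(a[i] * x[i] for i in range(2)) - 0.5 * sum(a[i] * s[i] for i in range(2))

P21 = Phi(-0.5 * math.sqrt(Delta2))      # P(2|1), as in (ii)
P12 = 1 - Phi(0.5 * math.sqrt(Delta2))   # P(1|2), as in (iii)
```

Note that $u = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X$ is computed as $(\Sigma^{-1}d)'X$, which is valid because $\Sigma^{-1}$ is symmetric.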

**Note 12.3.1.** If no conditions are imposed on the prior probabilities $q\_1$ and $q\_2$ or on the costs of misclassification $C(2|1)$ and $C(1|2)$, then the regions are determined as $A\_1 : u \ge k$, $k = \ln\frac{C(1|2)\, q\_2}{C(2|1)\, q\_1}$, and $A\_2 : u < k$. In this case, the probabilities of misclassification will be $\Phi\big(\frac{k - \frac{1}{2}\Delta^2}{\Delta}\big)$ and $1 - \Phi\big(\frac{k + \frac{1}{2}\Delta^2}{\Delta}\big)$, respectively.

**Note 12.3.2.** If the prior probabilities $q\_1$ and $q\_2$ are not known, we may assume that the two populations $\pi\_1$ and $\pi\_2$ are equally likely to be chosen or, equivalently, that $q\_1 = q\_2 = \frac{1}{2}$, in which instance $k = \ln\frac{C(1|2)}{C(2|1)}$. Then, the correct decisions are to assign the vector $X$ at hand to $\pi\_1$ in the region $A\_1$ and to $\pi\_2$ in the region $A\_2$, where $A\_1 : u \ge k$ and $A\_2 : u < k$, $k = \ln\frac{q\_2\, C(1|2)}{q\_1\, C(2|1)}$, with $q\_1, q\_2, C(2|1)$ and $C(1|2)$ assumed to be known and

$$u = (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)})$$

whose first term, namely $(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X$, is known as the linear discriminant function, which is utilized to discriminate between or separate two $p$-variate populations, not necessarily normally distributed, having mean value vectors $\mu^{(1)}$ and $\mu^{(2)}$ and sharing the same covariance matrix $\Sigma > O$.

**Example 12.3.6.** Assume that no prior probabilities or costs are involved. Applicants to a certain training program are given tests to evaluate their aptitude for languages and aptitude for science. Let the test scores be denoted by $x\_1$ and $x\_2$, respectively, and let $X$ be the bivariate vector $X' = (x\_1, x\_2)$. After completing the training program, their aptitudes are tested again. Let $X^{(1)'} = [x\_1^{(1)}, x\_2^{(1)}]$ be the score vector in the group of successful trainees and let $X^{(2)'} = [x\_1^{(2)}, x\_2^{(2)}]$ be the score vector in the group of unsuccessful trainees. From previous experience of conducting such tests over the years, it is known that $X^{(1)} \sim N\_2(\mu^{(1)}, \Sigma), \ \Sigma > O$, and $X^{(2)} \sim N\_2(\mu^{(2)}, \Sigma), \ \Sigma > O$, where

$$
\mu^{(1)} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \ \mu^{(2)} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \ \Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}.
$$

Then (1): one applicant taken at random before the training program started obtained the test scores $X\_0' = (4, 1)$; (2): three applicants chosen at random before the training program started had the following scores:

$$
\begin{bmatrix} 4\\2 \end{bmatrix}, \begin{bmatrix} 3\\1 \end{bmatrix}, \begin{bmatrix} 5\\1 \end{bmatrix}.
$$

In (1), classify *X*<sup>0</sup> to *π*<sup>1</sup> or *π*<sup>2</sup> and in (2), classify the entire sample of three vectors into *π*<sup>1</sup> or *π*2.

**Solution 12.3.6.** Let us compute certain quantities which are needed to answer the questions:

$$
\begin{split}
\frac{1}{2}(\mu^{(1)} + \mu^{(2)}) &= \frac{1}{2} \left( \begin{bmatrix} 4 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right) = \frac{1}{2} \begin{bmatrix} 6 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}; \\\ \mu^{(1)} - \mu^{(2)} &= \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \ (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X = [2, -2]X = 2\mathbf{x}\_1 - 2\mathbf{x}\_2; \\\ \Delta^2 &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) = [2, 0] \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = 4; \\\ (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) &= [2, 0] \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 2 \end{bmatrix} = 8.
\end{split}
$$

Hence,

$$\begin{aligned} u &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) \\ &= 2x\_1 - 2x\_2 - 4; \\ u|\pi\_1 &\sim N\_1(\tfrac{1}{2}\Delta^2, \Delta^2), \ u|\pi\_2 \sim N\_1(-\tfrac{1}{2}\Delta^2, \Delta^2); \\ A\_1 &: u \ge 0, \ A\_2 : u < 0. \end{aligned}$$

Since, in (1), the observed $X\_0' = (4, 1)$, the observed value of $u$ is $u = 2x\_1 - 2x\_2 - 4 = 8 - 2 - 4 = 2 > 0$, and we classify the observed $X\_0$ into $\pi\_1$, under which $u \sim N\_1(\tfrac{1}{2}\Delta^2, \Delta^2)$, the criterion being $A\_1 : u \ge 0$ and $A\_2 : u < 0$. Thus,

*P (*1|1*, A)* = Probability of making a correct classification decision

$$\begin{split} &= \Pr\{u \ge 0 | \pi\_1\} = \int\_{0}^{\infty} \frac{\mathbf{e}^{-\frac{1}{2\Delta^2} (u - \frac{1}{2}\Delta^2)^2}}{\Delta\sqrt{2\pi}} \mathrm{d}u = \int\_{-\frac{\Delta}{2}}^{\infty} \frac{\mathbf{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v \\ &= \int\_{-1}^{\infty} \frac{\mathbf{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v = 0.5 + \int\_{0}^{1} \frac{\mathbf{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v \approx 0.841; \end{split}$$

*P (*1|2*, A)* = Probability of misclassification

$$= \int\_0^\infty \frac{\mathbf{e}^{-\frac{1}{2\Delta^2}(u + \frac{1}{2}\Delta^2)^2}}{\Delta\sqrt{2\pi}} \mathrm{d}u = \int\_1^\infty \frac{\mathbf{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v \approx 0.159.$$

When solving (2), the entire sample is to be classified. Proceeding as in the derivation of the criterion $u$ in case (1), it is seen that for the problem at hand, $X\_0$ will be replaced by $\bar{X}$, the average of the sample vectors or the sample mean value vector, and then $u$ becomes $u\_1 = 2\bar{x}\_1 - 2\bar{x}\_2 - 4$ where $\bar{X}' = [\bar{x}\_1, \bar{x}\_2]$. Thus, we require the sample average:

$$
\begin{bmatrix} 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 5 \\ 1 \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix}
\Rightarrow \text{ observed sample mean } = \frac{1}{3} \begin{bmatrix} 12 \\ 4 \end{bmatrix}.
$$

This means that $\bar{x}\_1 = \frac{12}{3} = 4, \ \bar{x}\_2 = \frac{4}{3}$, and the observed $u\_1 = 2\bar{x}\_1 - 2\bar{x}\_2 - 4 = 8 - \frac{8}{3} - 4 > 0$. Hence, we classify the whole sample to $\pi\_1$ as the criterion is $A\_1 : u\_1 \ge 0$ and $A\_2 : u\_1 < 0$. Since $\bar{X}$ is normally distributed with $E[\bar{X}] = \mu^{(i)}$ and $\text{Cov}(\bar{X}) = \frac{1}{n}\Sigma, \ i = 1, 2$, where $n$ is the sample size, the densities of $u\_1$ under $\pi\_1$ and $\pi\_2$ are the following:

$$u\_1|\pi\_1 \sim N\_1(\tfrac{1}{2}\Delta^2, \tfrac{1}{3}\Delta^2), \quad u\_1|\pi\_2 \sim N\_1(-\tfrac{1}{2}\Delta^2, \tfrac{1}{3}\Delta^2), \quad n = 3.$$

Moreover,

*P (*1|1*, A)* = Probability of making a correct classification decision

$$\begin{split} &= \Pr\{u\_1 \ge 0 | \pi\_1\} = \int\_{0}^{\infty} \frac{\sqrt{3}}{\Delta\sqrt{2\pi}} \mathrm{e}^{-\frac{3}{2\Delta^2}(u\_1 - \frac{1}{2}\Delta^2)^2} \mathrm{d}u\_1 \\ &= \int\_{-\frac{\sqrt{3}\Delta}{2}}^{\infty} \frac{\mathrm{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v = \int\_{-\sqrt{3}}^{\infty} \frac{\mathrm{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v = 0.5 + \int\_{0}^{\sqrt{3}} \frac{\mathrm{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \mathrm{d}v \approx 0.958 \end{split}$$

and

$P(1|2, A)$ = Probability of misclassification

$$\begin{split} &= \Pr\{u\_1 \ge 0 | \pi\_2\} = \int\_0^\infty \frac{\sqrt{3}}{\Delta \sqrt{2\pi}} \mathrm{e}^{-\frac{3}{2\Delta^2} (u\_1 + \frac{1}{2}\Delta^2)^2} \mathrm{d}u\_1 \\ &= \int\_{\sqrt{3}}^\infty \frac{\mathrm{e}^{-\frac{1}{2}v^2}}{\sqrt{2\pi}} \,\mathrm{d}v \approx 0.042. \end{split}$$
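The two probabilities above can be checked numerically with the standard normal distribution function. The sketch below (Python, standard library only) assumes $\Delta^2 = 4$, the value under which the standardized integration limits reduce to $\mp\sqrt{3}$ as in the computation above.

```python
# Probabilities of correct and incorrect classification for the criterion
# u1 ~ N(Δ²/2, Δ²/3) under π1 and u1 ~ N(−Δ²/2, Δ²/3) under π2 (n = 3).
# Assumption: Δ² = 4, so the standardized cut-offs become ∓√3.
from math import erf, sqrt

def std_normal_cdf(x):
    """Φ(x), the standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

delta_sq = 4.0            # assumed Mahalanobis distance Δ²
n = 3                     # sample size
mean_pi1 = delta_sq / 2   # mean of u1 under π1
mean_pi2 = -delta_sq / 2  # mean of u1 under π2
sd = sqrt(delta_sq / n)   # standard deviation of u1

# P(1|1, A): Pr{u1 ≥ 0 | π1}, a correct classification
p_correct = 1.0 - std_normal_cdf((0.0 - mean_pi1) / sd)
# P(1|2, A): Pr{u1 ≥ 0 | π2}, a misclassification
p_misclass = 1.0 - std_normal_cdf((0.0 - mean_pi2) / sd)

print(round(p_correct, 3))   # 0.958
print(round(p_misclass, 3))  # 0.042
```

The two probabilities are complementary here because the cut-off 0 is symmetric about the two means.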

#### **12.4. Linear Discriminant Function**

Let $X$ be a $p \times 1$ vector and $B$ a $p \times 1$ arbitrary constant vector, $B' = (b_1, \ldots, b_p)$. Consider the arbitrary linear function $w = B'X$. Then, the mean value and variance of $w$ are the following: $E(w) = B'E(X)$ and $\text{Var}(w) = \text{Var}(B'X) = B'\text{Cov}(X)B = B'\Sigma B$, where $\Sigma > O$ is the covariance matrix of $X$. Suppose that $X$ could be from a $p$-variate real population $\pi_1$ with mean value vector $\mu^{(1)}$ or from the $p$-variate real population $\pi_2$ with mean value vector $\mu^{(2)}$. Suppose that both populations $\pi_1$ and $\pi_2$ have the same covariance matrix $\Sigma > O$. Then, a measure of discrimination or separation between $\pi_1$ and $\pi_2$ is $|B'\mu^{(1)} - B'\mu^{(2)}|$, as measured in terms of the standard deviation $\sqrt{\text{Var}(w)}$, for determining the best choice of $B$. Taking the squared distance, let

$$\delta = \frac{[B'\mu^{(1)} - B'\mu^{(2)}]^2}{B'\Sigma B} = \frac{[B'(\mu^{(1)} - \mu^{(2)})]^2}{B'\Sigma B} = \frac{B'(\mu^{(1)} - \mu^{(2)})(\mu^{(1)} - \mu^{(2)})'B}{B'\Sigma B} \tag{12.4.1}$$

since the square of a scalar quantity is the scalar quantity times its transpose, $B'(\mu^{(1)} - \mu^{(2)})$ being a scalar quantity. Accordingly, we will maximize $\delta$ as specified in (12.4.1). This is achieved by selecting a particular $B$ in such a way that $\delta$ attains a maximum, which corresponds to the maximum distance between $\pi_1$ and $\pi_2$. Without any loss of generality, we may assume that $B'\Sigma B = 1$, so that only the numerator in (12.4.1) need be maximized, subject to the condition $B'\Sigma B = 1$. Let $\lambda$ denote a Lagrangian multiplier and

$$
\eta = B'(\mu^{(1)} - \mu^{(2)}) (\mu^{(1)} - \mu^{(2)})' B - \lambda (B' \Sigma B - 1).
$$

Let us take the partial derivative of *η* with respect to the vector *B* and equate the result to a null vector (the reader may refer to Chap. 1 for the derivative of a scalar variable with respect to a vector variable):

$$\frac{\partial \eta}{\partial B} = O \Rightarrow 2(\mu^{(1)} - \mu^{(2)})(\mu^{(1)} - \mu^{(2)})'B - 2\lambda \Sigma B = O$$

$$\Rightarrow \Sigma^{-1} (\mu^{(1)} - \mu^{(2)})(\mu^{(1)} - \mu^{(2)})'B = \lambda B. \tag{i}$$

Note that $(\mu^{(1)} - \mu^{(2)})'B \equiv \alpha$ is a scalar quantity and $B$ is a specific vector resulting from $(i)$; hence, we may write $(i)$ as

$$B = \frac{\alpha}{\lambda} \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \equiv c \, \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \tag{ii}$$

where *c* is a real scalar quantity. Observe that *δ* as given in (12.4.1) will remain the same if *B* is multiplied by any scalar quantity. Thus, we may take *c* = 1 in *(ii)* without any loss of generality. The linear discriminant function then becomes

$$B'X = (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X,\tag{12.4.2}$$

and when $B'X$ is as given in (12.4.2), $\delta$ as defined in (12.4.1) can be expressed as follows:

$$\delta = \frac{(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)})}{(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)})} $$

$$= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) = \Delta^2 \equiv \text{Mahalanobis' distance} $$

$$= \text{Var}(w) = \text{ Variance of the discriminant function}. \tag{12.4.3}$$

This $\delta$ is also the generalized squared distance between the vectors $\mu^{(1)}$ and $\mu^{(2)}$, or the squared distance, in the mathematical (Euclidean) sense, between the vectors $\Sigma^{-\frac{1}{2}}\mu^{(1)}$ and $\Sigma^{-\frac{1}{2}}\mu^{(2)}$. Hence, Mahalanobis' distance between two $p$-variate populations with different mean value vectors and the same covariance matrix is a measure of discrimination or separation between the populations, and the linear discriminant function is given in (12.4.2). Thus, for an observed value $X$, if $u = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X > 0$ when $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$ are known, then we choose population $\pi_1$ with mean value $\mu^{(1)}$, and if $u < 0$, then we select population $\pi_2$ with mean value $\mu^{(2)}$. When $u = 0$, both $\pi_1$ and $\pi_2$ are equally favored.
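The construction of the discriminant direction $B = \Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$ from (12.4.2) and of $\delta = \Delta^2$ from (12.4.3) can be illustrated numerically. In the NumPy sketch below, the vectors $\mu^{(1)}, \mu^{(2)}$ and the matrix $\Sigma$ are illustrative values, not taken from the text.

```python
# Sketch: discriminant direction B = Σ⁻¹(μ⁽¹⁾ − μ⁽²⁾) as in (12.4.2) and
# δ = Δ², Mahalanobis' distance, as in (12.4.3). Parameter values are
# hypothetical, chosen only for illustration.
import numpy as np

mu1 = np.array([2.0, 3.0])
mu2 = np.array([1.0, 1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

d = mu1 - mu2
B = np.linalg.solve(Sigma, d)   # B = Σ⁻¹(μ⁽¹⁾ − μ⁽²⁾)
delta_sq = d @ B                # Δ² = (μ⁽¹⁾ − μ⁽²⁾)'Σ⁻¹(μ⁽¹⁾ − μ⁽²⁾)

# For this choice of B, Var(w) = B'ΣB coincides with Δ², as in (12.4.3)
var_w = B @ Sigma @ B
print(delta_sq, var_w)
```

The equality of `delta_sq` and `var_w` is exactly the statement $\delta = \Delta^2 = \text{Var}(w)$ in (12.4.3).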

**Example 12.4.1.** In a small township, there is only one grocery store. The town is laid out on the East and West sides of the sole main road. We will refer to the villagers as East-enders and West-enders. These townspeople shop only once a week for groceries. The grocery store owner found that the East-enders and the West-enders have somewhat different buying habits. Consider the following items: $x_1$ = grain items in kilograms, $x_2$ = vegetable items in kilograms, $x_3$ = dairy products in kilograms, and let $X' = [x_1, x_2, x_3]$ where $X$ is the vector of weekly purchases. Then, the expected quantities bought by the East-enders and the West-enders are $E(X) = \mu^{(1)}$ and $E(X) = \mu^{(2)}$, respectively, with the common covariance matrix $\Sigma > O$. From past history, the grocery store owner determined that

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \; \mu^{(1)} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \; \mu^{(2)} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}, \; \Sigma = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

Consider the following situations: (1) A customer walked in and bought $x_1 = 1$ kg of grain items, $x_2 = 2$ kg of vegetable items, and $x_3 = 1$ kg of dairy products. Is she likely to be an East-ender or a West-ender? (2) Another customer bought the three types of items in the quantities $(10, 1, 1)$, respectively. Is she more likely to be an East-ender than a West-ender?

**Solution 12.4.1.** The inverse of the covariance matrix, the difference $\mu^{(1)} - \mu^{(2)}$, as well as other relevant quantities are the following:

$$\begin{aligned} \boldsymbol{\Sigma}^{-1} &= \begin{bmatrix} \frac{1}{3} & 0 & 0\\ 0 & 1 & 1\\ 0 & 1 & 2 \end{bmatrix}, \ \boldsymbol{\mu}^{(1)} - \boldsymbol{\mu}^{(2)} = \begin{bmatrix} 2\\ 3\\ 1 \end{bmatrix} - \begin{bmatrix} 1\\ 3\\ 2 \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix},\\ (\boldsymbol{\mu}^{(1)} - \boldsymbol{\mu}^{(2)})' \boldsymbol{\Sigma}^{-1} &= [1, 0, -1] \begin{bmatrix} \frac{1}{3} & 0 & 0\\ 0 & 1 & 1\\ 0 & 1 & 2 \end{bmatrix} = [\frac{1}{3}, -1, -2]. \end{aligned}$$

In (1), $X' = (1, 2, 1)$ and since

$$(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X = [\frac{1}{3}, -1, -2] \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} < 0,$$

we classify this customer as a West-ender from her buying pattern. In (2),

$$(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X = [\frac{1}{3}, -1, -2] \begin{bmatrix} 10 \\ 1 \\ 1 \end{bmatrix} > 0,$$

so that, given her purchases, this customer is classified as an East-ender.
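Solution 12.4.1 can be verified numerically. The following NumPy sketch reproduces the coefficient vector $[\frac{1}{3}, -1, -2]$ and both classification decisions.

```python
# Reproducing Example 12.4.1: classify each customer by the sign of
# u = (μ⁽¹⁾ − μ⁽²⁾)'Σ⁻¹X, with the parameters given in the example.
import numpy as np

mu1 = np.array([2.0, 3.0, 1.0])
mu2 = np.array([1.0, 3.0, 2.0])
Sigma = np.array([[3.0, 0.0, 0.0],
                  [0.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])

# Σ⁻¹(μ⁽¹⁾ − μ⁽²⁾) = [1/3, −1, −2], as computed in the solution
coef = np.linalg.solve(Sigma, mu1 - mu2)

def classify(X):
    """Return 'East-ender' (π1) if u > 0, 'West-ender' (π2) if u < 0."""
    u = coef @ X
    return "East-ender" if u > 0 else "West-ender"

print(classify(np.array([1.0, 2.0, 1.0])))   # West-ender (u = −11/3)
print(classify(np.array([10.0, 1.0, 1.0])))  # East-ender (u = 1/3)
```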

#### **12.5. Classification When the Population Parameters are Unknown**

We now consider the classification problem involving two populations $\pi_1$ and $\pi_2$ for which the parameters of the corresponding densities are unknown. Since the structure of the parameters in these general densities $P_1(X)$ and $P_2(X)$ is not known, we will present a specific example. Consider the two $p$-variate normal populations of Example 12.3.3, $\pi_1: N_p(\mu^{(1)}, \Sigma)$ and $\pi_2: N_p(\mu^{(2)}, \Sigma)$, which share the same positive definite covariance matrix $\Sigma$. Suppose that we have a single observation vector $X$ to be classified into $\pi_1$ or $\pi_2$. When the parameters $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$ are unknown, we will have to estimate them from some training samples. But, for a problem such as classifying skeletal remains, one does not have samples from the respective ancestral groups. Nevertheless, one can obtain training samples from living racial groups and so, secure estimates of the parameters involved. Assume that we have simple random samples of sizes $n_1$ and $n_2$ from $N_p(\mu^{(1)}, \Sigma)$ and $N_p(\mu^{(2)}, \Sigma)$, respectively. Denote the sample values by $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$, and let $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$ be the sample averages. That is,

$$X\_j^{(1)} = \begin{bmatrix} x\_{1j}^{(1)} \\ \vdots \\ x\_{pj}^{(1)} \end{bmatrix}, \ j = 1, \dots, n\_1; \ \ X\_j^{(2)} = \begin{bmatrix} x\_{1j}^{(2)} \\ \vdots \\ x\_{pj}^{(2)} \end{bmatrix}, \ j = 1, \dots, n\_2;$$

$$\bar{X}^{(1)} = \begin{bmatrix} \bar{x}\_1^{(1)} \\ \vdots \\ \bar{x}\_p^{(1)} \end{bmatrix}, \ \bar{x}\_k^{(1)} = \frac{1}{n\_1} \sum\_{j=1}^{n\_1} x\_{kj}^{(1)}; \ \bar{X}^{(2)} = \begin{bmatrix} \bar{x}\_1^{(2)} \\ \vdots \\ \bar{x}\_p^{(2)} \end{bmatrix}, \ \bar{x}\_k^{(2)} = \frac{1}{n\_2} \sum\_{j=1}^{n\_2} x\_{kj}^{(2)}. \tag{12.5.1}$$

Let the sample matrices be denoted by bold-faced letters, where the $p \times n_1$ matrix $\mathbf{X}^{(1)}$ and the $p \times n_2$ matrix $\mathbf{X}^{(2)}$ are the sample matrices, and let $\bar{\mathbf{X}}^{(1)}$ and $\bar{\mathbf{X}}^{(2)}$ be the matrices of sample means. Thus, we have

$$\begin{aligned} \mathbf{X}^{(1)} &= [X\_1^{(1)}, \dots, X\_{n\_1}^{(1)}] = \begin{bmatrix} x\_{11}^{(1)} & \dots & x\_{1n\_1}^{(1)} \\ \vdots & \ddots & \vdots \\ x\_{p1}^{(1)} & \dots & x\_{pn\_1}^{(1)} \end{bmatrix}, \\ \bar{\mathbf{X}}^{(1)} &= [\bar{X}^{(1)}, \dots, \bar{X}^{(1)}] = \begin{bmatrix} \bar{x}\_1^{(1)} & \dots & \bar{x}\_1^{(1)} \\ \vdots & \ddots & \vdots \\ \bar{x}\_p^{(1)} & \dots & \bar{x}\_p^{(1)} \end{bmatrix}, \end{aligned}$$

$$\begin{aligned} \mathbf{X}^{(2)} &= [X\_1^{(2)}, \dots, X\_{n\_2}^{(2)}] = \begin{bmatrix} x\_{11}^{(2)} & \dots & x\_{1n\_2}^{(2)} \\ \vdots & \ddots & \vdots \\ x\_{p1}^{(2)} & \dots & x\_{pn\_2}^{(2)} \end{bmatrix}, \\ \bar{\mathbf{X}}^{(2)} &= [\bar{X}^{(2)}, \dots, \bar{X}^{(2)}] = \begin{bmatrix} \bar{x}\_1^{(2)} & \dots & \bar{x}\_1^{(2)} \\ \vdots & \ddots & \vdots \\ \bar{x}\_p^{(2)} & \dots & \bar{x}\_p^{(2)} \end{bmatrix}. \end{aligned} \tag{12.5.2}$$

Then, the sample sum of products matrices are

$$\begin{aligned} S\_i &= (\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)})(\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)})', \; i = 1, 2; \\ S\_m &= (s\_{ij}^{(m)}), \; s\_{ij}^{(m)} = \sum\_{k=1}^{n\_m} (x\_{ik}^{(m)} - \bar{x}\_i^{(m)})(x\_{jk}^{(m)} - \bar{x}\_j^{(m)}), \; m = 1, 2, \; S = S\_1 + S\_2. \end{aligned} \tag{12.5.3}$$

The unbiased estimators of $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$ are, respectively, $\bar{X}^{(1)}, \bar{X}^{(2)}$ and $\frac{S}{n_{(2)}} = \frac{S_1 + S_2}{n_{(2)}}$, $n_{(2)} = n_1 + n_2 - 2$. The criteria for classification, the regions, the statistic, and so on, are available from Example 12.3.3. That is,

$$A\_1: u \ge k,\ A\_2: u < k,\ k = \ln \frac{C(1|2)q\_2}{C(2|1)q\_1},$$

where

$$
u = X^\prime \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})^\prime \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}).
$$

Note that $q_1$ and $q_2$ are the prior probabilities of selecting the populations $\pi_1$ and $\pi_2$, and $C(1|2)$ and $C(2|1)$ are the costs or losses associated with misclassification. We will assume that $q_1, q_2, C(1|2)$ and $C(2|1)$ are all known but that the parameters $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$ are estimated by their unbiased estimators. Denoting the estimator of $u$ by $v$, we obtain the following criterion, assuming that we have one $p$-vector $X$ to be classified into $\pi_1$ or $\pi_2$:

$$\begin{split} A\_1: v &\geq k, \ A\_2: v < k, \ k = \ln \frac{q\_2 C(1|2)}{q\_1 C(2|1)}, \\ v &= n\_{(2)} X' S^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)}) - n\_{(2)} \frac{1}{2} (\bar{X}^{(1)} - \bar{X}^{(2)})' S^{-1} (\bar{X}^{(1)} + \bar{X}^{(2)}) \\ &= n\_{(2)} [X - \frac{1}{2} (\bar{X}^{(1)} + \bar{X}^{(2)})]' S^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)}). \end{split} \tag{12.5.4}$$

As it turns out, it already proves quite challenging to obtain the exact distribution of *v* as given in (12.5.4) where *X* is a single *p*-vector either from *π*<sup>1</sup> or from *π*2.
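Although the exact distribution of $v$ is intractable, the plug-in criterion itself is easy to compute. The following NumPy sketch uses simulated training samples with hypothetical parameters (all numerical values are illustrative, not from the text) to estimate $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$, and then classifies a new vector via (12.5.4), taking $k = 0$ (equal priors and equal costs).

```python
# Sketch of the plug-in criterion v of (12.5.4). Training data are simulated
# from hypothetical parameters purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
p, n1, n2 = 3, 60, 50
mu1 = np.array([2.0, 1.0, 2.0])     # hypothetical μ⁽¹⁾
mu2 = np.array([1.0, 2.0, 2.0])     # hypothetical μ⁽²⁾
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

s1 = rng.multivariate_normal(mu1, Sigma, size=n1)   # n1 rows from π1
s2 = rng.multivariate_normal(mu2, Sigma, size=n2)   # n2 rows from π2

xbar1, xbar2 = s1.mean(axis=0), s2.mean(axis=0)
S = (s1 - xbar1).T @ (s1 - xbar1) + (s2 - xbar2).T @ (s2 - xbar2)
n_2 = n1 + n2 - 2                                   # n_(2) in the text

def v(X, k=0.0):
    """Classify X into π1 (return 1) if v ≥ k, else into π2 (return 2)."""
    # n_(2) S⁻¹(X̄⁽¹⁾ − X̄⁽²⁾) = (S/n_(2))⁻¹(X̄⁽¹⁾ − X̄⁽²⁾)
    d = np.linalg.solve(S / n_2, xbar1 - xbar2)
    val = (X - 0.5 * (xbar1 + xbar2)) @ d
    return 1 if val >= k else 2

print(v(mu1), v(mu2))   # a vector at μ⁽¹⁾ should go to π1, one at μ⁽²⁾ to π2
```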

#### **12.5.1. Some asymptotic results**

Before considering asymptotic properties of $u$ and $v$ as defined in Sect. 12.4, let us recall certain results obtained in earlier chapters. Let the $p \times 1$ vectors $Y_j, j = 1, \ldots, n$, be iid vectors from some population for which $E[Y_j] = \mu$ and $\text{Cov}(Y_j) = \Sigma > O$, $j = 1, \ldots, n$. Let the sample matrix, the matrix of sample means wherein the sample mean is $\bar{Y} = \frac{1}{n}\sum_{j=1}^n Y_j$, and the sample sum of products matrix $S$ be as follows:

$$\mathbf{Y} = [Y\_1, \dots, Y\_n], \ \bar{\mathbf{Y}} = [\bar{Y}, \dots, \bar{Y}], \ S = (s\_{ij}), \ s\_{ij} = \sum\_{k=1}^n (y\_{ik} - \bar{y}\_i)(y\_{jk} - \bar{y}\_j),$$

$$S = [\mathbf{Y} - \bar{\mathbf{Y}}][\mathbf{Y} - \bar{\mathbf{Y}}]' = \mathbf{Y} \begin{bmatrix} I\_n - JJ'/n \end{bmatrix} \mathbf{Y}', \ Y\_j' = [y\_{1j}, y\_{2j}, \dots, y\_{pj}], \tag{i}$$

where $J$ is an $n \times 1$ vector of unities. Since a matrix of the form $\mathbf{Y} - \bar{\mathbf{Y}}$ is present, we may let $\mu = O$ without any loss of generality in the following computations, since $Y_j - \bar{Y} = (Y_j - \mu) - (\bar{Y} - \mu)$. Note that $B = B' = I_n - \frac{1}{n}JJ' = B^2$ and hence $B$ is idempotent and of rank $n - 1$. Since $B = B'$, there exists an orthonormal matrix $Q$ such that $Q'BQ = \text{diag}(1, \ldots, 1, 0) = D$, $QQ' = I$, $Q'Q = I$, the diagonal elements being 1's and one 0 since $B = B^2$ is of rank $n - 1$. Then,

$$\begin{aligned} S &= \mathbf{Y} \, Q \, \text{diag}(1, \dots, 1, 0) \, Q' \mathbf{Y}' = \mathbf{Y} \, Q \, DD' Q' \mathbf{Y}', \\ D &= \text{diag}(1, \dots, 1, 0). \end{aligned} \tag{ii}$$

Consider $\Sigma^{-\frac{1}{2}} S \Sigma^{-\frac{1}{2}}$. Let $U_j = \Sigma^{-\frac{1}{2}} Y_j$, $j = 1, \ldots, n$, where $Y_j$ is the $j$-th column of $\mathbf{Y}$ and it is assumed that $\mu = O$. Observe that $E[U_j] = O$, $\text{Cov}(U_j) = I_p$, $j = 1, \ldots, n$, and the $U_j$'s are uncorrelated. Letting $\mathbf{U} = [U_1, \ldots, U_n]$, $(ii)$ implies that

$$
\Sigma^{-\frac{1}{2}} S \Sigma^{-\frac{1}{2}} = \mathbf{U} Q \, DD' \, Q' \mathbf{U}'.\tag{iii}
$$

Denoting by $U_{(j)}$ the $j$-th row of $\mathbf{U}$, it follows that the elements of $U_{(j)}$ are iid uncorrelated real scalar variables with mean value zero and variance 1. Consider the transformation $V_{(j)} = U_{(j)}Q$; then $E[V_{(j)}] = O$ and $\text{Cov}[V_{(j)}] = I_n$, $j = 1, \ldots, p$, the $V_{(j)}$'s being uncorrelated. Let $\mathbf{V}$ be the $p \times n$ matrix whose rows are $V_{(j)}, j = 1, \ldots, p$, and let the columns of $\mathbf{V}$ be $V_j, j = 1, \ldots, n$, that is, $\mathbf{V} = [V_1, \ldots, V_n]$. Then, $(iii)$ implies the following:

$$\begin{aligned} \boldsymbol{\Sigma}^{-\frac{1}{2}} S \boldsymbol{\Sigma}^{-\frac{1}{2}} &= \mathbf{V} D \boldsymbol{D}^{\prime} \mathbf{V}^{\prime} = \{ [V\_1, \dots, V\_n] D \} \{ [V\_1, \dots, V\_n] D \}^{\prime} \\ &= [V\_1, \dots, V\_{n-1}, O] [V\_1, \dots, V\_{n-1}, O]^{\prime} = V\_1 V\_1^{\prime} + \dots + V\_{n-1} V\_{n-1}^{\prime} \Rightarrow \\ E \{ \boldsymbol{\Sigma}^{-\frac{1}{2}} S \boldsymbol{\Sigma}^{-\frac{1}{2}} \} &= E [V\_1 V\_1^{\prime}] + \dots + E [V\_{n-1} V\_{n-1}^{\prime}] = I\_p + \dots + I\_p = (n-1) I\_p \Rightarrow \\ E [S] &= (n-1) \boldsymbol{\Sigma} \text{ or } E \left[ \frac{S}{n-1} \right] = \boldsymbol{\Sigma} . \end{aligned} \tag{iv}$$

Additionally,

$$\begin{aligned} \text{Cov}(\bar{Y}) &= \frac{1}{n^2} \text{Cov}[Y\_1 + \dots + Y\_n] = \frac{1}{n^2} [\text{Cov}(Y\_1) + \dots + \text{Cov}(Y\_n)] \\ &= \frac{1}{n^2} [\Sigma + \dots + \Sigma] = \frac{n}{n^2} \Sigma = \frac{\Sigma}{n} \to O \text{ as } n \to \infty, \end{aligned} \tag{v}$$

when $\Sigma$ is finite with respect to any norm, namely $\|\Sigma\| < \infty$. Appealing to the extended Chebyshev inequality, this shows that the unbiased estimator of $\mu$, namely $\bar{Y}$, converges to $\mu$ in probability, that is,

$$\Pr(\bar{Y} \to \mu) \to 1 \text{ when } n \to \infty \text{ or } \lim\_{n \to \infty} \Pr(\bar{Y} \to \mu) = 1. \tag{vi}$$

An unbiased estimator of $\Sigma$ is $\hat{\Sigma} = \frac{S}{n-1}$ with $E[\hat{\Sigma}] = \Sigma$. Will $\hat{\Sigma}$ also converge to $\Sigma$ in probability when $n \to \infty$? In order to establish this, we require the covariance structure of the elements of $S$. For arbitrary populations, it is somewhat difficult to verify this result; however, it is rather straightforward for normal populations. We will examine this aspect next.
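The two convergence properties just discussed, $\bar{Y} \to \mu$ and $\frac{S}{n-1} \to \Sigma$, can be observed by simulation. The NumPy sketch below uses a normal population with illustrative values of $\mu$ and $\Sigma$ and watches both estimation errors shrink as $n$ grows.

```python
# Numerical check: X̄ → μ and S/(n−1) → Σ as n grows, for a normal
# population. The values of μ and Σ below are illustrative.
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

for n in (50, 500, 50_000):
    sample = rng.multivariate_normal(mu, Sigma, size=n)
    xbar = sample.mean(axis=0)
    S = (sample - xbar).T @ (sample - xbar)   # sample sum of products matrix
    Sigma_hat = S / (n - 1)                   # unbiased estimator of Σ
    print(n,
          np.linalg.norm(xbar - mu),          # error of the sample mean
          np.linalg.norm(Sigma_hat - Sigma))  # error of the estimator of Σ
```

Both error norms are of stochastic order $1/\sqrt{n}$, which is the content of the extended Chebyshev argument above.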

#### **12.5.2. Another method**

Let the *p* × 1 vectors *Xj , j* = 1*, . . . , n,* be a simple random sample of size *n* from a population having a real *Np(μ, Σ), Σ > O,* distribution. Letting *S* denote the sample sum of products matrix, *S* will be distributed as a Wishart matrix with *m* = *n* − 1 degrees of freedom and *Σ>O* as its parameter matrix, whose density is

$$f(\mathcal{S}) = \frac{1}{2^{\frac{mp}{2}} |\Sigma|^{\frac{m}{2}} \Gamma\_p(\frac{m}{2})} |\mathcal{S}|^{\frac{m}{2} - \frac{p+1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(\Sigma^{-1} \mathcal{S})}, \ \mathcal{S} > O, \ m \ge p; \tag{i}$$

the reader may also refer to the real matrix-variate gamma density discussed in Chap. 5. This is usually written as $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Letting $S_{(*)} = \Sigma^{-\frac{1}{2}} S \Sigma^{-\frac{1}{2}}$, $S_{(*)} \sim W_p(m, I)$. Consider the transformation $S_{(*)} = TT'$ where $T = (t_{ij})$ is a lower triangular matrix whose diagonal elements are positive, that is, $t_{ij} = 0$, $i < j$, and $t_{jj} > 0$, $j = 1, \ldots, p$. It was explained in Chaps. 1 and 3 that the $t_{ij}$'s are mutually independently distributed, the $t_{ij}$'s with $i > j$ being distributed as standard normal variables and $t_{jj}^2$ as a chisquare variable having $m - (j-1)$ degrees of freedom. The $j$-th diagonal element of $TT'$ is of the form $t_{j1}^2 + \cdots + t_{j\,j-1}^2 + t_{jj}^2$ where $t_{jk}^2 \sim \chi_1^2$ for $k = 1, \ldots, j-1$, that is, the square of a real standard normal variable. Thus, the $j$-th diagonal element is distributed as $\chi_1^2 + \cdots + \chi_1^2 + \chi_{m-(j-1)}^2 \sim \chi_m^2$ since all the individual chisquare variables are independently distributed, in which case the resulting number of degrees of freedom is the sum of the degrees of freedom of the chisquares. Now, noting that for a $\chi_\nu^2$,

$$E[\chi^2\_\nu] = \nu \text{ and } \text{Var}(\chi^2\_\nu) = 2\nu,\tag{ii}$$

the expected value of each of the diagonal elements of $TT'$, which are the diagonal elements of $S_{(*)}$, will be $m = n - 1$. The non-diagonal elements of $TT'$ result from a sum of terms of the form $t_{ik}t_{jk}$, $k \le j < i$, whose expected value is $E[t_{ik}t_{jk}] = E[t_{ik}]E[t_{jk}]$; but since $E[t_{ik}] = 0$ for $i > k$, all the non-diagonal elements will have zero as their expected values. Accordingly,

$$E\left[S\_{(\ast)}\right] = \text{diag}(m, \ldots, m) \Rightarrow E\left[\frac{S\_{(\ast)}}{m}\right] = I \Rightarrow E\left[\frac{S}{m}\right] = \Sigma \,, \, m = n - 1,\qquad(iii)$$

and the estimator $\hat{\Sigma} = \frac{S}{m}$ is unbiased for $\Sigma$, $m$ being equal to $n - 1$. Now, let us examine the covariance structure of $S_{(*)}$. Let $W$ denote a single vector comprising all the distinct elements of $S_{(*)} = TT'$ and consider its covariance structure. In this vector of order $\frac{p(p+1)}{2} \times 1$, convert all the original $t_{ij}$'s and $t_{jj}$'s in terms of standard normal and chisquare variables. Let $z_1, \ldots, z_{\frac{p(p-1)}{2}}$ be the standard normal variables and $y_1, \ldots, y_p$ denote the chisquare variables. Then, each element of $\text{Cov}(W) = E[W - E(W)][W - E(W)]'$ will be a sum of terms of the type

$$[\text{Var}(y\_k)][\text{Var}(z\_j)] = \text{Var}(y\_k) = [\text{twice the number of degrees of freedom of } y\_k], \tag{iv}$$

which happens to be a linear function of $m$. Our estimator being $\hat{\Sigma} = \frac{S}{m} = \Sigma^{\frac{1}{2}} \frac{S_{(*)}}{m} \Sigma^{\frac{1}{2}}$, the covariance structure of $\frac{S_{(*)}}{m}$, which is $\frac{1}{m^2}\text{Cov}(W)$, tends to $O$ when $m \to \infty$, since each element of $\text{Cov}(W)$ is of the form $am + b$ where $a$ and $b$ are real scalars, so that $\frac{am + b}{m^2} \to 0$ as $m \to \infty$, or equivalently, as $n \to \infty$ since $m = n - 1$. Thus, it follows from an extended version of Chebyshev's inequality that


$$\Pr\left(\frac{S}{m}\to\Sigma\right)\to 1 \text{ as } m\to\infty \text{ or as } n\to\infty \text{ since } m=n-1.\tag{\text{v}}$$

These last two results are stated next as a theorem.

**Theorem 12.5.1.** *Let the $p \times 1$ vectors $X_j, j = 1, \ldots, n$, be iid with $E[X_j] = \mu$ and $\text{Cov}(X_j) = \Sigma$, $j = 1, \ldots, n$. Assume that $\Sigma$ is finite in the sense that $\|\Sigma\| < \infty$. Then, letting $\bar{X} = \frac{1}{n}\sum_{j=1}^n X_j$ denote the sample mean,*

$$\Pr(\bar{X}\to\mu)\to 1\text{ as }n\to\infty.\tag{12.5.5}$$

*Further, letting Xj* ∼ *Np(μ, Σ), Σ > O,*

$$\Pr\left(\hat{\Sigma} = \frac{\mathcal{S}}{m} \to \Sigma\right) \to 1 \text{ as } m \to \infty \text{ or as } n \to \infty \text{ since } m = n - 1. \tag{12.5.6}$$
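The result (12.5.6) rests on the triangular decomposition argument of Sect. 12.5.2, which can itself be checked by simulation. In the NumPy sketch below (dimensions and degrees of freedom are illustrative), $T$ is built with standard normal entries below the diagonal and $t_{jj}^2 \sim \chi^2_{m-(j-1)}$ on the diagonal; the average of $TT'$ over many replications should then be close to $mI_p$.

```python
# Simulation check of the triangular decomposition S(*) = TT' of Sect. 12.5.2:
# t_ij ~ N(0,1) for i > j, t_jj² ~ χ²_{m−(j−1)}, so that E[S(*)] = m·I_p.
# p, m and the number of replications are illustrative choices.
import numpy as np

rng = np.random.default_rng(7)
p, m, reps = 3, 20, 20_000

acc = np.zeros((p, p))
for _ in range(reps):
    T = np.zeros((p, p))
    for j in range(p):                            # 0-based j here,
        T[j, j] = np.sqrt(rng.chisquare(m - j))   # df = m − (j−1) in the text
        T[j, :j] = rng.standard_normal(j)         # below-diagonal N(0,1)'s
    acc += T @ T.T
mean_S = acc / reps

print(np.round(mean_S / m, 2))   # should be close to the identity matrix
```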

Let us now examine the criterion in (12.5.4). In this case, we can obtain an asymptotic distribution of the criterion $v$ for large $n_{(2)}$ or when $n_{(2)} \to \infty$ in the sense that $n_1 \to \infty$ and $n_2 \to \infty$. When $n_{(2)} \to \infty$, we have $\bar{X}^{(1)} \to \mu^{(1)}$, $\bar{X}^{(2)} \to \mu^{(2)}$ and $\frac{S}{n_{(2)}} \to \Sigma$, so that the criterion $v$ in (12.5.4) becomes

$$\begin{split} u &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)}) \\ &= [X - \frac{1}{2} (\mu^{(1)} + \mu^{(2)})]' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}), \end{split} \tag{12.5.7}$$

which is nothing but $u$ as specified in (12.3.7), with the densities $N_1(\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_1$ and $N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_2$. Hence, the following result:

**Theorem 12.5.2.** *When $n_1 \to \infty$ and $n_2 \to \infty$, the criterion $v$ provided in (12.5.4) becomes $u$ as specified in (12.5.7), with the univariate normal densities $N_1(\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_1$ and $N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_2$, where $\Delta^2$ is Mahalanobis' distance given in (12.3.8). We classify $X$, the observation vector at hand, to $\pi_1$ when $X \in A_1$ and to $\pi_2$ when $X \in A_2$, where $A_1: u \ge k$ and $A_2: u < k$ with $k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1}$, $q_1$ and $q_2$ being the prior probabilities of selecting the populations $\pi_1$ and $\pi_2$, respectively, and $C(2|1)$ and $C(1|2)$ denoting the costs or losses associated with misclassification.*

In a practical situation, when $n_1$ and $n_2$ are large, we may replace $\Delta^2$ in Theorem 12.5.2 by the corresponding sample value $n_{(2)}(\bar{X}^{(1)} - \bar{X}^{(2)})'S^{-1}(\bar{X}^{(1)} - \bar{X}^{(2)})$, where $S = S_1 + S_2$ and $n_{(2)} = n_1 + n_2 - 2$, and utilize the criterion $u$ as specified in (12.5.7) to classify the given vector $X$ into $\pi_1$ or $\pi_2$. It is assumed that $q_1, q_2, C(2|1)$ and $C(1|2)$ are available.

#### **12.5.3. A new sample from $\pi_1$ or $\pi_2$**

As in Examples 12.3.1 and 12.3.2, suppose that a simple random sample of size $n_3$ is available either from $\pi_1: N_p(\mu^{(1)}, \Sigma)$ or from $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$. Letting the new sample be $X_1^{(3)}, \ldots, X_{n_3}^{(3)}$, the $p \times n_3$ sample matrix, the sample mean $\bar{X}^{(3)} = \frac{1}{n_3}\sum_{j=1}^{n_3} X_j^{(3)}$, the $p \times n_3$ matrix of sample means, and the sample sum of products matrix are the following:

$$\mathbf{X}^{(3)} = [X\_1^{(3)}, \dots, X\_{n\_3}^{(3)}], \ \bar{\mathbf{X}}^{(3)} = [\bar{X}^{(3)}, \bar{X}^{(3)}, \dots, \bar{X}^{(3)}],$$

$$\mathcal{S}\_3 = [\mathbf{X}^{(3)} - \bar{\mathbf{X}}^{(3)}][\mathbf{X}^{(3)} - \bar{\mathbf{X}}^{(3)}]' = (s\_{ij}^{(3)}),$$

$$s\_{ij}^{(3)} = \sum\_{k=1}^{n\_3} (x\_{ik}^{(3)} - \bar{x}\_{i}^{(3)})(x\_{jk}^{(3)} - \bar{x}\_{j}^{(3)}).\tag{12.5.8}$$

An unbiased estimate of $\Sigma$ from this third sample is $\hat{\Sigma} = \frac{S_3}{n_3 - 1}$, as $E[\hat{\Sigma}] = \Sigma$. A pooled estimate of $\Sigma$ obtained from the three samples is

$$\frac{S\_1 + S\_2 + S\_3}{n\_1 + n\_2 + n\_3 - 3} \equiv \frac{S}{n\_{(3)}}, \ S = S\_1 + S\_2 + S\_3, \ n\_{(3)} = n\_1 + n\_2 + n\_3 - 3. \tag{12.5.9}$$

Then, the criterion corresponding to (12.3.4) changes to:

$$A\_1 \colon w \ge k \text{ and } A\_2 \colon w < k, \; k = \ln \frac{C(1|2)}{C(2|1)} \frac{q\_2}{q\_1},\tag{12.5.10}$$

where

$$w = n\_{(3)}[\bar{X}^{(3)} - \frac{1}{2}(\bar{X}^{(1)} + \bar{X}^{(2)})]'S^{-1}(\bar{X}^{(1)} - \bar{X}^{(2)})\tag{12.5.11}$$

with $S = S_1 + S_2 + S_3$, $n_{(3)} = n_1 + n_2 + n_3 - 3$, and $\bar{X}^{(3)}$ being the sample average from the third sample, which comes either from $\pi_1: N_p(\mu^{(1)}, \Sigma)$ or from $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$. Thus, the classification rule is the following:

$$A\_1 \colon w \ge k \text{ and } A\_2 \colon w < k, \; k = \ln \frac{C(1|2)}{C(2|1)} \frac{q\_2}{q\_1},\tag{12.5.12}$$

$w$ being as defined in (12.5.11). That is, classify the new sample into $\pi_1$ if $w \ge k$ and into $\pi_2$ if $w < k$.

As was explained in Sect. 12.5.2, as $n_j \to \infty$, $j = 1, 2$, $\bar{X}^{(i)} \to \mu^{(i)}$, $i = 1, 2$, and although $n_3$ usually remains finite, as $n_1 \to \infty$ and $n_2 \to \infty$, we have $n_{(3)} \to \infty$ and


$\frac{S}{n_{(3)}} \to \Sigma$. Accordingly, the criterion $w$ as given in (12.5.11) converges to $w_1$ for large values of $n_1$ and $n_2$, where

$$w\_{\rm l} = \left[\bar{X}^{(3)} - \frac{1}{2}(\mu^{(1)} + \mu^{(2)})\right]'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}).\tag{12.5.13}$$

Compared to $u$ as specified in (12.3.7), the only difference is that the $X$ associated with $u$ is replaced by $\bar{X}^{(3)}$ in $w_1$. Hence, the variance in $u$ will be multiplied by $\frac{1}{n_3}$, and the asymptotic distributions will be as follows:

$$w\_1|\pi\_1 \sim N\_1(\frac{1}{2}\Delta^2, \frac{1}{n\_3}\Delta^2) \text{ and } \; w\_1|\pi\_2 \sim N\_1(-\frac{1}{2}\Delta^2, \frac{1}{n\_3}\Delta^2),\tag{12.5.14}$$

as *n*<sup>1</sup> → ∞ and *n*<sup>2</sup> → ∞.

**Theorem 12.5.3.** *Consider two populations $\pi_1: N_p(\mu^{(1)}, \Sigma)$ and $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$, and simple random samples of respective sizes $n_1$ and $n_2$ from these two populations. Suppose that a simple random sample of size $n_3$ is available, either from $\pi_1$ or $\pi_2$. For classifying the third sample into $\pi_1$ or $\pi_2$, the criterion to be utilized is $w$ as given in (12.5.11). Then, the asymptotic distribution of $w$, when $n_i \to \infty$, $i = 1, 2$, is that of $w_1$ specified in (12.5.13), and the regions of classification are as given in (12.5.12).*

In a practical situation, when the sample sizes *n*<sup>1</sup> and *n*<sup>2</sup> are large, one may replace *Δ*<sup>2</sup> by its sample analogue, and then use (12.5.14) to reach a decision. As it turns out, it proves quite difficult to derive the exact density of *w*.

**Example 12.5.1.** A certain milk collection and distribution center collects and sells the milk supplied by local farmers to the community, the balance, if any, being dispatched to a nearby city. In that locality, there are two types of cows: some farmers only keep Jersey cows and others, only Holstein cows. Samples of the same quantities of milk are taken and the following characteristics are evaluated: $x_1$, the fat content, $x_2$, the glucose content, and $x_3$, the protein content. It is known that $X' = (x_1, x_2, x_3)$ is normally distributed as $X \sim N_3(\mu^{(1)}, \Sigma)$, $\Sigma > O$, for Jersey cows, and $X \sim N_3(\mu^{(2)}, \Sigma)$, $\Sigma > O$, for Holstein cows, with $\mu^{(1)} \ne \mu^{(2)}$, the covariance matrices $\Sigma$ being assumed identical. These parameters, which are not known, are estimated on the basis of 100 milk samples from Jersey cows and 102 samples from Holstein cows, all the samples being of equal volume. The following are the summarized data in our standard notations, where $S_1$ and $S_2$ are the sample sums of products matrices:

$$\bar{X}^{(1)} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \ \bar{X}^{(2)} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \ S\_1 = \begin{bmatrix} 50 & -50 & 50 \\ -50 & 100 & 0 \\ 50 & 0 & 150 \end{bmatrix}, \ S\_2 = \begin{bmatrix} 150 & -150 & 150 \\ -150 & 300 & 0 \\ 150 & 0 & 450 \end{bmatrix}.$$

Three farmers just brought in their supply of milk and (1) a sample denoted by $X_1$ is collected from the first farmer's supply and evaluated; (2) a sample, $X_2$, is taken from a second farmer's supply and evaluated; (3) a set of 5 random samples is collected from a third farmer's supply, the sample average being $\bar{X}$. The data are

$$X\_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \text{ and } \bar{X} = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}, \ n = 5.$$

Classify $X_1$, $X_2$, and the sample of size 5 as coming from either Jersey or Holstein cows.

**Solution 12.5.1.** The following preliminary calculations are needed:

$$\frac{S}{n\_1 + n\_2 - 2} = \frac{S\_1 + S\_2}{n\_1 + n\_2 - 2} = \frac{S\_1 + S\_2}{200} = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 3 \end{bmatrix},$$

$$\left(\frac{S}{200}\right)^{-1} = \begin{bmatrix} 6 & 3 & -2 \\ 3 & 2 & -1 \\ -2 & -1 & 1 \end{bmatrix}, \ \bar{X}^{(1)} - \bar{X}^{(2)} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \ \bar{X}^{(1)} + \bar{X}^{(2)} = \begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}.$$

Then,

$$(\bar{X}^{(1)} - \bar{X}^{(2)})' \left(\frac{S}{200}\right)^{-1} = [1, -1, 0] \begin{bmatrix} 6 & 3 & -2 \\ 3 & 2 & -1 \\ -2 & -1 & 1 \end{bmatrix} = [3, 1, -1],$$

$$\frac{1}{2} (\bar{X}^{(1)} - \bar{X}^{(2)})' \left(\frac{S}{200}\right)^{-1} (\bar{X}^{(1)} + \bar{X}^{(2)}) = \frac{1}{2} [3, 1, -1] \begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix} = 4,$$

$$(\bar{X}^{(1)} - \bar{X}^{(2)})' \left(\frac{S}{200}\right)^{-1} X = [3, 1, -1]X = 3x\_1 + x\_2 - x\_3 \Rightarrow w = 3x\_1 + x\_2 - x\_3 - 4$$

where $w$ is as given in (12.5.11). For answering (1), we substitute $X_1$ for $X$ in $w$. That is, $w$ at $X_1$ is $3(2) + (1) - (1) - 4 = 2 > 0$. Hence, we assign $X_1$ to Jersey cows. For answering (2), we replace $X$ in $w$ by $X_2$, that is, $3(1) + (1) - (2) - 4 = -2 < 0$. Thus, we assign $X_2$ to Holstein cows. For answering (3), we replace $X$ in $w$ by $\bar{X}$. That is, $3(2) + (2) - (1) - 4 = 3 > 0$. Accordingly, we classify this sample as coming from Jersey cows.
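The calculations of Solution 12.5.1 can be checked numerically. The following sketch is our own NumPy code (not part of the text); the variable names `coeff`, `const` and `w` are ours:

```python
import numpy as np

# Example 12.5.1: pooled estimate S/(n1 + n2 - 2) and the linear discriminant
# w = (Xbar1 - Xbar2)' (S/200)^{-1} X - (1/2)(Xbar1 - Xbar2)' (S/200)^{-1} (Xbar1 + Xbar2)
xbar1 = np.array([2., 1., 2.])
xbar2 = np.array([1., 2., 2.])
S1 = np.array([[50., -50., 50.], [-50., 100., 0.], [50., 0., 150.]])
S2 = 3.0 * S1                      # S2 as given equals 3 * S1
Sigma_hat = (S1 + S2) / 200.0      # n1 + n2 - 2 = 100 + 102 - 2 = 200

coeff = np.linalg.solve(Sigma_hat, xbar1 - xbar2)   # = [3, 1, -1]
const = 0.5 * coeff @ (xbar1 + xbar2)               # = 4

def w(X):
    """Evaluate the discriminant; w >= 0 assigns X to pi_1 (Jersey)."""
    return coeff @ X - const

print(w(np.array([2., 1., 1.])))   # X1   -> 2.0  (Jersey)
print(w(np.array([1., 1., 2.])))   # X2   -> -2.0 (Holstein)
print(w(np.array([2., 2., 1.])))   # Xbar -> 3.0  (Jersey)
```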

#### **12.6. Maximum Likelihood Method of Classification**

As before, let $\pi_1$ be the $p$-variate real normal population $N_p(\mu^{(1)}, \Sigma),\ \Sigma > O$, with the simple random sample $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ of size $n_1$ drawn from that population, and $\pi_2 : N_p(\mu^{(2)}, \Sigma),\ \Sigma > O$, with the simple random sample $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ of size $n_2$ so distributed. A $p$-vector $X$ at hand is to be classified into $\pi_1$ or $\pi_2$. Let the sample means and the sample sums of products matrices be $\bar{X}^{(1)}, \bar{X}^{(2)}, S_1$ and $S_2$. Then, the problem of classification of $X$ into $\pi_1$ or $\pi_2$ can be stated in terms of testing a hypothesis of the following type: the null hypothesis is that $X$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $N_p(\mu^{(1)}, \Sigma)$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from $\pi_2$, versus the alternative that $X$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from $N_p(\mu^{(2)}, \Sigma)$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $N_p(\mu^{(1)}, \Sigma)$. Let the likelihood functions under the null and alternative hypotheses be denoted by $L_0$ and $L_1$, respectively, where

$$\begin{split} L\_0 &= \left\{ \prod\_{j=1}^{n\_1} \frac{e^{-\frac{1}{2}(X\_j^{(1)}-\mu^{(1)})'\Sigma^{-1}(X\_j^{(1)}-\mu^{(1)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \right\} \frac{e^{-\frac{1}{2}(X-\mu^{(1)})'\Sigma^{-1}(X-\mu^{(1)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \\ &\times \left\{ \prod\_{j=1}^{n\_2} \frac{e^{-\frac{1}{2}(X\_j^{(2)}-\mu^{(2)})'\Sigma^{-1}(X\_j^{(2)}-\mu^{(2)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \right\}, \end{split}$$

$$L\_0 = \frac{e^{-\frac{1}{2}\rho\_1}}{(2\pi)^{\frac{(n\_1+n\_2+1)p}{2}}|\Sigma|^{\frac{n\_1+n\_2+1}{2}}}, \ \rho\_1 = \nu + (X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}), \tag{i}$$

$$L\_1 = \frac{e^{-\frac{1}{2}\rho\_2}}{(2\pi)^{\frac{(n\_1+n\_2+1)p}{2}}|\Sigma|^{\frac{n\_1+n\_2+1}{2}}}, \ \rho\_2 = \nu + (X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)}), \tag{ii}$$

where

$$\begin{split} \nu &= \text{tr}(\Sigma^{-1}S\_1) + n\_1 (\bar{X}^{(1)} - \mu^{(1)})' \Sigma^{-1} (\bar{X}^{(1)} - \mu^{(1)}) \\ &\quad + \text{tr}(\Sigma^{-1}S\_2) + n\_2 (\bar{X}^{(2)} - \mu^{(2)})' \Sigma^{-1} (\bar{X}^{(2)} - \mu^{(2)}) \end{split} \tag{iii}$$

and $S_1$ and $S_2$ are the sample sums of products matrices from the samples $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$, respectively. Referring to Chaps. 1 and 3 for vector/matrix derivatives and the maximum likelihood estimators (MLE's) of the parameters of normal populations, and denoting the estimators/estimates with a hat, the MLE's obtained from $(i)$, that is, under $L_0$, are the following:

$$\begin{split} \hat{\mu}^{(1)} &= \frac{n\_1 \bar{X}^{(1)} + X}{n\_1 + 1}, \; \hat{\mu}^{(2)} = \bar{X}^{(2)}, \; \hat{\Sigma} = \frac{S\_1 + S\_2 + S\_3^{(1)}}{n\_1 + n\_2 + 1} \equiv \hat{\Sigma}\_1, \\ S\_3^{(1)} &= (X - \hat{\mu}^{(1)})(X - \hat{\mu}^{(1)})' = \left(X - \frac{n\_1 \bar{X}^{(1)} + X}{n\_1 + 1}\right) \left(X - \frac{n\_1 \bar{X}^{(1)} + X}{n\_1 + 1}\right)' \\ &= \left(\frac{n\_1}{n\_1 + 1}\right)^2 (X - \bar{X}^{(1)}) (X - \bar{X}^{(1)})', \end{split} \tag{12.6.1}$$

observing that the scalar quantity

$$(X - \hat{\mu}^{(1)})' \Sigma^{-1} (X - \hat{\mu}^{(1)}) = \text{tr}[(X - \hat{\mu}^{(1)})' \Sigma^{-1} (X - \hat{\mu}^{(1)})] = \text{tr}(\Sigma^{-1} S\_3^{(1)}).$$

By substituting the MLE's in *L*0, we obtain the maximum of *L*0:

$$\begin{split} \max L\_0 &= \frac{e^{-\frac{(n\_1+n\_2+1)p}{2}}}{(2\pi)^{\frac{(n\_1+n\_2+1)p}{2}}|\hat{\Sigma}\_1|^{\frac{n\_1+n\_2+1}{2}}},\\ \hat{\Sigma}\_1 &= \frac{S\_1+S\_2+(\frac{n\_1}{n\_1+1})^2(X-\bar{X}^{(1)})(X-\bar{X}^{(1)})'}{n\_1+n\_2+1}.\end{split} \tag{12.6.2}$$

Under *L*1, the MLE's are

$$
\begin{split}
\hat{\mu}^{(2)} &= \frac{n\_2 \bar{X}^{(2)} + X}{n\_2 + 1}, \; \hat{\mu}^{(1)} = \bar{X}^{(1)}, \; \hat{\Sigma} = \frac{S\_1 + S\_2 + S\_3^{(2)}}{n\_1 + n\_2 + 1} \equiv \hat{\Sigma}\_2, \\
S\_3^{(2)} &= \left(\frac{n\_2}{n\_2 + 1}\right)^2 (X - \bar{X}^{(2)}) (X - \bar{X}^{(2)})'.
\end{split} \tag{12.6.3}
$$

Thus,

$$\begin{aligned} \max L\_1 &= \frac{e^{-\frac{(n\_1+n\_2+1)p}{2}}}{(2\pi)^{\frac{(n\_1+n\_2+1)p}{2}} |\hat{\Sigma}\_2|^{\frac{n\_1+n\_2+1}{2}}},\\ \hat{\Sigma}\_2 &= \frac{1}{n\_1+n\_2+1} \Big[ S\_1 + S\_2 + \left(\frac{n\_2}{n\_2+1}\right)^2 (X - \bar{X}^{(2)})(X - \bar{X}^{(2)})' \Big]. \end{aligned}$$

Hence,

$$\lambda\_1 = \frac{\max L\_0}{\max L\_1} = \left(\frac{|\hat{\Sigma}\_2|}{|\hat{\Sigma}\_1|}\right)^{\frac{n\_1 + n\_2 + 1}{2}} \Rightarrow \quad \lambda\_1^{\frac{2}{n\_1 + n\_2 + 1}} = z\_1 = \frac{|\hat{\Sigma}\_2|}{|\hat{\Sigma}\_1|}, \text{ so that}$$

$$z\_1 = \frac{|S\_1 + S\_2 + (\frac{n\_2}{n\_2 + 1})^2 (X - \bar{X}^{(2)}) (X - \bar{X}^{(2)})'|}{|S\_1 + S\_2 + (\frac{n\_1}{n\_1 + 1})^2 (X - \bar{X}^{(1)}) (X - \bar{X}^{(1)})'|}. \tag{12.6.4}$$

If $z_1 \ge 1$, then $\max L_0 \ge \max L_1$, which means that the likelihood of $X$ coming from $\pi_1$ is greater than or equal to the likelihood of $X$ originating from $\pi_2$. Hence, we may classify $X$ into $\pi_1$ if $z_1 \ge 1$ and classify $X$ into $\pi_2$ if $z_1 < 1$. In other words,

$$A\_1 : z\_1 \ge 1 \ \text{ and } \ A\_2 : z\_1 < 1. \tag{iv}$$

If we let $S = S_1 + S_2$, then $z_1 \ge 1 \Rightarrow$

$$|S + \left(\frac{n\_2}{n\_2 + 1}\right)^2 (X - \bar{X}^{(2)}) (X - \bar{X}^{(2)})'|$$

$$\geq |S + \left(\frac{n\_1}{n\_1 + 1}\right)^2 (X - \bar{X}^{(1)}) (X - \bar{X}^{(1)})'|.\tag{v}$$

We can re-express this last inequality in a more convenient form. Expanding the following partitioned determinant in two different ways, where $S$ is $p \times p$ and $Y$ is $p \times 1$, we have

$$
\begin{vmatrix} S & -Y \\ Y' & 1 \end{vmatrix} = |S + YY'| = |S|\,|1 + Y'S^{-1}Y| = |S|\,[1 + Y'S^{-1}Y], \tag{vi}
$$

observing that $1 + Y'S^{-1}Y$ is a scalar quantity. Accordingly, $z_1 \ge 1$ means that

$$1 + \left(\frac{n\_2}{n\_2 + 1}\right)^2 (X - \bar{X}^{(2)})' S^{-1} (X - \bar{X}^{(2)}) \ge 1 + \left(\frac{n\_1}{n\_1 + 1}\right)^2 (X - \bar{X}^{(1)})' S^{-1} (X - \bar{X}^{(1)}).$$

That is,

$$\begin{split} z\_2 &= \left(\frac{n\_2}{n\_2+1}\right)^2 (X - \bar{X}^{(2)})' S^{-1} (X - \bar{X}^{(2)}) \\ &\quad - \left(\frac{n\_1}{n\_1+1}\right)^2 (X - \bar{X}^{(1)})' S^{-1} (X - \bar{X}^{(1)}) \geq 0 \quad \Rightarrow \\ z\_3 &= \left(\frac{n\_2}{n\_2+1}\right)^2 (X - \bar{X}^{(2)})' \left(\frac{S}{n\_1 + n\_2 - 2}\right)^{-1} (X - \bar{X}^{(2)}) \\ &\quad - \left(\frac{n\_1}{n\_1+1}\right)^2 (X - \bar{X}^{(1)})' \left(\frac{S}{n\_1 + n\_2 - 2}\right)^{-1} (X - \bar{X}^{(1)}) \geq 0. \end{split} \tag{12.6.5}$$

Hence, the regions of classification are the following:

$$A\_1: z\_3 \ge 0 \text{ and } A\_2: z\_3 < 0. \tag{\text{vii}}$$
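As a numerical illustration (our own NumPy sketch, not part of the text; the names `z1`, `z3`, `a1`, `a2` are ours), the determinant criterion $z_1$ of (12.6.4) and the quadratic-form criterion $z_3$ agree in sign on the data of Example 12.5.1:

```python
import numpy as np

# Data of Example 12.5.1; S2 as given equals 3 * S1
n1, n2 = 100, 102
xbar1 = np.array([2., 1., 2.])
xbar2 = np.array([1., 2., 2.])
S1 = np.array([[50., -50., 50.], [-50., 100., 0.], [50., 0., 150.]])
S = S1 + 3.0 * S1                      # S = S1 + S2
a1, a2 = (n1 / (n1 + 1))**2, (n2 / (n2 + 1))**2

def z1(X):
    # determinant ratio of (12.6.4)
    num = np.linalg.det(S + a2 * np.outer(X - xbar2, X - xbar2))
    den = np.linalg.det(S + a1 * np.outer(X - xbar1, X - xbar1))
    return num / den

def z3(X):
    # quadratic-form criterion with the pooled matrix S/(n1 + n2 - 2)
    Sp_inv = np.linalg.inv(S / (n1 + n2 - 2))
    return (a2 * (X - xbar2) @ Sp_inv @ (X - xbar2)
            - a1 * (X - xbar1) @ Sp_inv @ (X - xbar1))

X1 = np.array([2., 1., 1.])            # the first farmer's sample
print(z1(X1) >= 1, z3(X1) >= 0)        # -> True True: both rules assign X1 to pi_1

# the determinant identity (vi), |S + YY'| = |S|(1 + Y'S^{-1}Y), checked numerically
Y = X1 - xbar2
assert np.isclose(np.linalg.det(S + np.outer(Y, Y)),
                  np.linalg.det(S) * (1 + Y @ np.linalg.solve(S, Y)))
```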

Thus, classify $X$ into $\pi_1$ when $z_3 \ge 0$ and $X$ into $\pi_2$ when $z_3 < 0$. For large $n_1$ and $n_2$, some interesting results ensue. When $n_1 \to \infty$ and $n_2 \to \infty$, we have $\frac{n_i}{n_i+1} \to 1,\ i = 1, 2$, $\bar{X}^{(i)} \to \mu^{(i)},\ i = 1, 2$, and $\frac{S}{n_1+n_2-2} \to \Sigma$. Then, $z_3$ converges to $z_4$ where

$$\begin{split} z\_4 &= \frac{1}{2}\big[(X - \mu^{(2)})' \Sigma^{-1} (X - \mu^{(2)}) - (X - \mu^{(1)})' \Sigma^{-1} (X - \mu^{(1)})\big] \ge 0 \qquad \text{(viii)}\\ &\Rightarrow [X - \tfrac{1}{2} (\mu^{(1)} + \mu^{(2)})]' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \ge 0 \Rightarrow u \ge 0 \end{split}$$

where $u$ is the same criterion as that specified in (12.5.7). Hence, we have the following result:

**Theorem 12.6.1.** *Let $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ be a simple random sample of size $n_1$ from $\pi_1 : N_p(\mu^{(1)}, \Sigma),\ \Sigma > O$, and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ be a simple random sample of size $n_2$ from the population $\pi_2 : N_p(\mu^{(2)}, \Sigma),\ \Sigma > O$. Letting $X$ be a vector at hand to be classified into $\pi_1$ or $\pi_2$, when $n_1 \to \infty$ and $n_2 \to \infty$, the likelihood ratio criterion for classification is the following: Classify $X$ into $\pi_1$ if $u \ge 0$ and $X$ into $\pi_2$ if $u < 0$, or equivalently, $A_1 : u \ge 0$ and $A_2 : u < 0$, where $u = [X - \frac{1}{2}(\mu^{(1)} + \mu^{(2)})]'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$, whose distribution is $u \sim N_1(\frac{1}{2}\Delta^2, \Delta^2)$ when $X$ is assigned to $\pi_1$ and $u \sim N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ when $X$ is assigned to $\pi_2$, with $\Delta^2 = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$ denoting Mahalanobis' distance.*

The likelihood ratio criterion for classification specified in (12.6.5) can also be given the following interpretation. For large values of $n_1$ and $n_2$, the criterion reduces to $(X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)}) - (X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) \ge 0$, where $(X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)})$ is the generalized distance between $X$ and $\mu^{(2)}$, and $(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)})$ is the generalized distance between $X$ and $\mu^{(1)}$. Thus, when $u > 0$, the generalized distance between $X$ and $\mu^{(2)}$ is larger than the generalized distance between $X$ and $\mu^{(1)}$; that is, $X$ is closer to $\mu^{(1)}$ than to $\mu^{(2)}$, and accordingly we classify $X$ into $\pi_1$, which is the case $u > 0$. Similarly, if $X$ is closer to $\mu^{(2)}$ than to $\mu^{(1)}$, we assign $X$ to $\pi_2$, which is the case $u < 0$. The case $u = 0$ is included in the first inequality only for convenience; however, since $Pr\{u = 0\,|\,\pi_i\} = 0,\ i = 1, 2$, replacing $u > 0$ by $u \ge 0$ is fully justified.

**Note 12.6.1.** The reader may refer to Example 12.3.3 for an illustration of the computations involved in connection with the probabilities of misclassification. For large values of $n_1$ and $n_2$, the $z_4$ of $(viii)$ is an approximation to the $u$ appearing in the same equation, as well as to the $u$ of (12.5.7) or that of Example 12.3.3. In order to apply Theorem 12.6.1, one needs to know the parameters $\mu^{(1)}, \mu^{(2)}$ and $\Sigma$. When they are not available, one may substitute for them the corresponding estimates $\bar{X}^{(1)}, \bar{X}^{(2)}$ and $\hat{\Sigma} = \frac{S_1 + S_2}{n_1 + n_2 - 2}$ when $n_1$ and $n_2$ are large. Then, the approximate probabilities of misclassification can be determined.

**Example 12.6.1.** Redo the problem considered in Example 12.5.1 by making use of the maximum likelihood procedure.

**Solution 12.6.1.** In order to answer the questions, we need to compute

$$\begin{split} z\_4 &= \left(\frac{n\_2}{n\_2+1}\right)^2 (X - \bar{X}^{(2)})' \left(\frac{S}{n\_1+n\_2-2}\right)^{-1} (X - \bar{X}^{(2)})\\ &\quad - \left(\frac{n\_1}{n\_1+1}\right)^2 (X - \bar{X}^{(1)})' \left(\frac{S}{n\_1+n\_2-2}\right)^{-1} (X - \bar{X}^{(1)}).\end{split}$$


In this case, $\frac{n_1}{n_1+1} = \frac{100}{101} \approx 1$ and $\frac{n_2}{n_2+1} = \frac{102}{103} \approx 1$; hence, the criterion $z_4$ is essentially the same as the $w$ of (12.5.4), and the decisions arrived at in Example 12.5.1 remain unchanged in this example. Since $n_1$ and $n_2$ are large, we have reasonably accurate approximations of the parameters as

$$
\bar{X}^{(1)} \to \mu^{(1)}, \ \bar{X}^{(2)} \to \mu^{(2)} \text{ and } \frac{S}{n\_1 + n\_2 - 2} \to \Sigma,
$$

so that the probabilities of misclassification can be evaluated by using their estimates. The approximate distributions are then given by

$$w|\pi\_1 \sim N\_1(\frac{1}{2}\hat{\Delta}^2, \hat{\Delta}^2) \text{ and } w|\pi\_2 \sim N\_1(-\frac{1}{2}\hat{\Delta}^2, \hat{\Delta}^2)$$

where $\hat{\Delta}^2 = (\bar{X}^{(1)} - \bar{X}^{(2)})'\left(\frac{S}{n_1+n_2-2}\right)^{-1}(\bar{X}^{(1)} - \bar{X}^{(2)})$. From the computations done in Example 12.5.1, we have

$$(\bar{X}^{(1)} - \bar{X}^{(2)})' = [1, -1, 0], \ (\bar{X}^{(1)} - \bar{X}^{(2)})' \left(\frac{\mathcal{S}}{n\_1 + n\_2 - 2}\right)^{-1} = [3, 1, -1],$$

$$\begin{aligned} \hat{\Delta}^2 &= [3, 1, -1] \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = 2 \\ \Rightarrow w|\pi\_1 &\sim N\_1(1, 2) \quad \text{and} \ w|\pi\_2 &\sim N\_1(-1, 2), \text{ approximately.} \end{aligned}$$

As well, $A_1 : w \ge 0$ and $A_2 : w < 0$. For the data pertaining to (1) of Example 12.5.1, we have $w > 0$ and $X_1$ is assigned to $\pi_1$. Observing that $w \to u$ of (12.5.7),

$$\begin{split} P(1|1,A) &= \text{Probability of arriving at a correct decision} \\ &= Pr\{u > 0 | \pi\_1\} = \int\_0^\infty \frac{1}{\sqrt{2}\sqrt{(2\pi)}} \text{e}^{-\frac{1}{4}(u-1)^2} \text{d}u \\ &= \int\_{\frac{0-1}{\sqrt{2}}}^\infty \frac{1}{\sqrt{(2\pi)}} \text{e}^{-\frac{1}{2}v^2} \text{d}v \approx 0.76; \end{split}$$

*P (*1|2*, A)* = Probability of misclassification

$$\begin{split} &= Pr\{u > 0 | \pi\_2\} = \int\_0^\infty \frac{1}{\sqrt{2}\sqrt{(2\pi)}} \mathrm{e}^{-\frac{1}{4}(u+1)^2} \mathrm{d}u \\ &= \int\_{\frac{1}{\sqrt{2}}}^\infty \frac{1}{\sqrt{(2\pi)}} \mathrm{e}^{-\frac{1}{2}v^2} \mathrm{d}v \approx 0.24. \end{split}$$

In Example 12.5.1, the observed vector provided for (2) is classified into $\pi_2$ since $w < 0$. Thus, the probability of making the right decision is $P(2|2, A) = Pr\{u < 0 | \pi_2\} \approx 0.76$ and the probability of misclassification is $P(2|1, A) = Pr\{u < 0 | \pi_1\} \approx 0.24$. Given the data related to (3) of Example 12.5.1, the only difference is that the distributions in $\pi_1$ and $\pi_2$ will be slightly different, the mean values remaining the same but the variance $\hat{\Delta}^2$ being replaced by $\hat{\Delta}^2/n$ where $n = 5$. The computations are similar to those provided for (1), the sample mean being assigned to $\pi_1$ in this case.
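The two probabilities above can be sketched with the standard normal distribution function; the code below is our own illustration (the names `Phi`, `p_correct`, `p_error` are ours), using the estimate $\hat{\Delta}^2 = 2$ obtained above:

```python
from math import erf, sqrt

# With w | pi_1 ~ N1(Delta^2/2, Delta^2) and w | pi_2 ~ N1(-Delta^2/2, Delta^2),
# Pr{w > 0 | pi_1} = Phi(Delta/2) by standardizing the normal variable.
def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

delta2 = 2.0                   # estimated Mahalanobis distance from Example 12.6.1
delta = sqrt(delta2)
p_correct = Phi(delta / 2)     # Pr{w > 0 | pi_1} = Pr{w < 0 | pi_2}
p_error = 1.0 - p_correct      # Pr{w > 0 | pi_2} = Pr{w < 0 | pi_1}
print(round(p_correct, 2), round(p_error, 2))   # -> 0.76 0.24
```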

#### **12.7. Classification Involving** *k* **Populations**

Consider the $p$-variate populations $\pi_1, \ldots, \pi_k$ and let $X$ be a $p$-vector at hand to be classified into one of these $k$ populations. Let $q_1, \ldots, q_k$ be the prior probabilities of selecting these populations, $q_j > 0,\ j = 1, \ldots, k$, with $q_1 + \cdots + q_k = 1$. Let the cost of misclassification of a $p$-vector belonging to $\pi_i$ being improperly classified into $\pi_j$ be $C(j|i)$ for $i \ne j$, so that $C(i|i) = 0,\ i = 1, \ldots, k$. A decision rule $A = (A_1, \ldots, A_k)$ determines subspaces $A_j \subset R^p,\ j = 1, \ldots, k$, with $A_i \cap A_j = \phi$ (the empty set) for all $i \ne j$. Let the probability/density functions associated with the $k$ populations be $P_j(X),\ j = 1, \ldots, k$, respectively. Let $P(j|i, A) = Pr\{X \in A_j\,|\,\pi_i : P_i(X), A\}$ denote the probability that an observation coming from or belonging to the population $\pi_i$, or originating from the probability/density function $P_i(X)$, is improperly assigned to $\pi_j$ or misclassified as coming from $P_j(X)$, the cost associated with this misclassification being denoted by $C(j|i)$. Under the rule $A = (A_1, \ldots, A_k)$, the probabilities of correctly classifying and misclassifying an observed vector are the following, assuming that the $P_j(X)$'s, $j = 1, \ldots, k$, are densities:

$$P(i|i,A) = \int\_{A\_i} P\_i(X) \text{d}X \ \text{ and } \ P(j|i,A) = \int\_{A\_j} P\_i(X) \text{d}X, \ i \ne j, \ i, j = 1, \ldots, k,\tag{i}$$

where $P(i|i, A)$ is the probability of achieving a correct classification, that is, of assigning an observation $X$ to $\pi_i$ when the population is actually $\pi_i$, and $P(j|i, A)$ is the probability of an observation $X$ coming from $\pi_i$ being misclassified as originating from $\pi_j$. Consider a $p$-vector $X$ at hand. What is then the probability that this $X$ came from $P_i(X)$, given that $X$ is an observation vector from one of the populations $\pi_1, \ldots, \pi_k$? This is in fact a conditional statement involving

$$\frac{q\_i P\_i(X)}{q\_1 P\_1(X) + q\_2 P\_2(X) + \cdots + q\_k P\_k(X)}.$$

Suppose that for specific *i* and *j* , the conditional probability

$$\frac{q\_i P\_i(X)}{q\_1 P\_1(X) + \dots + q\_k P\_k(X)} \ge \frac{q\_j P\_j(X)}{q\_1 P\_1(X) + \dots + q\_k P\_k(X)}.\tag{ii}$$

This is tantamount to presuming that the likeliness of $X$ originating from $P_i(X)$ is greater than or equal to that of $X$ coming from $P_j(X)$. In this case, we would rather assign $X$ to $\pi_i$ than to $\pi_j$. If $(ii)$ holds for all $j = 1, \ldots, k,\ j \ne i$, then we classify $X$ into $\pi_i$. Equation $(ii)$ for $j = 1, \ldots, k,\ j \ne i$, implies that

$$q\_i P\_i(X) \ge q\_j P\_j(X) \Rightarrow \frac{P\_i(X)}{P\_j(X)} \ge \frac{q\_j}{q\_i}, \ j = 1, \ldots, k, \ j \ne i. \tag{12.7.1}$$

Accordingly, we adopt (12.7.1) as a decision rule $A = (A_1, \ldots, A_k)$. This decision rule corresponds to the following: when $X \in A_1 \subset R^p$, or $X$ falls in $A_1$, then $X$ is classified into $\pi_1$; when $X \in A_2$, then $X$ is assigned to $\pi_2$; and so on. What is the expected cost of an $X$ belonging to $\pi_i$ being misclassified into $\pi_j$ under some decision rule $B = (B_1, \ldots, B_k),\ B_j \subset R^p,\ j = 1, \ldots, k,\ B_i \cap B_j = \phi$ for all $i \ne j$? This is $q_i P_i(X) C(j|i) \equiv E_i(B)$. The expected cost of an $X$ belonging to $\pi_j$ being misclassified into $\pi_i$ under the same decision rule $B$ is $E_j(B) = q_j P_j(X) C(i|j)$. If $E_i(B) < E_j(B)$, then we favor $P_i(X)$ over $P_j(X)$, as it is always desirable to minimize the expected cost in any procedure or decision. If $E_i(B) < E_j(B)$ for all $j = 1, \ldots, k,\ j \ne i$, then $P_i(X)$ or $\pi_i$ is preferred over all other populations to which $X$ could be assigned. Note that

$$E\_i(B) < E\_j(B) \Rightarrow q\_i P\_i(X)C(j|i) < q\_j P\_j(X)C(i|j) \Rightarrow \frac{P\_i(X)}{P\_j(X)} < \frac{q\_j C(i|j)}{q\_i C(j|i)},\quad(\text{iii})$$

for $j = 1, \ldots, k,\ j \ne i$, so that $(iii)$ is the situation resulting from the following classification rule: if

$$\frac{P\_i(X)}{P\_j(X)} \ge \frac{q\_j \ C(i|j)}{q\_i \ C(j|i)}, \ j = 1, \dots, k, \ j \ne i,\tag{12.7.2}$$

we classify $X$ into $\pi_i$, or equivalently $X \in A_i$, which is the decision rule $A = (A_1, \ldots, A_k)$. Thus, the decision rule $B$ in $(iii)$ is identical to $A$. Observing that (12.7.2) reduces to (12.7.1) when $C(i|j) = C(j|i)$, the decision rule $A = (A_1, \ldots, A_k)$ in (12.7.1) is seen to yield the maximum probability of assigning an observation $X$ at hand to $\pi_i$, compared to the probability of assigning $X$ to any other $\pi_j,\ j = 1, \ldots, k,\ j \ne i$, when the costs of misclassification are equal. As well, it follows from (12.7.2) that the decision rule $A = (A_1, \ldots, A_k)$ gives the minimum expected cost associated with assigning the observation $X$ at hand to $\pi_i$, compared to assigning $X$ to any other population $\pi_j,\ j = 1, \ldots, k,\ j \ne i$.
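A minimal sketch of the rule (12.7.2) follows; the populations, priors and costs below are illustrative choices of ours, not the book's, and the helper names `dens` and `classify` are hypothetical:

```python
import numpy as np

# Decision rule (12.7.2): assign X to pi_i when
# P_i(X)/P_j(X) >= q_j C(i|j) / (q_i C(j|i)) for every j != i.
def dens(X, mu, Sigma_inv, det_Sigma):
    """Multivariate normal density N_p(mu, Sigma) evaluated at X."""
    p = len(mu)
    q = (X - mu) @ Sigma_inv @ (X - mu)
    return np.exp(-0.5 * q) / ((2.0 * np.pi) ** (p / 2) * np.sqrt(det_Sigma))

def classify(X, mus, Sigma, q, C):
    """Return the index i satisfying (12.7.2), or -1 if no population qualifies."""
    Sigma_inv = np.linalg.inv(Sigma)
    det_Sigma = np.linalg.det(Sigma)
    P = [dens(X, mu, Sigma_inv, det_Sigma) for mu in mus]
    k = len(mus)
    for i in range(k):
        if all(P[i] / P[j] >= (q[j] * C[i][j]) / (q[i] * C[j][i])
               for j in range(k) if j != i):
            return i
    return -1

mus = [np.array([0., 0.]), np.array([3., 0.]), np.array([0., 3.])]
Sigma = np.eye(2)
q = [1/3, 1/3, 1/3]
C = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # equal costs: (12.7.2) reduces to (12.7.1)
print(classify(np.array([0.2, -0.1]), mus, Sigma, q, C))   # -> 0
```

With equal priors and equal costs, every threshold equals 1 and the rule simply picks the population with the largest density at $X$.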

#### **12.7.1. Classification when the populations are real Gaussian**

Let the populations be $p$-variate real normal, that is, $\pi_j \sim N_p(\mu^{(j)}, \Sigma),\ \Sigma > O,\ j = 1, \ldots, k$, with different mean value vectors but the same covariance matrix $\Sigma > O$. Let the density of $\pi_j$, namely that of $N_p(\mu^{(j)}, \Sigma),\ \Sigma > O$, be denoted by $P_j(X)$. A vector $X$ at hand is to be assigned to one of the $\pi_i$'s, $i = 1, \ldots, k$. In Sect. 12.3 or Example 12.3.3, the decision rule involves two populations. Letting the two populations be $\pi_i : P_i(X)$ and $\pi_j : P_j(X)$ for specific $i$ and $j$, it was determined that the decision rule consists of classifying $X$ into $\pi_i$ if $\ln \frac{P_i(X)}{P_j(X)} \ge \ln \rho,\ \rho = \frac{q_j C(i|j)}{q_i C(j|i)}$, with $\rho = 1$, so that $\ln \rho = 0$, whenever $C(i|j) = C(j|i)$ and $q_i = q_j$. When $\ln \rho = 0$, we have seen that the decision rule is to classify the $p$-vector $X$ into $\pi_i$ or $P_i(X)$ if $u_{ij}(X) \ge 0$ and to assign $X$ to $P_j(X)$ or $\pi_j$ if $u_{ij}(X) < 0$, where

$$\begin{split} u\_{ij}(X) &= (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} (\mu^{(i)} + \mu^{(j)}) \\ &= [X - \frac{1}{2} (\mu^{(i)} + \mu^{(j)})]' \Sigma^{-1} (\mu^{(i)} - \mu^{(j)}). \end{split} \tag{iv}$$

Now, on applying the result obtained in *(iv)* to (12.7.1) and (12.7.2), one arrives at the following decision rule:

$$A\_i: u\_{ij}(X) \ge 0 \ \text{ or } \ A\_i: u\_{ij}(X) \ge \ln k\_{ij},\ k\_{ij} = \frac{q\_j \, C(i|j)}{q\_i \, C(j|i)},\ j = 1, \ldots, k,\ j \ne i,\tag{12.7.3}$$

with $\ln \rho = 0$ occurring when $q_i = q_j$ and $C(i|j) = C(j|i)$.

**Note 12.7.1.** What will interchanging $i$ and $j$ in $u_{ij}(X)$ entail? Note that, as defined, $u_{ij}(X)$ involves the terms $(\mu^{(i)} - \mu^{(j)}) = -(\mu^{(j)} - \mu^{(i)})$ and $(\mu^{(i)} + \mu^{(j)})$, the latter being unaffected by the interchange of $\mu^{(i)}$ and $\mu^{(j)}$. Hence, for all $i$ and $j$,

$$
u\_{ij}(X) = -u\_{ji}(X),\ i \neq j.\tag{12.7.4}$$

When the underlying population is $X \sim N_p(\mu^{(i)}, \Sigma)$, $E[u_{ij}(X)|\pi_i] = \frac{1}{2}\Delta_{ij}^2$, which implies that $E[u_{ji}(X)|\pi_i] = -\frac{1}{2}\Delta_{ij}^2 = -E[u_{ij}(X)|\pi_i]$, where $\Delta_{ij}^2 = (\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} - \mu^{(j)})$.
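The antisymmetry (12.7.4) can be checked numerically; the sketch below is our own code with arbitrary, randomly generated parameter values:

```python
import numpy as np

# Check that u_ij(X) = -u_ji(X) for the criterion of (iv).
rng = np.random.default_rng(1)
p = 3
mu_i = rng.standard_normal(p)
mu_j = rng.standard_normal(p)
A = rng.standard_normal((p, p))
Sigma_inv = np.linalg.inv(A @ A.T + p * np.eye(p))   # inverse of a positive definite Sigma
X = rng.standard_normal(p)

def u(mu_a, mu_b, X):
    # u_ab(X) = [X - (mu_a + mu_b)/2]' Sigma^{-1} (mu_a - mu_b), as in (iv)
    return (X - 0.5 * (mu_a + mu_b)) @ Sigma_inv @ (mu_a - mu_b)

print(np.isclose(u(mu_i, mu_j, X), -u(mu_j, mu_i, X)))   # -> True
```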

**Note 12.7.2.** For computing the probabilities of correctly classifying and misclassifying an observed vector, certain assumptions regarding the distributions associated with the populations $\pi_j,\ j = 1, \ldots, k$, are needed, the normality assumption being the most convenient one.

**Example 12.7.1.** A certain milk collection and distribution center collects and sells the milk supplied by local farmers to the community, the balance, if any, being dispatched to a nearby city. In that locality, there are three dairy cattle breeds, namely, Jersey, Holstein and Guernsey, and each farmer only keeps one type of cows. Samples are taken and the following characteristics are evaluated in grams per liter: $x_1$, the fat content, $x_2$, the glucose content, and $x_3$, the protein content. It has been determined that $X' = (x_1, x_2, x_3)$ is normally distributed as $X \sim N_3(\mu^{(1)}, \Sigma)$ for Jersey cows, $X \sim N_3(\mu^{(2)}, \Sigma)$ for Holstein cows and $X \sim N_3(\mu^{(3)}, \Sigma)$ for Guernsey cows, with a common covariance matrix $\Sigma > O$, where

$$\boldsymbol{\mu}^{(1)} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \ \boldsymbol{\mu}^{(2)} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}, \ \boldsymbol{\mu}^{(3)} = \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix} \text{ and } \ \boldsymbol{\Sigma} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

(1): A farmer brought in his supply of milk from which one liter was collected. The three variables were evaluated, the result being $X_0' = (2, 3, 4)$. (2): Another one liter sample was taken from a second farmer's supply and it was determined that the vector of the resulting measurements was $X_1' = (2, 2, 2)$. No prior probabilities or costs are involved. Which breed of dairy cattle is each of these farmers likely to own?

**Solution 12.7.1.** Our criterion is based on $u_{ij}(X)$ where

$$
u\_{ij}(X) = (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} (\mu^{(i)} + \mu^{(j)}).
$$

Let us evaluate the various quantities of interest:

$$\begin{aligned} \mu^{(1)} - \mu^{(2)} &= \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \; \mu^{(1)} - \mu^{(3)} = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}, \; \mu^{(2)} - \mu^{(3)} = \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}, \\ \mu^{(1)} + \mu^{(2)} &= \begin{bmatrix} 3 \\ 6 \\ 3 \end{bmatrix}, \; \mu^{(1)} + \mu^{(3)} = \begin{bmatrix} 4 \\ 6 \\ 4 \end{bmatrix}, \; \mu^{(2)} + \mu^{(3)} = \begin{bmatrix} 3 \\ 6 \\ 5 \end{bmatrix}; \\ \Sigma^{-1} &= \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}; \end{aligned}$$

$$A\_1 : \{u\_{12}(X) \ge 0,\ u\_{13}(X) \ge 0\},\ A\_2 : \{u\_{21}(X) \ge 0,\ u\_{23}(X) \ge 0\},\ A\_3 : \{u\_{31}(X) \ge 0,\ u\_{32}(X) \ge 0\};$$

$$\begin{aligned} \left(\mu^{(1)} - \mu^{(2)}\right)' \Sigma^{-1} X &= (1, 0, -1) \Sigma^{-1} X = (\frac{1}{3}, -1, -2)X = \frac{1}{3} \mathbf{x}\_1 - \mathbf{x}\_2 - 2 \mathbf{x}\_3 \\ \left(\mu^{(1)} - \mu^{(3)}\right)' \Sigma^{-1} X &= (0, 0, -2) \Sigma^{-1} X = (0, -2, -4)X = -2 \mathbf{x}\_2 - 4 \mathbf{x}\_3 \\ \left(\mu^{(2)} - \mu^{(3)}\right)' \Sigma^{-1} X &= (-1, 0, -1) \Sigma^{-1} X = (-\frac{1}{3}, -1, -2)X = -\frac{1}{3} \mathbf{x}\_1 - \mathbf{x}\_2 - 2 \mathbf{x}\_3 \end{aligned}$$

$$\begin{aligned} \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}) &= \frac{1}{2}[\tfrac{1}{3}, -1, -2] \begin{bmatrix} 3\\6\\3 \end{bmatrix} = -\frac{11}{2} \\ \frac{1}{2}(\mu^{(1)} - \mu^{(3)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(3)}) &= \frac{1}{2}[0, -2, -4] \begin{bmatrix} 4\\6\\4 \end{bmatrix} = -14 \\ \frac{1}{2}(\mu^{(2)} - \mu^{(3)})'\Sigma^{-1}(\mu^{(2)} + \mu^{(3)}) &= \frac{1}{2}[-\tfrac{1}{3}, -1, -2] \begin{bmatrix} 3\\6\\5 \end{bmatrix} = -\frac{17}{2}. \end{aligned}$$

Hence,

$$\begin{aligned} u\_{12}(X) &= \tfrac{1}{3}x\_1 - x\_2 - 2x\_3 + \tfrac{11}{2}; \ \ u\_{13}(X) = -2x\_2 - 4x\_3 + 14; \\ u\_{21}(X) &= -\tfrac{1}{3}x\_1 + x\_2 + 2x\_3 - \tfrac{11}{2}; \ \ u\_{23}(X) = -\tfrac{1}{3}x\_1 - x\_2 - 2x\_3 + \tfrac{17}{2}; \\ u\_{31}(X) &= 2x\_2 + 4x\_3 - 14; \ \ u\_{32}(X) = \tfrac{1}{3}x\_1 + x\_2 + 2x\_3 - \tfrac{17}{2}. \end{aligned}$$

In order to answer (1), we substitute $X_0$ for $X$ and first evaluate $u_{12}(X_0)$ and $u_{13}(X_0)$ to determine whether they are $\ge 0$. Since $u_{12}(X_0) = \frac{1}{3}(2) - (3) - 2(4) + \frac{11}{2} < 0$, the condition is violated and hence we need not check whether $u_{13}(X_0) \ge 0$. Thus, $X_0$ is not in $A_1$. Now, consider $u_{21}(X_0) = -\frac{1}{3}(2) + 3 + 2(4) - \frac{11}{2} > 0$ and $u_{23}(X_0) = -\frac{1}{3}(2) - (3) - 2(4) + \frac{17}{2} < 0$; again the condition is violated and we deduce that $X_0$ is not in $A_2$. Finally, we verify $A_3$: $u_{31}(X_0) = 2(3) + 4(4) - 14 = 8 > 0$ and $u_{32}(X_0) = \frac{1}{3}(2) + (3) + 2(4) - \frac{17}{2} > 0$. Thus, $X_0 \in A_3$, that is, we conclude that the sample milk came from Guernsey cows.

For answering (2), we substitute $X_1$ for $X$ in $u_{ij}(X)$. Noting that $u_{12}(X_1) = \frac{1}{3}(2) - (2) - 2(2) + \frac{11}{2} > 0$ and $u_{13}(X_1) = -2(2) - 4(2) + 14 > 0$, we can surmise that $X_1 \in A_1$, that is, the sample milk came from Jersey cows. Let us verify $A_2$ and $A_3$ to ascertain that no mistake has been made in the calculations. Since $u_{21}(X_1) < 0$, $X_1$ is not in $A_2$, and since $u_{31}(X_1) < 0$, $X_1$ is not in $A_3$. This completes the computations.
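The classification in Example 12.7.1 can be re-computed programmatically; the sketch below is our own code (the helper names `u` and `region` are ours), implementing the regions $A_i : u_{ij}(X) \ge 0$ for all $j \ne i$ with $u_{ij}$ as in $(iv)$:

```python
import numpy as np

# Parameters of Example 12.7.1
mus = [np.array([2., 3., 1.]),   # mu(1): Jersey
       np.array([1., 3., 2.]),   # mu(2): Holstein
       np.array([2., 3., 3.])]   # mu(3): Guernsey
Sigma = np.array([[3., 0., 0.], [0., 2., -1.], [0., -1., 1.]])
Sigma_inv = np.linalg.inv(Sigma)

def u(i, j, X):
    # u_ij(X) = [X - (mu(i) + mu(j))/2]' Sigma^{-1} (mu(i) - mu(j))
    return (X - 0.5 * (mus[i] + mus[j])) @ Sigma_inv @ (mus[i] - mus[j])

def region(X):
    """1-based index of the region A_i containing X, or 0 if none qualifies."""
    k = len(mus)
    for i in range(k):
        if all(u(i, j, X) >= 0 for j in range(k) if j != i):
            return i + 1
    return 0

print(region(np.array([2., 3., 4.])))   # X0 -> 3 (Guernsey)
print(region(np.array([2., 2., 2.])))   # X1 -> 1 (Jersey)
```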

#### **12.7.2. Some distributional aspects**

For computing the probabilities of correctly classifying and misclassifying an observation, we require the distributions of our criterion $u_{ij}(X)$. Let the populations be normally distributed, that is, $\pi_j \sim N_p(\mu^{(j)}, \Sigma),\ \Sigma > O$, with the same covariance matrix $\Sigma$ for all $k$ populations, $j = 1, \ldots, k$. Then, the probability of achieving a correct classification when $X$ is assigned to $\pi_i$ is the following under the decision rule $A = (A_1, \ldots, A_k)$:

$$P(i|i,A) = \int\_{A\_i} P\_i(X)dX\tag{12.7.5}$$

where $\text{d}X = \text{d}x_1 \wedge \ldots \wedge \text{d}x_p$ and the integral is actually a multiple integral. But $A_i$ is defined by the inequalities $u_{i1}(X) \ge 0,\ u_{i2}(X) \ge 0, \ldots, u_{ik}(X) \ge 0$, where $u_{ii}(X)$ is excluded. This is the case when no prior probabilities and costs are involved or when the prior probabilities are equal and the cost functions are identical. Otherwise, the region is $\{A_i : u_{ij}(X) \ge \ln k_{ij},\ k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)},\ j = 1, \ldots, k,\ j \ne i\}$. Integrating (12.7.5) is challenging as the region is determined by $k - 1$ inequalities.
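Because $A_i$ is cut out by $k - 1$ inequalities, direct integration of (12.7.5) is hard in practice; a simple Monte Carlo estimate is often easier. The sketch below is our own code, reusing the parameters of Example 12.7.1 with $i = 1$ (the names `u` and `p_hat` are ours):

```python
import numpy as np

# Monte Carlo estimate of P(1|1, A): draw from pi_1 and count how often
# X falls in A_1 = {u_12(X) >= 0, u_13(X) >= 0}.
rng = np.random.default_rng(2)
mus = [np.array([2., 3., 1.]), np.array([1., 3., 2.]), np.array([2., 3., 3.])]
Sigma = np.array([[3., 0., 0.], [0., 2., -1.], [0., -1., 1.]])
Sigma_inv = np.linalg.inv(Sigma)

def u(i, j, X):
    return (X - 0.5 * (mus[i] + mus[j])) @ Sigma_inv @ (mus[i] - mus[j])

N = 20_000
draws = rng.multivariate_normal(mus[0], Sigma, size=N)
p_hat = sum((u(0, 1, X) >= 0) and (u(0, 2, X) >= 0) for X in draws) / N
print(p_hat)   # an estimate of P(1|1, A)
```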

When the parameters $\mu^{(j)},\ j = 1, \ldots, k$, and $\Sigma$ are known, we can evaluate the joint distribution of the $u_{ij}(X),\ j = 1, \ldots, k,\ j \ne i$, under the normality assumption for $\pi_j,\ j = 1, \ldots, k$. Let us examine the distributions of the $u_{ij}(X)$ for normally distributed $\pi_i : P_i(X),\ i = 1, \ldots, k$. In this instance, $E[X|\pi_i] = \mu^{(i)}$, and under $\pi_i$,

$$\begin{aligned} E[u_{ij}(X)|\pi_i] &= (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} \mu^{(i)} - \frac{1}{2} (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} (\mu^{(i)} + \mu^{(j)}) \\ &= \frac{1}{2} (\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} (\mu^{(i)} - \mu^{(j)}) = \frac{1}{2} \Delta_{ij}^2; \\ \text{Var}(u_{ij}(X)|\pi_i) &= \text{Var}[(\mu^{(i)} - \mu^{(j)})' \Sigma^{-1} X] = \Delta_{ij}^2. \end{aligned}$$

Since *uij (X)* is a linear function of the vector normal variable *X,* it is normal and the distribution of *uij (X)*|*πi* is

$$
u_{ij}(X) \sim N_1\Big(\frac{1}{2}\Delta_{ij}^2,\ \Delta_{ij}^2\Big),\ j = 1, \ldots, k,\ j \neq i. \tag{12.7.6}
$$

This normality holds for each $j$, $j = 1, \ldots, k,\ j \ne i$, and for a fixed $i$. Then, we can obtain the joint density of $u_{i1}(X), u_{i2}(X), \ldots, u_{ik}(X)$, excluding $u_{ii}(X)$, and evaluate $P(i|i, A)$ from this joint density. Observe that for $j = 1, \ldots, k,\ j \ne i$, the $u_{ij}(X)$'s are linear functions of the same vector normal variable $X$ and hence they have a joint normal distribution. In that case, the mean value vector is a $(k-1)$-vector, denoted by $\mu_{(ii)}$, whose elements are $\frac{1}{2}\Delta_{ij}^2,\ j = 1, \ldots, k,\ j \ne i$, for a fixed $i$, or equivalently,

$$\mu'_{(ii)} = \Big[\frac{1}{2}\Delta_{i1}^2, \ldots, \frac{1}{2}\Delta_{ik}^2\Big] = E[U'_{ii}] \ \text{ with } \ U'_{ii} = [u_{i1}(X), \ldots, u_{ik}(X)],$$

excluding the element $u_{ii}(X)$, with $\Delta_{ii}^2 = 0$. The subscript $ii$ in $U_{ii}$, as well as in its covariance matrix $\Sigma_{ii}$, indicates the region $A_i$ and the original population $P_i(X)$. The covariance matrix of $U_{ii}$ will be a $(k-1) \times (k-1)$ matrix of the form $\Sigma_{ii} = [\text{Cov}(u_{ir}, u_{it})] = (c_{rt}),\ c_{rt} = \text{Cov}(u_{ir}(X), u_{it}(X))$. Observe that for two linear functions $t_1 = C'X = c_1 x_1 + \cdots + c_p x_p$ and $t_2 = B'X = b_1 x_1 + \cdots + b_p x_p$ of a vector $X$ with covariance matrix $\text{Cov}(X) = \Sigma$, we have $\text{Var}(t_1) = C'\Sigma C$, $\text{Var}(t_2) = B'\Sigma B$ and $\text{Cov}(t_1, t_2) = C'\Sigma B = B'\Sigma C$. Therefore,

$$c_{rt} = (\mu^{(i)} - \mu^{(r)})' \Sigma^{-1} (\mu^{(i)} - \mu^{(t)}),\ r, t \ne i; \quad \Sigma_{ii} = (c_{rt}).$$

Let the vector $U_{ii}$ be such that $U'_{ii} = (u_{i1}(X), \ldots, u_{ik}(X))$, excluding $u_{ii}(X)$. Thus, for a specific $i$,

$$U_{ii} \sim N_{k-1}(\mu_{(ii)}, \Sigma_{ii}),\ \Sigma_{ii} > O,$$

and its density function, denoted by *gii(Uii)*, is

$$g_{ii}(U_{ii}) = \frac{1}{(2\pi)^{\frac{k-1}{2}} |\Sigma_{ii}|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(U_{ii} - \mu_{(ii)})' \Sigma_{ii}^{-1} (U_{ii} - \mu_{(ii)})}.$$

Then,

$$\begin{aligned} P(i|i, A) &= \int_{u_{ij}(X) \ge 0,\ j = 1, \ldots, k,\ j \ne i} g_{ii}(U_{ii})\, \mathrm{d}U_{ii} \\ &= \int_{u_{i1}(X) = 0}^{\infty} \cdots \int_{u_{ik}(X) = 0}^{\infty} g_{ii}(U_{ii})\, \mathrm{d}u_{i1}(X) \wedge \ldots \wedge \mathrm{d}u_{ik}(X), \end{aligned} \tag{12.7.7}$$

the differential $\mathrm{d}u_{ii}$ being absent from $\mathrm{d}U_{ii}$, which is also the case for $u_{ii}(X) \ge 0$ in the integral. If prior probabilities and cost functions are involved, then replace $u_{ij}(X) \ge 0$ in the integral (12.7.7) by $u_{ij}(X) \ge \ln k_{ij},\ k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)}$. Thus, the problem reduces to determining the joint density $g_{ii}(U_{ii})$ and then evaluating the multiple integrals appearing in (12.7.7). In order to compute the probability specified in (12.7.7), we standardize the normal vector by letting $V_{ii} = \Sigma_{ii}^{-\frac{1}{2}}(U_{ii} - \mu_{(ii)})$, so that $V_{ii} \sim N_{k-1}(O, I)$, and the probability may then be computed through $V_{ii}$. Note that (12.7.7) holds for each $i$, $i = 1, \ldots, k$, and thus the probabilities of achieving a correct classification, $P(i|i, A)$ for $i = 1, \ldots, k$, are all available from (12.7.7).

For computing probabilities of misclassification of the type $P(i|j, A)$, we can proceed as follows. In this context, the basic population is $\pi_j : P_j(X)$, under which $u_{ij}(X) \sim N_1(-\frac{1}{2}\Delta_{ij}^2, \Delta_{ij}^2)$, the region of integration being $A_i : \{u_{i1}(X) \ge 0, \ldots, u_{ik}(X) \ge 0\}$, excluding the element $u_{ii}(X) \ge 0$. Consider the vector $U_{ij}$ corresponding to the vector $U_{ii}$. In $U_{ij}$, $i$ stands for the region $A_i$ and $j$, for the original population $P_j(X)$. The elements of $U_{ij}$ are the same as those of $U_{ii}$, that is, $U'_{ij} = (u_{i1}(X), \ldots, u_{ik}(X))$, excluding $u_{ii}(X)$. We then proceed as before and compute the covariance matrix $\Sigma_{ij}$ of $U_{ij}$ in the original population $P_j(X)$. The variances of the $u_{im}(X),\ m = 1, \ldots, k,\ m \ne i$, will remain the same, but the covariances will be different since they depend on the mean values. Thus, $U_{ij} \sim N_{k-1}(\mu_{(ij)}, \Sigma_{ij})$, and on standardizing, one has $V_{ij} \sim N_{k-1}(O, I)$, so that the required probability $P(i|j, A)$ can be computed from the elements of $V_{ij}$. Note that when the prior probabilities and costs are equal,


$$\begin{aligned} P(i|j, A) &= \int_{u_{i1}(X) \ge 0, \ldots, u_{ik}(X) \ge 0} g_{ij}(U_{ij})\, \mathrm{d}U_{ij} \\ &= \int_{u_{i1}(X) = 0}^{\infty} \cdots \int_{u_{ik}(X) = 0}^{\infty} g_{ij}(U_{ij})\, \mathrm{d}U_{ij}, \end{aligned} \tag{12.7.8}$$

excluding $u_{ii}(X)$ in the integral as well as the differential $\mathrm{d}u_{ii}(X)$. Thus, $\mathrm{d}U_{ij} = \mathrm{d}u_{i1}(X) \wedge \ldots \wedge \mathrm{d}u_{ik}(X)$, excluding $\mathrm{d}u_{ii}(X)$.

**Example 12.7.2.** Given the data provided in Example 12.7.1, what is the probability of correctly assigning *X* to *π*1? That is, compute the probability *P (*1|1*, A)*.

**Solution 12.7.2.** Observe that the joint density of $u_{12}(X)$ and $u_{13}(X)$ is bivariate normal since $u_{12}(X)$ and $u_{13}(X)$ are linear functions of the same vector $X$, where $X$ has a multivariate normal distribution. In order to specify this bivariate normal density, we need $E[u_{1j}(X)]$, $\text{Var}(u_{1j}(X))$, $j = 2, 3$, and $\text{Cov}(u_{12}(X), u_{13}(X))$. The following quantities are evaluated from the data given in Example 12.7.1:

$$\begin{aligned} \text{Var}(u_{12}(X)) &= \text{Var}\Big[(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X - \frac{1}{2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} + \mu^{(2)})\Big] \\ &= \text{Var}[(\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} X] = (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \\ &= \Big[\frac{1}{3}, -1, -2\Big] \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \frac{7}{3} \Rightarrow E[u_{12}(X)] = \frac{7}{6}; \end{aligned}$$

$$\begin{aligned} \text{Var}(u_{13}(X)) &= (\mu^{(1)} - \mu^{(3)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(3)}) = [0, -2, -4] \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix} = 8 \Rightarrow E[u_{13}(X)] = 4; \\ \text{Cov}(u_{12}(X), u_{13}(X)) &= (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(3)}) = \Big[\frac{1}{3}, -1, -2\Big] \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix} = 4. \end{aligned}$$

Hence, the covariance matrix of $U_{11} = \begin{bmatrix} u_{12}(X) \\ u_{13}(X) \end{bmatrix}$, denoted by $\Sigma_{11}$, is the following:

$$\begin{aligned} \Sigma_{11} &= \begin{bmatrix} \frac{7}{3} & 4 \\ 4 & 8 \end{bmatrix} \Rightarrow |\Sigma_{11}| = \frac{8}{3}, \\ \Sigma_{11}^{-1} &= \frac{3}{8} \begin{bmatrix} 8 & -4 \\ -4 & \frac{7}{3} \end{bmatrix} = \frac{1}{8} \begin{bmatrix} 24 & -12 \\ -12 & 7 \end{bmatrix}, \end{aligned}$$

where

$$
\frac{1}{8} \begin{bmatrix} 24 & -12 \\ -12 & 7 \end{bmatrix} = \frac{1}{8} \begin{bmatrix} 2\sqrt{6} & 0 \\ -\sqrt{6} & 1 \end{bmatrix} \begin{bmatrix} 2\sqrt{6} & -\sqrt{6} \\ 0 & 1 \end{bmatrix} = B'B \quad \text{with}$$

$$
B = \frac{1}{\sqrt{8}} \begin{bmatrix} 2\sqrt{6} & -\sqrt{6} \\ 0 & 1 \end{bmatrix} \Rightarrow |B| = |B'| = |\Sigma\_{11}|^{-\frac{1}{2}} = \frac{2\sqrt{6}}{8}.$$
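These matrix computations can be verified numerically. The short sketch below recomputes $|\Sigma_{11}|$, $\Sigma_{11}^{-1}$ and the factorization $\Sigma_{11}^{-1} = B'B$ from the values given above, using only the standard library:

```python
# Numerical check of Sigma_11, its inverse and the factorization
# Sigma_11^{-1} = B'B used in Example 12.7.2 (all values from the text).
import math

S11 = [[7 / 3, 4], [4, 8]]                     # covariance matrix of U_11
det = S11[0][0] * S11[1][1] - S11[0][1] * S11[1][0]
S11_inv = [[S11[1][1] / det, -S11[0][1] / det],
           [-S11[1][0] / det, S11[0][0] / det]]

r8 = math.sqrt(8)
B = [[2 * math.sqrt(6) / r8, -math.sqrt(6) / r8],   # rows of B
     [0.0, 1 / r8]]
# B'B, entry (i, j) = sum_k B[k][i] * B[k][j]
BtB = [[sum(B[k][i] * B[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

print(det)      # 8/3
print(BtB)      # agrees with S11_inv = (1/8)[[24, -12], [-12, 7]]
```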

The bivariate normal density of *U*<sup>11</sup> is the following:

$$U_{11} = \begin{bmatrix} u_{12}(X) \\ u_{13}(X) \end{bmatrix} \sim N_2(\mu_{(1)}, \Sigma_{11}),\ \mu_{(1)} = \begin{bmatrix} 7/6 \\ 4 \end{bmatrix},\tag{12.7.9}$$

with $\Sigma_{11}$ and $\Sigma_{11}^{-1} = B'B$ as previously specified. Letting $Y = B(U_{11} - E[U_{11}])$, we have $Y \sim N_2(O, I)$. Note that

$$\begin{aligned} Y &= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},\ U_{11} - E[U_{11}] = \begin{bmatrix} u_{12}(X) - E[u_{12}(X)] \\ u_{13}(X) - E[u_{13}(X)] \end{bmatrix} = \begin{bmatrix} u_{12}(X) - 7/6 \\ u_{13}(X) - 4 \end{bmatrix}, \\ y_1 &= \frac{2\sqrt{6}}{\sqrt{8}}(u_{12}(X) - 7/6) - \frac{\sqrt{6}}{\sqrt{8}}(u_{13}(X) - 4), \\ y_2 &= \frac{1}{\sqrt{8}}(u_{13}(X) - 4). \end{aligned}$$

Then,

$$B^{-1} = \frac{\sqrt{8}}{2\sqrt{6}} \begin{bmatrix} 1 & \sqrt{6} \\ 0 & 2\sqrt{6} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{3}} & \sqrt{2} \\ 0 & 2\sqrt{2} \end{bmatrix},$$

and we have

$$
\begin{bmatrix} u_{12}(X) - 7/6 \\ u_{13}(X) - 4 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{3}} & \sqrt{2} \\ 0 & 2\sqrt{2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},
$$

which yields $u_{12}(X) = \frac{7}{6} + \frac{1}{\sqrt{3}}\, y_1 + \sqrt{2}\, y_2$ and $u_{13}(X) = 4 + 2\sqrt{2}\, y_2$. The intersection of the two lines corresponding to $u_{12}(X) = 0$ and $u_{13}(X) = 0$ is the point $(y_1, y_2) = (\sqrt{3}(\frac{5}{6}), -\sqrt{2})$. Thus, $u_{12}(X) \ge 0$ and $u_{13}(X) \ge 0$ give $y_2 \ge -\frac{4}{2\sqrt{2}} = -\sqrt{2}$ and $\frac{7}{6} + \frac{1}{\sqrt{3}}\, y_1 + \sqrt{2}\, y_2 \ge 0$. We can express the resulting probability as $\rho_1 - \rho_2$ where

$$\rho_1 = \int_{y_2 = -\sqrt{2}}^{\infty} \int_{y_1 = -\infty}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}(y_1^2 + y_2^2)}\, \mathrm{d}y_1 \wedge \mathrm{d}y_2 = 1 - \Phi(-\sqrt{2}),\tag{12.7.10}$$


which is explicitly available, where *Φ(*·*)* denotes the distribution function of a standard normal variable, and

$$\begin{aligned} \rho_2 &= \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \int_{y_2 = -\sqrt{2}}^{-\frac{1}{\sqrt{2}}(\frac{7}{6} + \frac{1}{\sqrt{3}} y_1)} \frac{1}{2\pi}\, e^{-\frac{1}{2}(y_1^2 + y_2^2)}\, \mathrm{d}y_2 \wedge \mathrm{d}y_1 \\ &= \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2} \Big[\Phi\Big(-\frac{1}{\sqrt{2}}\Big(\frac{7}{6} + \frac{1}{\sqrt{3}}\, y_1\Big)\Big) - \Phi(-\sqrt{2})\Big]\, \mathrm{d}y_1 \\ &= \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2}\, \Phi\Big(-\frac{1}{\sqrt{2}}\Big(\frac{7}{6} + \frac{1}{\sqrt{3}}\, y_1\Big)\Big)\, \mathrm{d}y_1 - \Phi\Big(\sqrt{3}\Big(\frac{5}{6}\Big)\Big)\, \Phi(-\sqrt{2}). \end{aligned} \tag{12.7.11}$$

Therefore, the required probability is

$$\begin{aligned} \rho_1 - \rho_2 &= 1 - \Phi(-\sqrt{2}) + \Phi(-\sqrt{2})\, \Phi\Big(\sqrt{3}\Big(\frac{5}{6}\Big)\Big) \\ &\quad - \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2}\, \Phi\Big(-\frac{1}{\sqrt{2}}\Big(\frac{7}{6} + \frac{1}{\sqrt{3}}\, y_1\Big)\Big)\, \mathrm{d}y_1. \end{aligned} \tag{12.7.12}$$

Note that all quantities, except the integral, are explicitly available from standard normal tables, and the integral part can be approximated from a bivariate normal table in conjunction with the distribution (12.7.9). Alternatively, once evaluated numerically, the integral is found to be equal to 0.2182, which, subtracted from 0.9941, yields a probability of 0.7759 for $P(1|1, A)$.
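The numerical evaluation just mentioned can be reproduced with a one-dimensional quadrature. The sketch below uses $\Phi(x) = \frac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$ and a composite Simpson rule; the truncation point $-10$ for the lower limit of integration is an implementation choice, not part of the formula.

```python
# Numerical evaluation of P(1|1, A) from (12.7.12); Phi is computed via
# math.erf and the remaining one-dimensional integral by a composite
# Simpson rule (the lower truncation point -10 is an implementation choice).
import math

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=2000):
    # composite Simpson rule on [a, b] with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

upper = math.sqrt(3) * (5 / 6)
integrand = lambda y1: phi(y1) * Phi(-(7 / 6 + y1 / math.sqrt(3)) / math.sqrt(2))
I = simpson(integrand, -10, upper)
P = 1 - Phi(-math.sqrt(2)) + Phi(-math.sqrt(2)) * Phi(upper) - I
print(round(I, 4), round(P, 4))  # approximately 0.2182 and 0.7759
```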

#### **12.7.3. Classification when the population parameters are unknown**

When training samples are available from the populations $\pi_i,\ i = 1, \ldots, k$, we can estimate the parameters and proceed with the classification. Let $X_j^{(i)},\ j = 1, \ldots, n_i$, be a simple random sample of size $n_i$ from the $i$-th population $\pi_i$. Then, the sample average is $\bar{X}^{(i)} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_j^{(i)}$, and with our usual notations, the sample matrix, the matrix of sample means and the sample sum of products matrix are the following:

$$\begin{aligned} \mathbf{X}^{(i)} &= [X_1^{(i)}, \ldots, X_{n_i}^{(i)}],\ \bar{\mathbf{X}}^{(i)} = [\bar{X}^{(i)}, \ldots, \bar{X}^{(i)}], \\ S_i &= [\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}][\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}]',\ i = 1, \ldots, k, \end{aligned}$$

where

$$\mathbf{X}^{(i)} = \begin{bmatrix} x_{11}^{(i)} & \ldots & x_{1n_i}^{(i)} \\ \vdots & \ddots & \vdots \\ x_{p1}^{(i)} & \ldots & x_{pn_i}^{(i)} \end{bmatrix},\ S_i = (s_{rt}^{(i)}),\ s_{rt}^{(i)} = \sum_{m=1}^{n_i} (x_{rm}^{(i)} - \bar{x}_r^{(i)})(x_{tm}^{(i)} - \bar{x}_t^{(i)}),\ i = 1, \ldots, k.$$

Note that $\mathbf{X}^{(i)}$ and $\bar{\mathbf{X}}^{(i)}$ are $p \times n_i$ matrices and $X_j^{(i)}$ is a $p \times 1$ vector for each $j = 1, \ldots, n_i$ and $i = 1, \ldots, k$. Let the population mean value vectors and the common covariance matrix be $\mu^{(1)}, \ldots, \mu^{(k)}$, and $\Sigma > O$, respectively. Then, identifying the estimators/estimates by a hat, the unbiased estimators of these parameters are $\hat{\mu}^{(i)} = \bar{X}^{(i)},\ i = 1, \ldots, k$, and $\hat{\Sigma} = \frac{S}{n_1 + \cdots + n_k - k},\ S = S_1 + \cdots + S_k$. On replacing the population parameters by their unbiased estimators, the classification criteria $u_{ij}(X),\ j = 1, \ldots, k,\ j \ne i$, become the following: classify an observation vector $X$ into $\pi_i$ if $\hat{u}_{ij}(X) \ge \ln k_{ij},\ k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)},\ j = 1, \ldots, k,\ j \ne i$, or if $\hat{u}_{ij}(X) \ge 0,\ j = 1, \ldots, k,\ j \ne i$, when $q_1 = \cdots = q_k$ and the $C(i|j)$'s are all equal, where

$$
\hat{u}_{ij}(X) = (\bar{X}^{(i)} - \bar{X}^{(j)})' \hat{\Sigma}^{-1} X - \frac{1}{2} (\bar{X}^{(i)} - \bar{X}^{(j)})' \hat{\Sigma}^{-1} (\bar{X}^{(i)} + \bar{X}^{(j)}) \tag{12.7.13}
$$

for $j = 1, \ldots, k,\ j \ne i$. Unfortunately, the exact distribution of $\hat{u}_{ij}(X)$ is difficult to obtain even when the populations $\pi_i$ have $p$-variate normal distributions. However, as $n_j \to \infty$, $\bar{X}^{(j)} \to \mu^{(j)},\ j = 1, \ldots, k$, and as $n_j \to \infty,\ j = 1, \ldots, k$, $\hat{\Sigma} \to \Sigma$. Then, asymptotically, that is, when $n_j \to \infty,\ j = 1, \ldots, k$, $\hat{u}_{ij}(X) \to u_{ij}(X)$, so that the theory discussed in the previous sections is applicable. As well, the classification probabilities can then be evaluated as illustrated in Example 12.7.2.
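The plug-in rule (12.7.13) with equal prior probabilities and costs can be sketched as follows; the training data below are hypothetical, generated only for illustration, and NumPy is used for the linear algebra.

```python
# Sketch of the plug-in rule (12.7.13) with equal priors and costs;
# the training data are hypothetical, generated only for illustration.
import numpy as np

def fit(samples):
    """samples: list of p x n_i arrays, one per population."""
    means = [S.mean(axis=1) for S in samples]
    n = sum(S.shape[1] for S in samples)
    pooled = sum((S - m[:, None]) @ (S - m[:, None]).T
                 for S, m in zip(samples, means)) / (n - len(samples))
    return means, np.linalg.inv(pooled)

def u_hat(i, j, X, means, Sinv):
    # plug-in discriminant (12.7.13)
    d = means[i] - means[j]
    return d @ Sinv @ X - 0.5 * d @ Sinv @ (means[i] + means[j])

def classify(X, means, Sinv):
    k = len(means)
    for i in range(k):
        if all(u_hat(i, j, X, means, Sinv) >= 0 for j in range(k) if j != i):
            return i
    return None

rng = np.random.default_rng(0)
pops = [rng.normal(0, 1, (2, 50)) + np.array(m)[:, None]
        for m in ([0, 0], [3, 0], [0, 3])]
means, Sinv = fit(pops)
print(classify(np.array([2.8, 0.2]), means, Sinv))  # 1: nearest sample mean
```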

### **12.8. The Maximum Likelihood Method when the Population Covariances Are Equal**

Consider $k$ real normal populations $\pi_i : P_i(X) \sim N_p(\mu^{(i)}, \Sigma),\ \Sigma > O,\ i = 1, \ldots, k$, having the same covariance matrix but different mean value vectors $\mu^{(i)},\ i = 1, \ldots, k$. A $p$-vector $X$ at hand is to be classified into one of these populations $\pi_j,\ j = 1, \ldots, k$. Consider a simple random sample $X_1^{(i)}, X_2^{(i)}, \ldots, X_{n_i}^{(i)}$ of size $n_i$ from $\pi_i$ for $i = 1, \ldots, k$. Employing our usual notations, the sample means, sample matrices, matrices of sample means and sample sum of products matrices are as follows:

$$\begin{aligned} \bar{X}^{(i)} &= \frac{1}{n_i} \sum_{j=1}^{n_i} X_j^{(i)},\ \mathbf{X}^{(i)} = [X_1^{(i)}, \ldots, X_{n_i}^{(i)}] = \begin{bmatrix} x_{11}^{(i)} & \ldots & x_{1n_i}^{(i)} \\ \vdots & \ddots & \vdots \\ x_{p1}^{(i)} & \ldots & x_{pn_i}^{(i)} \end{bmatrix}, \\ \bar{\mathbf{X}}^{(i)} &= [\bar{X}^{(i)}, \ldots, \bar{X}^{(i)}],\ S^{(i)} = [\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}][\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}]', \\ S^{(i)} &= (s_{rt}^{(i)}),\ s_{rt}^{(i)} = \sum_{m=1}^{n_i} (x_{rm}^{(i)} - \bar{x}_r^{(i)})(x_{tm}^{(i)} - \bar{x}_t^{(i)}),\ S = S^{(1)} + S^{(2)} + \cdots + S^{(k)}. \end{aligned} \tag{12.8.1}$$

Then, the unbiased estimators of the population parameters, denoted with a hat, are

$$
\hat{\mu}^{(i)} = \bar{X}^{(i)}, \ i = 1, \ldots, k, \quad \text{and} \quad \hat{\Sigma} = \frac{S}{n\_1 + n\_2 + \cdots + n\_k - k}. \tag{12.8.2}
$$

The null hypothesis can be taken as: $X_1^{(i)}, \ldots, X_{n_i}^{(i)}$ and $X$ originate from $\pi_i$, and $X_1^{(j)}, \ldots, X_{n_j}^{(j)}$ come from $\pi_j,\ j = 1, \ldots, k,\ j \ne i$; the alternative hypothesis being: $X$ and $X_1^{(j)}, \ldots, X_{n_j}^{(j)}$ come from $\pi_j$ for some $j = 1, \ldots, k,\ j \ne i$, and $X_1^{(i)}, \ldots, X_{n_i}^{(i)}$ originate from $\pi_i$. On proceeding as in Sect. 12.6, when the prior probabilities are equal and the cost functions are identical, the criterion for classifying the observed vector $X$ into $\pi_i$ for a specific $i$ is

$$\begin{aligned} A_i :\ & \Big(\frac{n_j}{n_j + 1}\Big)^2 (X - \bar{X}^{(j)})' \Big(\frac{S}{n_{(k)}}\Big)^{-1} (X - \bar{X}^{(j)}) \\ & - \Big(\frac{n_i}{n_i + 1}\Big)^2 (X - \bar{X}^{(i)})' \Big(\frac{S}{n_{(k)}}\Big)^{-1} (X - \bar{X}^{(i)}) \ge 0 \end{aligned} \tag{12.8.3}$$

for $j = 1, \ldots, k,\ j \ne i$, where the decision rule is $A = (A_1, \ldots, A_k)$, $S = S^{(1)} + \cdots + S^{(k)}$ and $n_{(k)} = n_1 + n_2 + \cdots + n_k - k$. Note that (12.8.3) holds for each $i$, $i = 1, \ldots, k$, and hence $A_1, \ldots, A_k$ are available from (12.8.3). Thus, the vector $X$ at hand is classified into $A_i$, that is, assigned to the population $\pi_i$, if the inequalities in (12.8.3) are satisfied. This statement holds for each $i$, $i = 1, \ldots, k$. The exact distribution of the criterion in (12.8.3) is difficult to establish, but the probabilities of classification can be computed from the asymptotic theory discussed in Sect. 12.7 by observing the following:

When $n_i \to \infty$, $\bar{X}^{(i)} \to \mu^{(i)},\ i = 1, \ldots, k$, and when $n_1 \to \infty, \ldots, n_k \to \infty$, $\hat{\Sigma} \to \Sigma$. Thus, asymptotically, when $n_i \to \infty,\ i = 1, \ldots, k$, the criterion specified in (12.8.3) reduces to the criterion (12.7.3) of Sect. 12.7. Accordingly, when the $n_i$'s are very large, $i = 1, \ldots, k$, one may utilize (12.7.3) for computing the probabilities of classification, as illustrated in Examples 12.7.1 and 12.7.2.
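Since the quadratic forms in (12.8.3) share the common factor $(S/n_{(k)})^{-1}$, the rule amounts to assigning $X$ to the population minimizing $(\frac{n_m}{n_m+1})^2 (X - \bar{X}^{(m)})'(S/n_{(k)})^{-1}(X - \bar{X}^{(m)})$. A minimal sketch, with hypothetical sample quantities:

```python
# Sketch of criterion (12.8.3): X is assigned to pi_i when, for every j != i,
# the weighted Mahalanobis-type quadratic form for pi_j exceeds that for pi_i.
import numpy as np

def ml_classify(X, xbars, ns, pooled_inv):
    """xbars: list of sample mean vectors; ns: sample sizes;
    pooled_inv: (S/n_(k))^{-1}."""
    vals = []
    for m in range(len(xbars)):
        d = X - xbars[m]
        vals.append((ns[m] / (ns[m] + 1)) ** 2 * d @ pooled_inv @ d)
    for i in range(len(xbars)):
        if all(vals[j] - vals[i] >= 0 for j in range(len(xbars)) if j != i):
            return i
    return None

# Hypothetical sample quantities with S/n_(k) = I:
xbars = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
print(ml_classify(np.array([3.5, 0.0]), xbars, [20, 20], np.eye(2)))  # 1
```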

#### **12.9. Maximum Likelihood Method and Unequal Covariance Matrices**

The likelihood procedure can also provide a classification rule when the normal population covariance matrices are different. For example, let $\pi_1 : P_1(X) \sim N_p(\mu^{(1)}, \Sigma_1),\ \Sigma_1 > O$, and $\pi_2 : P_2(X) \sim N_p(\mu^{(2)}, \Sigma_2),\ \Sigma_2 > O$, where $\mu^{(1)} \ne \mu^{(2)}$ and $\Sigma_1 \ne \Sigma_2$. Let a simple random sample $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ of size $n_1$ from $\pi_1$ and a simple random sample $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ of size $n_2$ from $\pi_2$ be available. Let $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$ be the sample averages and $S_1$ and $S_2$ be the sample sum of products matrices, respectively. In classification problems, there is an additional vector $X$ which comes from $\pi_1$ under the null hypothesis and from $\pi_2$ under the alternative. Then, the maximum likelihood estimators, denoted by a hat, are the following:

$$
\hat{\mu}^{(1)} = \bar{X}^{(1)}, \ \hat{\mu}^{(2)} = \bar{X}^{(2)}, \ \hat{\Sigma}\_1 = \frac{S\_1}{n\_1} \text{ and } \ \hat{\Sigma}\_2 = \frac{S\_2}{n\_2}, \tag{i}
$$

respectively, when no additional vector is involved. However, these estimators will change in the presence of the additional vector *X*, where *X* is the vector at hand to be assigned to *π*<sup>1</sup> or *π*2. When *X* originates from *π*<sup>1</sup> or *π*2, *μ(*1*)* and *μ(*2*)* are respectively estimated as follows:

$$
\hat{\mu}\_{\*}^{(1)} = \frac{n\_1 \bar{X}\_1 + X}{n\_1 + 1} \quad \text{and} \quad \hat{\mu}\_{\*}^{(2)} = \frac{n\_2 \bar{X}\_2 + X}{n\_2 + 1}, \tag{ii}
$$

and when *X* comes from *π*<sup>1</sup> or *π*2, *Σ*<sup>1</sup> and *Σ*<sup>2</sup> are estimated by

$$
\hat{\Sigma}\_{1\*} = \frac{S\_1 + S\_3^{(1)}}{n\_1 + 1} \text{ and } \hat{\Sigma}\_{2\*} = \frac{S\_2 + S\_3^{(2)}}{n\_2 + 1} \tag{iii}
$$

where

$$S\_3^{(1)} = (X - \hat{\mu}\_\*^{(1)})(X - \hat{\mu}\_\*^{(1)})' = \left(\frac{n\_1}{n\_1 + 1}\right)^2 (X - \bar{X}\_1)(X - \bar{X}\_1)'$$

$$S\_3^{(2)} = (X - \hat{\mu}\_\*^{(2)})(X - \hat{\mu}\_\*^{(2)})' = \left(\frac{n\_2}{n\_2 + 1}\right)^2 (X - \bar{X}\_2)(X - \bar{X}\_2)',\qquad(iv)$$

referring to the derivations provided in Sect. 12.6 when discussing maximum likelihood procedures. Thus, the null hypothesis can be: $X$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $\pi_1$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from $\pi_2$, versus the alternative: $X$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from $\pi_2$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$, from $\pi_1$. Let $L_0$ and $L_1$ denote the likelihood functions under the null and alternative hypotheses, respectively. Observe that under the null hypothesis, $\Sigma_1$ is estimated by $\hat{\Sigma}_{1*}$ of $(iii)$ and $\Sigma_2$ by $\hat{\Sigma}_2$ of $(i)$, so that the likelihood ratio criterion $\lambda$ is given by

$$\lambda = \frac{\max L\_0}{\max L\_1} = \frac{|\hat{\Sigma}\_{2\*}|^{\frac{n\_2+1}{2}} |\hat{\Sigma}\_1|^{\frac{n\_1}{2}}}{|\hat{\Sigma}\_{1\*}|^{\frac{n\_1+1}{2}} |\hat{\Sigma}\_2|^{\frac{n\_2}{2}}}.\tag{12.9.1}$$

The determinants in (12.9.1) can be represented as follows, referring to the simplifications discussed in Sect. 12.6:

$$\lambda = \frac{(n\_1+1)^{\frac{p(n\_1+1)}{2}} |S\_2|^{\frac{n\_2+1}{2}}}{(n\_2+1)^{\frac{p(n\_2+1)}{2}} |S\_1|^{\frac{n\_1+1}{2}}} \frac{[1+(\frac{n\_2}{n\_2+1})^2(X-\bar{X}\_2)'S\_2^{-1}(X-\bar{X}\_2)]^{\frac{n\_2+1}{2}} |\hat{\Sigma}\_1|^{\frac{n\_1}{2}}}{[1+(\frac{n\_1}{n\_1+1})^2(X-\bar{X}\_1)'S\_1^{-1}(X-\bar{X}\_1)]^{\frac{n\_1+1}{2}} |\hat{\Sigma}\_2|^{\frac{n\_2}{2}}}. \tag{12.9.2}$$

The classification rule then consists of assigning the observed vector $X$ to $\pi_1$ if $\lambda \ge 1$ and to $\pi_2$ if $\lambda < 1$. When $n_1 = n_2 = n$, we could have expressed the criterion in terms of $\lambda_1 = \lambda^{\frac{2}{n}}$, which would have simplified the expressions appearing in (12.9.2).
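A sketch of this likelihood ratio rule follows: the estimators $(i)$–$(iv)$ are formed from the training samples and $\lambda$ of (12.9.1) is evaluated directly from the determinants. The sample data are hypothetical, generated only for illustration.

```python
# Sketch of the likelihood ratio rule (12.9.1): the estimators (i)-(iv) are
# formed from the training samples, and X is assigned to pi_1 when lambda >= 1.
# The sample data are hypothetical, generated only for illustration.
import numpy as np

def lambda_criterion(X, X1, X2):
    """X1, X2: p x n_i training samples; returns lambda of (12.9.1)."""
    def pieces(Xi):
        n = Xi.shape[1]
        xbar = Xi.mean(axis=1)
        S = (Xi - xbar[:, None]) @ (Xi - xbar[:, None]).T
        mu_star = (n * xbar + X) / (n + 1)               # estimator (ii)
        S_star = S + np.outer(X - mu_star, X - mu_star)  # numerator of (iii)
        return np.linalg.det(S / n), np.linalg.det(S_star / (n + 1)), n
    d1, d1s, n1 = pieces(X1)
    d2, d2s, n2 = pieces(X2)
    return (d2s ** ((n2 + 1) / 2) * d1 ** (n1 / 2)) / \
           (d1s ** ((n1 + 1) / 2) * d2 ** (n2 / 2))

rng = np.random.default_rng(1)
X1 = rng.normal(0, 1, (2, 40))               # pi_1: mean O, Sigma_1 = I
X2 = rng.normal(0, 3, (2, 40)) + 5.0         # pi_2: mean (5,5)', Sigma_2 = 9I
lam = lambda_criterion(np.array([0.2, -0.1]), X1, X2)
print(lam >= 1)  # a point near the mean of pi_1 is assigned to pi_1
```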


#### **Chapter 13 Multivariate Analysis of Variation**

#### **13.1. Introduction**

We will employ the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\mathrm{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\mathrm{d}X = \wedge_{i=1}^p \wedge_{j=1}^q \mathrm{d}x_{ij}$. For the complex matrix $\tilde{X} = X_1 + i X_2,\ i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, $\mathrm{d}\tilde{X} = \mathrm{d}X_1 \wedge \mathrm{d}X_2$.

In this chapter, we only consider analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) problems involving real populations. Even though all the steps involved in the following discussion focusing on the real variable case can readily be extended to the complex domain, it does not appear that a parallel development of analysis of variance methodologies in the complex domain has yet been considered. In order to elucidate the various steps in the procedures, we will first review the univariate case. For a detailed exposition of the analysis of variance technique in the scalar variable case, the reader may refer to Mathai and Haubold (2017). We will consider the cases of one-way classification or completely randomized design, as well as two-way classification without and with interaction or randomized block design. With this groundwork in place, the derivations of the results in the multivariate setting ought to prove easier to follow.

In the early nineteenth century, Gauss and Laplace utilized methodologies that may be regarded as forerunners to ANOVA in their analyses of astronomical data. However, this technique came to full fruition in Ronald Fisher's classic book titled "Statistical Methods for Research Workers", which was initially published in 1925. The principle behind ANOVA consists of partitioning the total variation present in the data into variations attributable to different sources. It is actually the total variation that is split rather than the total variance, the latter being a fraction of the former. Accordingly, the procedure could be more appropriately referred to as "analysis of variation". As has already been mentioned, we will initially consider the one-way classification model, which will then be extended to the multivariate situation.

Let us first focus on an experimental design called a completely randomized experiment. In this setting, the subject matter was originally developed for agricultural experiments, which influenced its terminology. For example, the basic experimental unit is referred to as a "plot", which is a piece of land in an agricultural context. When an experiment is performed on human beings, a plot translates into an individual. If the experiment is carried out on some machinery, then a machine corresponds to a plot. In a completely randomized experiment, a set of *n*<sup>1</sup> + *n*<sup>2</sup> +···+ *nk* plots, which are homogeneous with respect to all factors of variation, are selected. Then, *k* treatments are applied at random to these plots, the first treatment to *n*<sup>1</sup> plots, the second treatment to *n*<sup>2</sup> plots, up to the *k*-th treatment being applied to *nk* plots. For instance, if the effects of *k* different fertilizers on the yield of a certain crop are to be studied, then the treatments consist of these *k* fertilizers, the first treatment meaning one of the fertilizers, the second treatment, another one and so on, with the *k*-th treatment corresponding to the last fertilizer. If the experiment involves studying the yield of corn among *k* different varieties of corn, then a treatment coincides with a particular variety. If an experiment consists of comparing *k* teaching methods, then a treatment refers to a method of teaching and a plot corresponds to a student. When an experiment compares the effect of *k* different medications in curing a certain ailment, then a treatment is a medication, and so on. If the treatments are denoted by *t*1*,...,tk*, then treatment *tj* is applied at random to *nj* homogeneous plots or *nj* homogeneous plots are selected at random and treatment *tj* is applied to them, for *j* = 1*,...,k*. Random assignment is done to avoid possible biases or the influence of confounding factors, if any. 
Then, observations measuring the effect of these treatments on the experimental units are made. For example, in the case of various methods of teaching, the observation *xij* could be the final grade obtained by the *j* -th student who was subjected to the *i*-th teaching method. In the case of comparing *k* different varieties of corn, the observation *xij* could consist of the yield of corn observed at harvest time in the *j* -th plot which received the *i*-th variety of corn. Thus, in this instance, *i* stands for the treatment number and *j* represents the serial number of the plot receiving the *i*-th treatment, *xij* being the final observation. Then, the corresponding linear additive fixed effect model is the following:

$$\mathbf{x}\_{ij} = \mu + \alpha\_i + e\_{ij}, \ j = 1, \ldots, n\_i, \ i = 1, \ldots, k,\tag{13.1.1}$$

where $\mu$ is a general effect, $\alpha\_i$ is the deviation from the general effect due to treatment $t\_i$ and $e\_{ij}$ is the random component, which includes the sum total of contributions originating from unknown or uncontrolled factors. When the experiment is designed, the plots are selected so that they are homogeneous with respect to all possible factors of variation. The general effect $\mu$ can be interpreted as the grand average or the expected value of $x\_{ij}$ when $\alpha\_i$ is not present, that is, when treatment $t\_i$ is not applied or has no effect. The simplest assumptions that we will make are that $E[e\_{ij}] = 0$ for all $i$ and $j$ and $\text{Var}(e\_{ij}) = \sigma^2 > 0$ for all $i$ and $j$, for some positive quantity $\sigma^2$, where $E[\cdot]$ denotes the expected value of $[\cdot]$. It is further assumed that $\mu, \alpha\_1,\ldots,\alpha\_k$ are all unknown constants. When $\alpha\_1,\ldots,\alpha\_k$ are assumed to be random variables, model (13.1.1) is referred to as a "random effect model". In the following discussion, we will solely consider fixed effect models. The first step consists of estimating the unknown quantities from the data. Since no distribution is assumed on the $e\_{ij}$'s, and thereby on the $x\_{ij}$'s, we will employ the method of least squares for estimating the parameters. In that case, one has to minimize the error sum of squares, which is

$$\sum\_{ij} e\_{ij}^2 = \sum\_{ij} [x\_{ij} - \mu - \alpha\_i]^2.$$

Applying calculus principles, we equate the partial derivative of $\sum\_{ij} e\_{ij}^2$ with respect to $\mu$ to zero, then equate the partial derivatives of $\sum\_{ij} e\_{ij}^2$ with respect to $\alpha\_1,\ldots,\alpha\_k$ to zero, and solve these equations. A convenient notation in this area is to represent a summation by a dot: if the subscript $j$ is summed over, it is replaced by a dot, so that $\sum\_j x\_{ij} \equiv x\_{i.}$; similarly, $\sum\_{ij} x\_{ij} \equiv x\_{..}$. Thus,

$$\frac{\partial}{\partial \mu} \left[ \sum\_{ij} e\_{ij}^2 \right] = 0 \Rightarrow -2 \sum\_{ij} [x\_{ij} - \mu - \alpha\_i] = 0 \Rightarrow \sum\_{i} \left( \sum\_{j} [x\_{ij} - \mu - \alpha\_i] \right) = 0$$

that is,

$$\sum\_{i} [x\_{i.} - n\_i \mu - n\_i \alpha\_i] = 0 \Rightarrow x\_{..} - n\_{.} \mu - \sum\_{i=1}^{k} n\_i \alpha\_i = 0,$$

and since we have taken $\alpha\_i$ as a deviation from the general effect due to treatment $t\_i$, we can let $\sum\_i n\_i \alpha\_i = 0$ without any loss of generality. Then, $x\_{..}/n\_.$ is an estimate of $\mu$, and denoting estimates/estimators by a hat, we write $\hat{\mu} = x\_{..}/n\_.$. Now, note that, for example, $\alpha\_1$ appears in the terms $(x\_{11} - \mu - \alpha\_1)^2 + \cdots + (x\_{1n\_1} - \mu - \alpha\_1)^2 = \sum\_j (x\_{1j} - \mu - \alpha\_1)^2$ but does not appear in the other terms of the error sum of squares. Accordingly, for a specific $i$,

$$\frac{\partial}{\partial \alpha\_i} \left[ \sum\_{ij} e\_{ij}^2 \right] = 0 \Rightarrow \sum\_j [x\_{ij} - \mu - \alpha\_i] = 0 \Rightarrow x\_{i.} - n\_i \hat{\mu} - n\_i \hat{\alpha}\_i = 0,$$

that is, $\hat{\alpha}\_i = \frac{x\_{i.}}{n\_i} - \hat{\mu}$. Thus,

$$
\hat{\mu} = \frac{1}{n\_{.}} x\_{..} \ \text{ and } \ \hat{\alpha}\_{i} = \frac{1}{n\_{i}} x\_{i.} - \hat{\mu} \,. \tag{13.1.2}
$$
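As a quick numerical check of (13.1.2), the following sketch computes $\hat{\mu}$ and the $\hat{\alpha}\_i$'s for a small set of hypothetical one-way classification data (the group sizes and values are illustrative assumptions, not taken from the text):

```python
import numpy as np

# Hypothetical data: k = 3 treatments with n_1 = 4, n_2 = 3, n_3 = 5 plots.
groups = [
    np.array([12.0, 14.0, 11.0, 13.0]),
    np.array([15.0, 17.0, 16.0]),
    np.array([9.0, 10.0, 8.0, 11.0, 10.0]),
]

n_dot = sum(len(g) for g in groups)       # n. , the total number of plots
x_dotdot = sum(g.sum() for g in groups)   # x.. , the grand total

mu_hat = x_dotdot / n_dot                        # mu-hat = x.. / n.
alpha_hat = [g.mean() - mu_hat for g in groups]  # alpha_i-hat = x_i./n_i - mu-hat

# The estimates satisfy the side condition sum_i n_i * alpha_i-hat = 0.
side_condition = sum(len(g) * a for g, a in zip(groups, alpha_hat))
print(mu_hat, alpha_hat, side_condition)
```

The side condition reported at the end is the constraint $\sum\_i n\_i \alpha\_i = 0$ imposed above; it holds exactly (up to rounding) for the least squares solution.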

The least squares minimum is obtained by substituting the least squares estimates of $\mu$ and $\alpha\_i,\ i = 1,\ldots,k,$ in the error sum of squares. Denoting the least squares minimum by $s^2$,

$$\begin{split} s^2 &= \sum\_{ij} (x\_{ij} - \hat{\mu} - \hat{\alpha}\_{i})^2 = \sum\_{ij} \left[ x\_{ij} - \frac{x\_{..}}{n\_{.}} - \left( \frac{x\_{i.}}{n\_{i}} - \frac{x\_{..}}{n\_{.}} \right) \right]^2 \\ &= \sum\_{ij} \left[ x\_{ij} - \frac{x\_{i.}}{n\_{i}} \right]^2 = \sum\_{ij} \left[ x\_{ij} - \frac{x\_{..}}{n\_{.}} \right]^2 - \sum\_{ij} \left[ \frac{x\_{i.}}{n\_{i}} - \frac{x\_{..}}{n\_{.}} \right]^2. \end{split} \tag{13.1.3}$$

When the square is expanded, the middle term becomes $-2\sum\_{ij} \big( \frac{x\_{i.}}{n\_i} - \frac{x\_{..}}{n\_.} \big)^2$, since summing the cross product over $j$ for each fixed $i$ reproduces the squared deviation of the group mean; combined with the last term, this yields the expression given in (13.1.3). As well, we have the following identity:

$$\sum\_{ij} \left( \frac{x\_{i.}}{n\_i} - \frac{x\_{..}}{n\_{.}} \right)^2 = \sum\_{i=1}^k n\_i \left( \frac{x\_{i.}}{n\_i} - \frac{x\_{..}}{n\_{.}} \right)^2 = \sum\_{i} \frac{x\_{i.}^2}{n\_i} - \frac{x\_{..}^2}{n\_{.}}.$$

Now, let us consider the hypothesis $H\_o: \alpha\_1 = \alpha\_2 = \cdots = \alpha\_k$, which is equivalent to the hypothesis $\alpha\_1 = \alpha\_2 = \cdots = \alpha\_k = 0$ since, by assumption, $\sum\_i n\_i \alpha\_i = 0$. Proceeding as before, the least squares minimum under the null hypothesis $H\_o$, denoted by $s\_0^2$, is the following:

$$s\_0^2 = \sum\_{ij} \left( x\_{ij} - \frac{x\_{..}}{n\_{.}} \right)^2$$

and hence the sum of squares due to the hypothesis, or due to the presence of the $\alpha\_i$'s, is given by $s\_0^2 - s^2 = \sum\_{ij} \big( \frac{x\_{i.}}{n\_i} - \frac{x\_{..}}{n\_.} \big)^2$. Thus, the total variation is partitioned as follows:

$$s\_0^2 = \left[s\_0^2 - s^2\right] + \left[s^2\right]$$

$$\sum\_{ij} \left(\mathbf{x}\_{ij} - \frac{\mathbf{x}\_{..}}{n\_{.}}\right)^2 = \left[\sum\_{i} n\_i \left(\frac{\mathbf{x}\_{i.}}{n\_i} - \frac{\mathbf{x}\_{..}}{n\_{.}}\right)^2\right] + \left[\sum\_{ij} \left(\mathbf{x}\_{ij} - \frac{\mathbf{x}\_{i.}}{n\_i}\right)^2\right], \text{ that is,}$$

Total variation ($s\_0^2$) = variation due to the $\alpha\_i$'s ($s\_0^2 - s^2$) + the residual variation ($s^2$), which is the analysis of variation principle. If $e\_{ij} \overset{iid}{\sim} N\_1(0, \sigma^2)$ for all $i$ and $j$, where $\sigma^2 > 0$ is a constant, it follows from the chi-squaredness and independence of quadratic forms, as discussed in Chaps. 2 and 3, that $\frac{s\_0^2}{\sigma^2} \sim \chi^2\_{n\_. - 1}$, a real chi-square variable having $n\_. - 1$ degrees of freedom, that $\frac{s\_0^2 - s^2}{\sigma^2} \sim \chi^2\_{k-1}$ under the hypothesis $H\_o$, and that $\frac{s^2}{\sigma^2} \sim \chi^2\_{n\_. - k}$, where the sum of squares due to the $\alpha\_i$'s, namely $s\_0^2 - s^2$, and the residual sum of squares, namely $s^2$, are independently distributed under the hypothesis. Usually, these findings are put into a tabular form known as the analysis of variation table or ANOVA table. The usual format is as follows:


ANOVA Table for the One-Way Classification

| Source of variation | df | SS | MS |
|---|---|---|---|
| Treatments (the $\alpha\_i$'s) | $k - 1$ | $s\_0^2 - s^2$ | $(s\_0^2 - s^2)/(k - 1)$ |
| Residuals | $n\_. - k$ | $s^2$ | $s^2/(n\_. - k)$ |
| Total | $n\_. - 1$ | $s\_0^2$ | |

where *df* denotes the number of degrees of freedom, *SS* means sum of squares and *MS* stands for mean squares or the average of the squares. There is usually a last column which contains the F-ratio, that is, the ratio of the treatments MS to the residuals MS, and enables one to determine whether to reject the null hypothesis, in which case the test statistic is said to be "significant", or not to reject the null hypothesis, when the test statistic is "not significant". Further details on the real scalar variable case are available from Mathai and Haubold (2017).
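The entries of the ANOVA table can be computed directly from the partition $s\_0^2 = [s\_0^2 - s^2] + s^2$. The following sketch, using hypothetical data (illustrative values only), produces the sums of squares, mean squares and the F-ratio with $(k - 1,\ n\_. - k)$ degrees of freedom:

```python
import numpy as np

# Hypothetical data for k = 3 treatments (illustrative values only).
groups = [np.array([12., 14., 11., 13.]),
          np.array([15., 17., 16.]),
          np.array([9., 10., 8., 11., 10.])]

k = len(groups)
n_dot = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# s_0^2: total sum of squares about the grand mean
ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)
# s^2: residual (within-treatment) sum of squares
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups)
# s_0^2 - s^2: sum of squares due to the treatments
ss_treat = ss_total - ss_resid

# Mean squares and the F-ratio with (k-1, n.-k) degrees of freedom
ms_treat = ss_treat / (k - 1)
ms_resid = ss_resid / (n_dot - k)
F = ms_treat / ms_resid
print(ss_total, ss_treat, ss_resid, F)
```

A large observed F-ratio, compared with the upper percentage point of the $F\_{k-1,\,n\_.-k}$ distribution, leads to rejecting $H\_o$.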

In light of this brief review of the scalar variable case of one-way classification data or univariate data secured from a completely randomized design, the concepts will now be extended to the multivariate setting.

#### **13.2. Multivariate Case of One-Way Classification Data Analysis**

Extension of the results to the multivariate case is parallel to the scalar variable case. Consider a model of the type

$$X\_{ij} = M + A\_i + E\_{ij}, \; j = 1, \ldots, n\_i, \; i = 1, \ldots, k,\tag{13.2.1}$$

with $X\_{ij}, M, A\_i$ and $E\_{ij}$ all being $p \times 1$ real vectors, where $X\_{ij}$ denotes the $j$-th observation vector in the $i$-th group, that is, the observation vectors obtained from the $n\_i$ plots receiving the $i$-th treatment, $M$ is a general effect vector, $A\_i$ is a vector of deviations from $M$ due to the $i$-th treatment so that $\sum\_i n\_i A\_i = O$ since we are taking deviations from the general effect $M$, and $E\_{ij}$ is a vector of random components assumed to be normally distributed as $E\_{ij} \overset{iid}{\sim} N\_p(O, \Sigma),\ \Sigma > O,$ for all $i$ and $j$, where $\Sigma$ is a positive definite covariance matrix, that is,

$$\text{Cov}(E\_{ij}) = E[(E\_{ij} - O)(E\_{ij} - O)'] = E[E\_{ij}E\_{ij}'] = \Sigma > O \text{ for all } i \text{ and } j,$$

where $E[\cdot]$ denotes the expected value operator. This normality assumption will be needed for testing hypotheses and developing certain distributional results. However, the multivariate analysis of variation can be set up without having to resort to any distributional assumption. In the real scalar variable case, we minimized the sum of the squares of the errors since the variations only involved single scalar variables. In the vector case, if we take the sum of squares of the elements of $E\_{ij}$, that is, $E\_{ij}'E\_{ij}$, and sum it over all $i$ and $j$, then we are only considering the variations in the individual elements of the $E\_{ij}$'s; however, in the vector case, there is also joint variation among the elements of the vector, which must be taken into account. Hence, we should be considering all squared terms and cross product terms, that is, the whole matrix of squared and cross product terms. This matrix is $E\_{ij}E\_{ij}'$, and so we should consider it and carry out some type of minimization. Consider

$$\sum\_{ij} E\_{ij} E'\_{ij} = \sum\_{ij} [X\_{ij} - M - A\_i][X\_{ij} - M - A\_i]'.\tag{13.2.2}$$

For obtaining estimates of $M$ and $A\_i,\ i = 1,\ldots,k$, we will minimize the trace of $\sum\_{ij} E\_{ij}E\_{ij}'$ as a criterion. This trace consists of terms of the type $[X\_{ij} - M - A\_i]'[X\_{ij} - M - A\_i]$. Thus,

$$\frac{\partial}{\partial M} \left[ \text{tr} \left( \sum\_{ij} E\_{ij} E\_{ij}' \right) \right] = \sum\_{ij} \frac{\partial}{\partial M} [X\_{ij} - M - A\_i]' [X\_{ij} - M - A\_i] = O$$

$$\Rightarrow \sum\_{ij} [X\_{ij} - M - A\_i] = O \Rightarrow X\_{..} - n\_{.} M - \sum\_i n\_i A\_i = O \Rightarrow \hat{M} = \frac{1}{n\_{.}} X\_{..},$$

noting that we have assumed that $\sum\_i n\_i A\_i = O$. Now, on differentiating the trace of $\sum\_{ij} E\_{ij}E\_{ij}'$ with respect to $A\_i$ for a specific $i$, we have

$$\frac{\partial}{\partial A\_i} \text{tr} \left[ \sum\_{ij} E\_{ij} E\_{ij}' \right] = \frac{\partial}{\partial A\_i} \sum\_{ij} [X\_{ij} - M - A\_i]' [X\_{ij} - M - A\_i] = O$$

$$\Rightarrow \sum\_j [X\_{ij} - M - A\_i] = O \Rightarrow \hat{A}\_i = \frac{1}{n\_i} X\_{i.} - \hat{M} = \frac{1}{n\_i} X\_{i.} - \frac{1}{n\_{.}} X\_{..}\,.$$
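A minimal numerical sketch of these estimators, using simulated data with assumed dimensions ($p = 2$, $k = 3$; the group sizes are illustrative), confirms that $\hat{M}$ is the grand mean vector and that the $\hat{A}\_i$'s satisfy the side condition $\sum\_i n\_i \hat{A}\_i = O$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical p = 2 observation vectors in k = 3 groups (illustrative only).
p, sizes = 2, [5, 4, 6]
groups = [rng.normal(size=(n, p)) for n in sizes]

n_dot = sum(sizes)
X_all = np.vstack(groups)

M_hat = X_all.mean(axis=0)                        # M-hat = X.. / n.
A_hat = [g.mean(axis=0) - M_hat for g in groups]  # A_i-hat = X_i./n_i - M-hat

# Side condition: sum_i n_i * A_i-hat = O (a p-vector of zeros)
side = sum(n * a for n, a in zip(sizes, A_hat))
print(M_hat, side)
```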

#### Multivariate Analysis of Variation

Observe that there is only one critical point, given by $\hat{M}$ and $\hat{A}\_i,\ i = 1,\ldots,k$. Accordingly, this critical point corresponds to either a minimum or a maximum of the trace. But for arbitrary $M$ and $A\_i$, the maximum occurs at plus infinity and hence the critical point $(\hat{M}, \hat{A}\_i,\ i = 1,\ldots,k)$ corresponds to a minimum. Once evaluated at these estimates, the sum of squares and cross products matrix, denoted by $S$, is the following:

$$S = \sum\_{ij} [X\_{ij} - \hat{M} - \hat{A}\_{i}][X\_{ij} - \hat{M} - \hat{A}\_{i}]' = \sum\_{ij} \left[X\_{ij} - \frac{1}{n\_{i}}X\_{i.}\right] \left[X\_{ij} - \frac{1}{n\_{i}}X\_{i.}\right]'$$

$$= \sum\_{ij} \left[X\_{ij} - \frac{1}{n\_{.}}X\_{..}\right] \left[X\_{ij} - \frac{1}{n\_{.}}X\_{..}\right]' - \sum\_{i} n\_{i} \left[\frac{1}{n\_{i}}X\_{i.} - \frac{1}{n\_{.}}X\_{..}\right] \left[\frac{1}{n\_{i}}X\_{i.} - \frac{1}{n\_{.}}X\_{..}\right]' \tag{13.2.3}$$

Note that, as in the scalar case, the middle term and the last term combine into the second term above. Now, let us impose the hypothesis $H\_o: A\_1 = A\_2 = \cdots = A\_k = O$. Note that equality of the $A\_i$'s automatically implies that each one is null because their weighted sum is null as per our initial assumption in the model (13.2.1). Under this hypothesis, the model becomes $X\_{ij} = M + E\_{ij}$, and proceeding as in the univariate case, we end up with the following sum of squares and cross products matrix, denoted by $S\_0$:

$$S\_0 = \sum\_{ij} \left[ X\_{ij} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right] \left[ X\_{ij} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right]',\tag{13.2.4}$$

so that the sum of squares and cross products matrix due to the *Ai*'s is the difference

$$S\_0 - S = \sum\_{ij} \left[ \frac{1}{n\_i} X\_{i.} - \frac{1}{n\_{.}} X\_{..} \right] \left[ \frac{1}{n\_i} X\_{i.} - \frac{1}{n\_{.}} X\_{..} \right]'.\tag{13.2.5}$$

Thus, we have the following partitioning of the total variation in the multivariate data:

$$S\_0 = \left[S\_0 - S\right] + \left[S\right]$$

Total variation = [Variation due to the *Ai*'s] + [Residual variation]

$$\begin{split} \sum\_{ij} \left[ X\_{ij} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right] \left[ X\_{ij} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right]' &= \sum\_{ij} \left[ \frac{1}{n\_{i}} X\_{i.} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right] \left[ \frac{1}{n\_{i}} X\_{i.} - \frac{1}{n\_{\cdot}} X\_{\cdot \cdot} \right]' \\ &+ \sum\_{ij} \left[ X\_{ij} - \frac{1}{n\_{i}} X\_{i.} \right] \left[ X\_{ij} - \frac{1}{n\_{i}} X\_{i.} \right]'. \end{split}$$

Under the normality assumption $E\_{ij} \overset{iid}{\sim} N\_p(O, \Sigma),\ \Sigma > O,$ for the random components, we have the following properties, which follow from results derived in Chap. 5, the notation $W\_p(\nu, \Sigma)$ standing for a Wishart distribution having $\nu$ degrees of freedom and parameter matrix $\Sigma$:

$$\begin{aligned} \text{Total variation} &= S\_0 = \sum\_{ij} \left[ X\_{ij} - \frac{1}{n\_{.}} X\_{..} \right] \left[ X\_{ij} - \frac{1}{n\_{.}} X\_{..} \right]', \\ &\quad S\_0 \sim W\_p(n\_{.} - 1, \Sigma); \\ \text{Variation due to the } A\_i\text{'s} &= S\_0 - S = \sum\_{ij} \left[ \frac{1}{n\_i} X\_{i.} - \frac{1}{n\_{.}} X\_{..} \right] \left[ \frac{1}{n\_i} X\_{i.} - \frac{1}{n\_{.}} X\_{..} \right]', \\ &\quad S\_0 - S \sim W\_p(k - 1, \Sigma) \text{ under the hypothesis } A\_1 = A\_2 = \cdots = A\_k = O; \\ \text{Residual variation} &= S = \sum\_{ij} \left[ X\_{ij} - \frac{1}{n\_i} X\_{i.} \right] \left[ X\_{ij} - \frac{1}{n\_i} X\_{i.} \right]', \\ &\quad S \sim W\_p(n\_{.} - k, \Sigma). \end{aligned}$$
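The matrix partition $S\_0 = [S\_0 - S] + S$ can be verified numerically. The sketch below uses simulated data with assumed dimensions ($p = 3$, $k = 3$) and checks that the total SSP matrix equals the treatment SSP matrix plus the residual SSP matrix, entry by entry:

```python
import numpy as np

rng = np.random.default_rng(1)
p, sizes = 3, [6, 5, 7]          # hypothetical group sizes, k = 3
groups = [rng.normal(size=(n, p)) for n in sizes]

X_all = np.vstack(groups)
grand = X_all.mean(axis=0)

def ssp(X, center):
    """Sum of squares and cross products matrix about a given center."""
    D = X - center
    return D.T @ D

S0 = ssp(X_all, grand)                            # total SSP, df = n. - 1
S = sum(ssp(g, g.mean(axis=0)) for g in groups)   # residual SSP, df = n. - k
U = sum(len(g) * np.outer(g.mean(axis=0) - grand,
                          g.mean(axis=0) - grand)
        for g in groups)                          # treatment SSP, df = k - 1

# The partition S0 = (S0 - S) + S, i.e. S0 = U + S, holds exactly:
print(np.allclose(S0, U + S))
```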

We can summarize these findings in a tabular form known as the multivariate analysis of variation table or MANOVA table, where *df* means degrees of freedom in the corresponding Wishart distribution, and *SSP* represents the sum of squares and cross products matrix.


Multivariate Analysis of Variation (MANOVA) Table

| Source of variation | df | SSP |
|---|---|---|
| Treatments (the $A\_i$'s) | $k - 1$ | $S\_0 - S$ |
| Residuals | $n\_. - k$ | $S$ |
| Total | $n\_. - 1$ | $S\_0$ |

#### **13.2.1. Some properties**

The sample values from the $i$-th sample, that is, the $i$-th group or the plots receiving the $i$-th treatment, are $X\_{i1}, X\_{i2},\ldots,X\_{in\_i}$. In this case, the average is $\frac{1}{n\_i}\sum\_{j=1}^{n\_i} X\_{ij} = \frac{X\_{i.}}{n\_i}$ and the $i$-th sample sum of squares and products matrix is

$$S\_{i} = \sum\_{j=1}^{n\_{i}} \left( X\_{ij} - \frac{X\_{i.}}{n\_{i}} \right) \left( X\_{ij} - \frac{X\_{i.}}{n\_{i}} \right)'.$$

As well, it follows from Chap. 5 that $S\_i \sim W\_p(n\_i - 1, \Sigma)$ when $E\_{ij} \overset{iid}{\sim} N\_p(O, \Sigma),\ \Sigma > O$. Then, the residual sum of squares and products matrix, which we denote by $V$, can be written as follows:


$$V = \sum\_{ij} \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right) \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right)' = \sum\_{i=1}^{k} \left[ \sum\_{j=1}^{n\_i} \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right) \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right)' \right]$$

$$= \sum\_{i=1}^{k} S\_{i} = S\_{1} + S\_{2} + \cdots + S\_{k} \tag{13.2.6}$$

where $S\_i \sim W\_p(n\_i - 1, \Sigma),\ i = 1,\ldots,k,$ and the $S\_i$'s are independently distributed since the sample values from the $k$ groups are independently distributed within each group and between groups. Hence, $V \sim W\_p(\nu, \Sigma),\ \nu = (n\_1 - 1) + (n\_2 - 1) + \cdots + (n\_k - 1) = n\_. - k$. Note that $\bar{X}\_i = \frac{X\_{i.}}{n\_i}$ has $\text{Cov}(\bar{X}\_i) = \frac{1}{n\_i}\Sigma$, so that, under the hypothesis, the $\sqrt{n\_i}(\bar{X}\_i - \bar{X})$ are iid $N\_p(O, \Sigma)$, where $\bar{X} = X\_{..}/n\_.$. Then, the sum of squares and products matrix due to the treatments, that is, due to the $A\_i$'s, which we denote by $U$, is the following:

$$U = \sum\_{i=1}^{k} n\_i \left[ \frac{X\_{i\cdot}}{n\_i} - \frac{X\_{\cdot\cdot}}{n\_\cdot} \right] \left[ \frac{X\_{i\cdot}}{n\_i} - \frac{X\_{\cdot\cdot}}{n\_\cdot} \right]' \sim W\_p(k-1, \Sigma) \tag{13.2.7}$$

under the null hypothesis; when the hypothesis does not hold, $U$ follows a noncentral Wishart distribution. Further, the sum of squares and products matrix due to the treatments and the residual sum of squares and products matrix are independently distributed. Thus, by comparing $U$ and $V$, we should be able to reach a decision regarding the hypothesis. One procedure that is followed is to take the determinants of $U$ and $V$ and compare them. On its own, this does not have much of a basis, and determinants should not be called "generalized variances" since, as previously explained, the determinant violates the basic conditions of a norm. The rationale for comparing determinants will become clear from the point of view of testing hypotheses by applying the likelihood ratio criterion, which is discussed next.

#### **13.3. The Likelihood Ratio Criterion**

Let $E\_{ij} \overset{iid}{\sim} N\_p(O, \Sigma),\ \Sigma > O$, and suppose that we have simple random samples of sizes $n\_1,\ldots,n\_k$ from the $k$ groups relating to the $k$ treatments. Then, the likelihood function, denoted by $L$, is the following:

$$\begin{split} L &= \prod\_{ij} \frac{\mathrm{e}^{-\frac{1}{2}(X\_{ij} - M - A\_i)' \Sigma^{-1} (X\_{ij} - M - A\_i)}}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \\ &= \frac{\mathrm{e}^{-\frac{1}{2}\sum\_{ij} (X\_{ij} - M - A\_i)' \Sigma^{-1} (X\_{ij} - M - A\_i)}}{(2\pi)^{\frac{n\_. p}{2}} |\Sigma|^{\frac{n\_.}{2}}}. \end{split} \tag{13.3.1}$$

The maximum likelihood estimator/estimate (MLE) of $M$ is $\hat{M} = \frac{X\_{..}}{n\_.} = \bar{X}$ and that of $A\_i$ is $\hat{A}\_i = \frac{X\_{i.}}{n\_i} - \hat{M}$. With a view to obtaining the MLE of $\Sigma$, we first note that the exponent is a real scalar quantity which is thus equal to its trace, so that we can express the exponent as follows, after substituting the MLE's of $M$ and $A\_i$:

$$\begin{split}-\frac{1}{2}\sum\_{ij}[X\_{ij}-\hat{M}-\hat{A}\_{i}]'\Sigma^{-1}[X\_{ij}-\hat{M}-\hat{A}\_{i}] &=-\frac{1}{2}\sum\_{ij}\text{tr}\Big(\Big[X\_{ij}-\frac{X\_{i.}}{n\_{i}}\Big]'\Sigma^{-1}\Big[X\_{ij}-\frac{X\_{i.}}{n\_{i}}\Big]\Big)\\ &=-\frac{1}{2}\sum\_{ij}\text{tr}\Big(\Sigma^{-1}\Big[X\_{ij}-\frac{X\_{i.}}{n\_{i}}\Big]\Big[X\_{ij}-\frac{X\_{i.}}{n\_{i}}\Big]'\Big).\end{split}$$

Now, following through the estimation procedure of the MLE included in Chap. 3, we obtain the MLE of *Σ* as

$$\hat{\Sigma} = \frac{1}{n\_\cdot} \sum\_{ij} \left( X\_{ij} - \frac{X\_{i\cdot}}{n\_i} \right) \left( X\_{ij} - \frac{X\_{i\cdot}}{n\_i} \right)'. \tag{13.3.2}$$

After substituting $\hat{M}, \hat{A}\_i$ and $\hat{\Sigma}$, the exponent in the likelihood function becomes $-\frac{1}{2}n\_.\,\text{tr}(I\_p) = -\frac{1}{2}n\_. p$. Hence, the maximum value of the likelihood function $L$ under the general model becomes

$$\max L = \frac{\mathrm{e}^{-\frac{1}{2}n\_. p}\, n\_.^{\frac{n\_. p}{2}}}{(2\pi)^{\frac{n\_. p}{2}} \big|\sum\_{ij} (X\_{ij} - \frac{X\_{i.}}{n\_i})(X\_{ij} - \frac{X\_{i.}}{n\_i})'\big|^{\frac{n\_.}{2}}}. \tag{13.3.3}$$

Under the hypothesis $H\_o: A\_1 = A\_2 = \cdots = A\_k = O,$ the model is $X\_{ij} = M + E\_{ij}$, the MLE of $M$ under $H\_o$ is still $\frac{1}{n\_.}X\_{..}$, and $\hat{\Sigma}$ under $H\_o$ is $\frac{1}{n\_.}\sum\_{ij}(X\_{ij} - \frac{1}{n\_.}X\_{..})(X\_{ij} - \frac{1}{n\_.}X\_{..})'$, so that $\max L$ under $H\_o$, denoted by $\max L\_o$, is

$$\max L\_o = \frac{\mathrm{e}^{-\frac{1}{2}n\_. p}\, n\_.^{\frac{n\_. p}{2}}}{(2\pi)^{\frac{n\_. p}{2}} \big|\sum\_{ij} (X\_{ij} - \frac{1}{n\_.} X\_{..})(X\_{ij} - \frac{1}{n\_.} X\_{..})'\big|^{\frac{n\_.}{2}}}.\tag{13.3.4}$$

Therefore, the *λ*-criterion is the following:

$$\lambda = \frac{\max L\_o}{\max L} = \frac{\big|\sum\_{ij} (X\_{ij} - \frac{X\_{i.}}{n\_i})(X\_{ij} - \frac{X\_{i.}}{n\_i})'\big|^{\frac{n\_.}{2}}}{\big|\sum\_{ij} (X\_{ij} - \frac{X\_{..}}{n\_{.}})(X\_{ij} - \frac{X\_{..}}{n\_{.}})'\big|^{\frac{n\_.}{2}}}$$

$$= \frac{|V|^{\frac{n\_.}{2}}}{|U + V|^{\frac{n\_.}{2}}} \tag{13.3.5}$$

where

$$U = \sum\_{ij} \left(\frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_{.}}\right) \left(\frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_{.}}\right)', \; V = \sum\_{ij} \left(X\_{ij} - \frac{X\_{i.}}{n\_i}\right) \left(X\_{ij} - \frac{X\_{i.}}{n\_i}\right)'$$

and $U \sim W\_p(k - 1, \Sigma)$ under $H\_o$ is the sum of squares and cross products matrix due to the $A\_i$'s, while $V \sim W\_p(n\_. - k, \Sigma)$ is the residual sum of squares and cross products matrix. It has already been shown that $U$ and $V$ are independently distributed. Then $W\_1 = (U+V)^{-\frac{1}{2}}V(U+V)^{-\frac{1}{2}}$, whose determinant is $\frac{|V|}{|U+V|}$, is a real matrix-variate type-1 beta random variable with parameters $(\frac{n\_.-k}{2}, \frac{k-1}{2})$, as defined in Chap. 5, and $W\_2 = V^{-\frac{1}{2}}UV^{-\frac{1}{2}}$ is a real matrix-variate type-2 beta random variable with parameters $(\frac{k-1}{2}, \frac{n\_.-k}{2})$. Moreover, $Y\_1 = I - W\_1 = (U+V)^{-\frac{1}{2}}U(U+V)^{-\frac{1}{2}}$, whose determinant is $\frac{|U|}{|U+V|}$, is a real matrix-variate type-1 beta random variable with parameters $(\frac{k-1}{2}, \frac{n\_.-k}{2})$. Given the properties of independent real matrix-variate gamma random variables, we have seen in Chap. 5 that $W\_1$ and $Y\_2 = U + V$ are independently distributed. Similarly, $Y\_1 = I - W\_1$ and $Y\_2$ are independently distributed. Further, $W\_2^{-1} = V^{\frac{1}{2}}U^{-1}V^{\frac{1}{2}}$ is a real matrix-variate type-2 beta random variable with parameters $(\frac{n\_.-k}{2}, \frac{k-1}{2})$. Observe that

$$|W\_1| = \frac{|V|}{|U+V|} = \frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}} + I|} = \frac{1}{|W\_2 + I|},\ W\_1 = (I + W\_2)^{-1}.$$

A one-to-one function of *λ* is

$$w = \lambda^{\frac{2}{n\_{.}}} = \frac{|V|}{|U+V|} = |W\_1|. \tag{13.3.6}$$
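Computationally, $w$ (Wilks' $\Lambda$ statistic) only requires the two determinants, and the identity $w = 1/|W\_2 + I|$ provides an equivalent route that avoids forming matrix square roots. The data below are simulated with assumed dimensions, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
p, sizes = 2, [8, 10, 9]                      # hypothetical sizes, k = 3
groups = [rng.normal(size=(n, p)) for n in sizes]

grand = np.vstack(groups).mean(axis=0)
V = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
U = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
        for g in groups)

w = np.linalg.det(V) / np.linalg.det(U + V)   # w = |V| / |U + V| = lambda^(2/n.)
W2 = np.linalg.solve(V, U)                    # V^{-1} U, same determinant behavior
w_alt = 1.0 / np.linalg.det(W2 + np.eye(p))   # w = 1 / |W_2 + I|
print(w, w_alt)
```

Since $U$ is positive semi-definite, $|U + V| \ge |V|$, so the computed $w$ always lies in $(0, 1]$, with small values pointing against $H\_o$.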

#### **13.3.1. Arbitrary moments of the likelihood ratio criterion**

For an arbitrary $h$, the $h$-th moment of $w$, as well as that of $\lambda$, can be obtained from the normalizing constant of a real matrix-variate type-1 beta density with parameters $(\frac{n\_.-k}{2}, \frac{k-1}{2})$. That is,

$$\begin{split} E[w^{h}] &= \frac{\Gamma\_{p}(\frac{n\_.-k}{2}+h)}{\Gamma\_{p}(\frac{n\_.-k}{2})} \frac{\Gamma\_{p}(\frac{n\_.-1}{2})}{\Gamma\_{p}(\frac{n\_.-1}{2}+h)}, \ (n\_{.}-k)+(k-1) = n\_{.}-1, \\ &= \left\{ \prod\_{j=1}^{p} \frac{\Gamma(\frac{n\_.-1}{2}-\frac{j-1}{2})}{\Gamma(\frac{n\_.-k}{2}-\frac{j-1}{2})} \right\} \left\{ \prod\_{j=1}^{p} \frac{\Gamma(\frac{n\_.-k}{2}-\frac{j-1}{2}+h)}{\Gamma(\frac{n\_.-1}{2}-\frac{j-1}{2}+h)} \right\}. \end{split} \tag{13.3.7}$$

As $E[\lambda^h] = E[(w^{\frac{n\_.}{2}})^h] = E[w^{(\frac{n\_.}{2})h}]$, the $h$-th moment of $\lambda$ is obtained by replacing $h$ by $(\frac{n\_.}{2})h$ in (13.3.7). That is,

$$E[\lambda^h] = C\_{p,k} \left\{ \prod\_{j=1}^p \frac{\Gamma(\frac{n\_.-k}{2} - \frac{j-1}{2} + (\frac{n\_.}{2})h)}{\Gamma(\frac{n\_.-1}{2} - \frac{j-1}{2} + (\frac{n\_.}{2})h)} \right\} \tag{13.3.8}$$

where

$$C\_{p,k} = \left\{ \prod\_{j=1}^p \frac{\Gamma(\frac{n\_.-1}{2} - \frac{j-1}{2})}{\Gamma(\frac{n\_.-k}{2} - \frac{j-1}{2})} \right\}.$$

#### **13.3.2. Structural representation of the likelihood ratio criterion**

It can readily be seen from (13.3.7) that the *h*-th moment of *w* is of the form of the *h*-th moment of a product of independently distributed real scalar type-1 beta random variables. That is,

$$E[w^h] = E[(w\_1 w\_2 \cdots w\_p)^h], \ \ w = w\_1 w\_2 \cdots w\_p,\tag{13.3.9}$$

where $w\_1,\ldots,w\_p$ are independently distributed and $w\_j$ is a real scalar type-1 beta random variable with parameters $(\frac{n\_.-k}{2} - \frac{j-1}{2}, \frac{k-1}{2}),\ j = 1,\ldots,p,$ for $n\_. - k > p - 1$, that is, $n\_. > k + p - 1$. Hence the exact density of $w$ is available by constructing the density of a product of independently distributed real scalar type-1 beta random variables. For special values of $p$ and $k$, one can obtain the exact densities in terms of elementary functions. However, for the general case, the exact density corresponding to $E[w^h]$ as specified in (13.3.7) can be expressed in terms of a G-function and, in the case of $E[\lambda^h]$ as given in (13.3.8), the exact density can be represented in terms of an H-function. These representations are as follows, the densities of $w$ and $\lambda$ being denoted by $f\_w(w)$ and $f\_\lambda(\lambda)$, respectively:

$$f\_w(w) = C\_{p,k} \, G\_{p,p}^{p,0} \left[ w \Big|\_{\frac{n\_.-k}{2} - \frac{j-1}{2} - 1, \, j = 1, \ldots, p}^{\frac{n\_.-1}{2} - \frac{j-1}{2} - 1, \, j = 1, \ldots, p} \right], \ 0 < w \le 1,\tag{13.3.10}$$

$$f\_{\lambda}(\lambda) = C\_{p,k} H\_{p,p}^{p,0} \left[ \lambda \middle| \begin{matrix} (\frac{n\_.-1}{2} - \frac{j-1}{2} - \frac{n\_.}{2}, \frac{n\_.}{2}), & j = 1, \ldots, p \\ (\frac{n\_.-k}{2} - \frac{j-1}{2} - \frac{n\_.}{2}, \frac{n\_.}{2}), & j = 1, \ldots, p \end{matrix} \right], \ 0 < \lambda \le 1,\tag{13.3.11}$$

for $n\_. > p + k - 1,\ p \ge 1$, and $f\_w(w) = 0,\ f\_\lambda(\lambda) = 0$ elsewhere. The evaluation of G- and H-functions can be carried out with the help of symbolic computing packages such as Mathematica and MAPLE. Theoretical considerations, applications and several special cases of the G- and H-functions are, for instance, available from Mathai (1993) and Mathai, Saxena and Haubold (2010). The special cases listed therein can also be utilized to work out the densities for particular cases of (13.3.10) and (13.3.11). Explicit structures of the densities for certain special cases are listed in the next section.
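The structural representation (13.3.9) can also be checked by simulation: sample the independent type-1 beta factors $w\_j$ and compare the empirical mean of their product with $E[w]$ obtained from (13.3.7) at $h = 1$, which reduces to a product of beta means. The dimensions below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n_dot, k, p = 30, 4, 3          # hypothetical values with n. > k + p - 1

# w = w_1 w_2 ... w_p with independent w_j ~ type-1 beta having
# parameters ((n.-k)/2 - (j-1)/2, (k-1)/2), j = 1, ..., p.
a = np.array([(n_dot - k) / 2 - (j - 1) / 2 for j in range(1, p + 1)])
b = (k - 1) / 2

N = 200_000
w_sim = np.prod([rng.beta(a_j, b, size=N) for a_j in a], axis=0)

# E[w] from (13.3.7) with h = 1 reduces to the product of beta means a_j/(a_j+b).
E_w_exact = np.prod(a / (a + b))
print(w_sim.mean(), E_w_exact)
```

The empirical mean of the simulated products agrees with the exact first moment to within Monte Carlo error.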

#### **13.3.3. Some special cases**

Several particular cases can be worked out by examining the moment expressions in (13.3.7) and (13.3.8). The $h$-th moment of $w = \lambda^{\frac{2}{n\_.}}$, where $\lambda$ is the likelihood ratio criterion, is available from (13.3.7) as

$$E[w^h] = C\_{p,k} \frac{\Gamma(\frac{n\_.-k}{2} + h)\Gamma(\frac{n\_.-k}{2} - \frac{1}{2} + h) \cdots \Gamma(\frac{n\_.-k}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n\_.-1}{2} + h)\Gamma(\frac{n\_.-1}{2} - \frac{1}{2} + h) \cdots \Gamma(\frac{n\_.-1}{2} - \frac{p-1}{2} + h)}. \tag{i}$$

**Case (1):** *p* = 1

In this case, from *(i)*,

$$E[w^h] = C\_{1,k} \frac{\Gamma(\frac{n\_.-k}{2} + h)}{\Gamma(\frac{n\_.-1}{2} + h)},$$

which is the $h$-th moment of a real scalar type-1 beta random variable with parameters $(\frac{n\_.-k}{2}, \frac{k-1}{2})$; in this case, $w$ is simply a real scalar type-1 beta random variable with those parameters. We reject the null hypothesis $H\_o: A\_1 = A\_2 = \cdots = A\_k = O$ for small values of the $\lambda$-criterion and, accordingly, we reject $H\_o$ for small values of $w$; that is, the hypothesis is rejected when the observed value of $w \le w\_\alpha$, where $w\_\alpha$ is such that $\int\_0^{w\_\alpha} f\_w(w)\,\mathrm{d}w = \alpha$ for the preassigned size $\alpha$ of the critical region, $f\_w(w)$ denoting the density of $w$ for $p = 1,\ n\_. > k$.
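For $p = 1$, the cutoff $w\_\alpha$ is therefore the $\alpha$-quantile of the type-1 beta distribution with parameters $(\frac{n\_.-k}{2}, \frac{k-1}{2})$. The sketch below approximates it by Monte Carlo with assumed values of $n\_.$, $k$ and $\alpha$; a beta quantile routine such as `scipy.stats.beta.ppf`, if available, would serve equally well:

```python
import numpy as np

rng = np.random.default_rng(4)
n_dot, k, alpha = 25, 5, 0.05    # hypothetical design: p = 1, k = 5, n. = 25

# For p = 1, w is type-1 beta with parameters ((n.-k)/2, (k-1)/2),
# so the cutoff w_alpha is simply the alpha-quantile of that beta law.
samples = rng.beta((n_dot - k) / 2, (k - 1) / 2, size=400_000)
w_alpha = np.quantile(samples, alpha)   # Monte Carlo stand-in for beta.ppf

# Decision rule: reject H_o when the observed w falls at or below w_alpha.
print(w_alpha)
```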

**Case (2):** *p* = 2

From *(i)*, we have

$$E[w^h] = C\_{2,k} \frac{\Gamma(\frac{n\_\cdot - k}{2} + h)\Gamma(\frac{n\_\cdot - k}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n\_\cdot - 1}{2} + h)\Gamma(\frac{n\_\cdot - 1}{2} - \frac{1}{2} + h)}$$

and therefore

$$E[(w^{\frac{1}{2}})^h] = C\_{2,k} \frac{\Gamma(\frac{n\_\cdot - k}{2} + \frac{h}{2})\Gamma(\frac{n\_\cdot - k}{2} - \frac{1}{2} + \frac{h}{2})}{\Gamma(\frac{n\_\cdot - 1}{2} + \frac{h}{2})\Gamma(\frac{n\_\cdot - 1}{2} - \frac{1}{2} + \frac{h}{2})}. \tag{ii}$$

The gamma functions in *(ii)* can be combined by making use of a duplication formula for gamma functions, namely,

$$
\Gamma(z)\Gamma(z+1/2) = \pi^{\frac{1}{2}}2^{1-2z}\Gamma(2z). \tag{13.3.12}
$$
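The duplication formula (13.3.12) is easy to verify numerically for a few values of $z$:

```python
import math

# Numerical check of the duplication formula (13.3.12):
# Gamma(z) * Gamma(z + 1/2) = sqrt(pi) * 2^(1 - 2z) * Gamma(2z).
for z in (0.7, 1.5, 3.25, 10.0):
    lhs = math.gamma(z) * math.gamma(z + 0.5)
    rhs = math.sqrt(math.pi) * 2 ** (1 - 2 * z) * math.gamma(2 * z)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
print("duplication formula verified")
```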

Take $z = \frac{n\_.-k}{2} - \frac{1}{2} + \frac{h}{2}$ and $z = \frac{n\_.-1}{2} - \frac{1}{2} + \frac{h}{2}$ in the part containing $h$, and the same values with $h = 0$ in the constant part, and then apply formula (13.3.12) to obtain

$$E[(w^{\frac{1}{2}})^h] = \frac{\Gamma(n\_\cdot - 2)}{\Gamma(n\_\cdot - k - 1)} \frac{\Gamma(n\_\cdot - k - 1 + h)}{\Gamma(n\_\cdot - 2 + h)},$$

which is, for an arbitrary $h$, the $h$-th moment of a real scalar type-1 beta random variable with parameters $(n\_. - k - 1,\ k - 1)$ for $n\_. - k - 1 > 0,\ k > 1$. Thus, $y = w^{\frac{1}{2}}$ is a real scalar type-1 beta random variable with parameters $(n\_. - k - 1,\ k - 1)$. We would then reject $H\_o$ for small values of $w$, that is, for small values of $y$, or when the observed value of $y \le y\_\alpha$ with $y\_\alpha$ such that $\int\_0^{y\_\alpha} f\_y(y)\,\mathrm{d}y = \alpha$ for a preassigned probability of type-I error, which is the probability of rejecting $H\_o$ when $H\_o$ is true, where $f\_y(y)$ is the density of $y$ for $p = 2$ whenever $n\_. > k + 1$.

**Case (3):** *k* = 2, *p* ≥ 1, *n.* > *p* + 1

In this case, the *h*-th moment of *w* as specified in (13.3.7) is the following:

$$\begin{aligned} E[w^h] &= C\_{p,2} \frac{\Gamma(\frac{n\_.-2}{2} + h) \Gamma(\frac{n\_.-2}{2} - \frac{1}{2} + h) \cdots \Gamma(\frac{n\_.-2}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n\_.-1}{2} + h) \Gamma(\frac{n\_.-1}{2} - \frac{1}{2} + h) \cdots \Gamma(\frac{n\_.-1}{2} - \frac{p-1}{2} + h)} \\ &= C\_{p,2} \frac{\Gamma(\frac{n\_.-1}{2} - \frac{p}{2} + h)}{\Gamma(\frac{n\_.-1}{2} + h)} \end{aligned}$$

since the numerator gamma functions, except the last one, cancel with the denominator gamma functions, except the first one. This expression happens to be the $h$-th moment of a real scalar type-1 beta random variable with parameters $(\frac{n\_.-1-p}{2}, \frac{p}{2})$ and hence, for $k = 2,\ n\_. - 1 - p > 0$ and $p \ge 1$, $w$ is a real scalar type-1 beta random variable. Then, we reject the null hypothesis $H\_o$ for small values of $w$ or when the observed value of $w \le w\_\alpha$, with $w\_\alpha$ such that $\int\_0^{w\_\alpha} f\_w(w)\,\mathrm{d}w = \alpha$ for a preassigned significance level $\alpha$, $f\_w(w)$ being the density of $w$ for this case. We will use the same notation $f\_w(w)$ for the density of $w$ in all the special cases.

**Case (4):** $k = 3,\ p \ge 1$

Proceeding as in Case (3), we see that all the gammas in the *h*-th moment of *w* cancel out except the last two in the numerator and the first two in the denominator. Thus,

$$\begin{split} E[w^h] &= C\_{p,3} \frac{\Gamma(\frac{n\_\cdot-3}{2} - \frac{p-2}{2} + h)\, \Gamma(\frac{n\_\cdot-3}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n\_\cdot-1}{2} + h)\, \Gamma(\frac{n\_\cdot-1}{2} - \frac{1}{2} + h)} \\ &= C\_{p,3} \frac{\Gamma(\frac{n\_\cdot-1}{2} - \frac{p}{2} + h)\, \Gamma(\frac{n\_\cdot-1}{2} - \frac{p}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n\_\cdot-1}{2} + h)\, \Gamma(\frac{n\_\cdot-1}{2} - \frac{1}{2} + h)}. \end{split}$$

After combining the gammas in $y = w^{\frac{1}{2}}$ with the help of the duplication formula (13.3.12), we have the following:

$$E[y^h] = \frac{\Gamma(n\_\cdot - 2)}{\Gamma(n\_\cdot - 2 - p)} \frac{\Gamma(n\_\cdot - p - 2 + h)}{\Gamma(n\_\cdot - 2 + h)}.$$

Therefore, $y = w^{\frac{1}{2}}$ is a real scalar type-1 beta random variable with the parameters $(n\_\cdot - p - 2,\ p)$. We reject the null hypothesis for small values of *y*, or when the observed value of $y \le y\_\alpha$, with $y\_\alpha$ such that $\int\_0^{y\_\alpha} f\_y(y)\,{\rm d}y = \alpha$ for a preassigned significance level *α*. We will use the same notation $f\_y(y)$ for the density of *y* in all special cases.

We can also obtain some special cases for $t\_1 = \frac{1-w}{w}$ and $t\_2 = \frac{1-y}{y}$, with $y = \sqrt{w}$. With this transformation, $t\_1$ and $t\_2$ will be available in terms of type-2 beta variables in the real scalar case, which conveniently enables us to relate this distribution to real scalar *F* random variables, so that an *F* table can be used for testing the null hypothesis and reaching a decision. We have noted that

$$\begin{aligned} w &= \frac{|V|}{|U+V|} = |(U+V)^{-\frac{1}{2}}V(U+V)^{-\frac{1}{2}}| = |W\_1| \\ &= \frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}}+I|} = \frac{1}{|W\_2+I|} \end{aligned}$$

where $W\_1$ is a real matrix-variate type-1 beta random variable with the parameters $(\frac{n\_\cdot-k}{2},\ \frac{k-1}{2})$ and $W\_2$ is a real matrix-variate type-2 beta random variable with the parameters $(\frac{k-1}{2},\ \frac{n\_\cdot-k}{2})$. When $p = 1$, $W\_1$ and $W\_2$ are real scalar variables, denoted by $w\_1$ and $w\_2$, respectively. Then, for $p = 1$, there is a single gamma ratio involving *h* in the general *h*-th moment (13.3.7), and

$$t\_1 = \frac{1 - w}{w} = \frac{1}{w} - 1 = (w\_2 + 1) - 1 = w\_2$$

where $w\_2$ is a real scalar type-2 beta random variable with the parameters $(\frac{k-1}{2},\ \frac{n\_\cdot-k}{2})$. As well, in general, for a real matrix-variate type-2 beta matrix $W\_2$ with the parameters $(\frac{\nu\_1}{2},\ \frac{\nu\_2}{2})$, we have $\frac{\nu\_2}{\nu\_1} W\_2 = F\_{\nu\_1,\nu\_2}$ where $F\_{\nu\_1,\nu\_2}$ is a real matrix-variate *F* matrix random variable with degrees of freedom $\nu\_1$ and $\nu\_2$. When $p = 1$, that is, in the real scalar case, $\frac{\nu\_2}{\nu\_1} w\_2 = F\_{\nu\_1,\nu\_2}$ where, in this case, *F* is a real scalar *F* random variable with $\nu\_1$ and $\nu\_2$ degrees of freedom. We have used *F* for both the scalar and the matrix-variate case in order to avoid introducing too many symbols. For $p = 2$, we combine the gamma functions in the numerator and denominator by applying the duplication formula for gamma functions (13.3.12); then, for $t\_2 = \frac{1-y}{y}$, the situation turns out to be the same as in the case of $t\_1$, the only difference being that in the real scalar type-2 beta $w\_2$, the parameters are $(k - 1,\ n\_\cdot - k - 1)$. Note that the original $\frac{k-1}{2}$ has become $k - 1$ and the original $\frac{n\_\cdot-k}{2}$ has become $n\_\cdot - k - 1$. Thus, we can state the following two special cases.
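The scalar relation $\frac{\nu\_2}{\nu\_1} w\_2 = F\_{\nu\_1,\nu\_2}$ can be verified numerically (this check is our own illustration, not from the text); SciPy's `betaprime` is the real scalar type-2 beta.

```python
# Verify: if w2 is a real scalar type-2 beta with parameters (nu1/2, nu2/2),
# then (nu2/nu1) * w2 has the F(nu1, nu2) distribution.  The check compares
# densities after the change of variable x = (nu2/nu1) * w2.
import numpy as np
from scipy.stats import betaprime, f   # betaprime = type-2 beta

nu1, nu2 = 4, 20                       # illustrative degrees of freedom
xs = np.linspace(0.1, 5.0, 50)

# density of (nu2/nu1)*w2 at x, versus the F(nu1, nu2) density at x
scaled_pdf = (nu1 / nu2) * betaprime.pdf(xs * nu1 / nu2, nu1 / 2, nu2 / 2)
assert np.allclose(scaled_pdf, f.pdf(xs, nu1, nu2))
```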

**Case (5):** $p = 1,\ t\_1 = \frac{1-w}{w}$

As was explained, $t\_1$ is a real scalar type-2 beta random variable with the parameters $(\frac{k-1}{2},\ \frac{n\_\cdot-k}{2})$, so that

$$\frac{n\_\cdot - k}{k - 1} t\_1 \simeq F\_{k - 1, n\_\cdot - k},$$

which is a real scalar *F* random variable with $k - 1$ and $n\_\cdot - k$ degrees of freedom. Accordingly, we reject *Ho* for small values of *w* and *y*, which corresponds to large values of *F*. Thus, we reject the null hypothesis *Ho* whenever the observed value of $F\_{k-1,n\_\cdot-k} \ge F\_{k-1,n\_\cdot-k,\alpha}$ where $F\_{k-1,n\_\cdot-k,\alpha}$ is the upper $100\,\alpha\%$ percentage point of the *F* distribution, that is, $\int\_a^{\infty} g(F)\,{\rm d}F = \alpha$ where $a = F\_{k-1,n\_\cdot-k,\alpha}$ and $g(F)$ is the density of *F* in this case.

**Case (6):** $p = 2,\ t\_2 = \frac{1-y}{y},\ y = \sqrt{w}$

As previously explained, *t*<sup>2</sup> is a real scalar type-2 beta random variable with the parameters *(k* − 1*, n.* − *k* − 1*)* or

$$\frac{n\_\cdot - k - 1}{k - 1} t\_2 \simeq F\_{2(k - 1), 2(n\_\cdot - k - 1)},$$

which is a real scalar *F* random variable having $2(k - 1)$ and $2(n\_\cdot - k - 1)$ degrees of freedom. We reject the null hypothesis for large values of $t\_2$, or when the observed value of $[\frac{n\_\cdot-k-1}{k-1}]\,t\_2 \ge b$ with *b* such that $\int\_b^{\infty} g(F)\,{\rm d}F = \alpha$, $g(F)$ denoting in this case the density of a real scalar *F* random variable with degrees of freedom $2(k - 1)$ and $2(n\_\cdot - k - 1)$, and $b = F\_{2(k-1),2(n\_\cdot-k-1),\alpha}$.

**Case (7):** $k = 2,\ p \ge 1,\ t\_1 = \frac{1-w}{w}$

For the case $k = 2$, we have already seen that the gamma functions containing *h* in their arguments cancel out, leaving only one gamma in the numerator and one in the denominator, so that *w* is distributed as a real scalar type-1 beta random variable with the parameters $(\frac{n\_\cdot-1-p}{2},\ \frac{p}{2})$. Thus, $t\_1 = \frac{1-w}{w}$ is a real scalar type-2 beta with the parameters $(\frac{p}{2},\ \frac{n\_\cdot-p-1}{2})$, and

$$
\left[\frac{n\_\cdot - 1 - p}{p}\right] t\_1 \simeq F\_{p, n\_\cdot - 1 - p},
$$

which is a real scalar *F* random variable having *p* and $n\_\cdot - 1 - p$ degrees of freedom. We reject *Ho* for large values of $t\_1$, or when the observed value of $[\frac{n\_\cdot-1-p}{p}]\,t\_1 \ge b$ where *b* is such that $\int\_b^{\infty} g(F)\,{\rm d}F = \alpha$, with $g(F)$ being the density of an *F* random variable with degrees of freedom *p* and $n\_\cdot - 1 - p$ in this special case.

**Case (8):** $k = 3,\ p \ge 1,\ t\_2 = \frac{1-y}{y},\ y = \sqrt{w}$

On combining Cases (4) and (6), it is seen that *t*<sup>2</sup> is a real scalar type-2 beta random variable with the parameters *(p, n.* − *p* − 1*)*, so that

$$\frac{n\_\cdot - 1 - p}{p} \, t\_2 \simeq F\_{2p, 2(n\_\cdot - p - 1)},$$

which is a real scalar *F* random variable with $(2p,\ 2(n\_\cdot - p - 1))$ degrees of freedom. Thus, we reject the hypothesis for large values of this *F* random variable. For a test at significance level *α*, or with *α* as the size of its critical region, the hypothesis $H\_o: A\_1 = A\_2 = \cdots = A\_k = O$ is rejected when the observed value of this $F \ge F\_{2p,2(n\_\cdot-1-p),\alpha}$ where $F\_{2p,2(n\_\cdot-1-p),\alpha}$ is the upper $100\,\alpha\%$ percentage point of the *F* distribution.

**Example 13.3.1.** In a dieting experiment, three different diets *D*1*, D*<sup>2</sup> and *D*<sup>3</sup> are tried for a period of one month. The variables monitored are weight in kilograms (kg), waist circumference in centimeters (cm) and right mid-thigh circumference in centimeters. The measurements are *x*<sup>1</sup> = final weight minus initial weight, *x*<sup>2</sup> = final waist circumference minus initial waist reading and *x*<sup>3</sup> = final minus initial thigh circumference. Diet *D*<sup>1</sup> is administered to a group of 5 randomly selected individuals (*n*<sup>1</sup> = 5), *D*2, to 4 randomly selected persons (*n*<sup>2</sup> = 4), and 6 randomly selected individuals (*n*<sup>3</sup> = 6) are subjected to *D*3. Since three variables are monitored, *p* = 3. As well, there are three treatments or three diets, so that *k* = 3. In our notation,

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ x\_3 \end{bmatrix}, \ X\_{ij} = \begin{bmatrix} x\_{1ij} \\ x\_{2ij} \\ x\_{3ij} \end{bmatrix}, \ X\_{1j} = \begin{bmatrix} x\_{11j} \\ x\_{21j} \\ x\_{31j} \end{bmatrix}, \ X\_{2j} = \begin{bmatrix} x\_{12j} \\ x\_{22j} \\ x\_{32j} \end{bmatrix}, \ X\_{3j} = \begin{bmatrix} x\_{13j} \\ x\_{23j} \\ x\_{33j} \end{bmatrix},$$

where *i* corresponds to the diet number and *j* stands for the sample serial number. For example, the observation vector on individual #3 within the group subjected to diet *D*<sup>2</sup> is denoted by *X*23. The following are the data on *x*1*, x*2*, x*3:

**Diet** *D*<sup>1</sup> : *X*1*<sup>j</sup> , j* = 1*,* 2*,* 3*,* 4*,* 5 :

$$X\_{11} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \ X\_{12} = \begin{bmatrix} 4 \\ -2 \\ -1 \end{bmatrix}, \ X\_{13} = \begin{bmatrix} -1 \\ -2 \\ 1 \end{bmatrix}, \ X\_{14} = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}, \ X\_{15} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.$$

**Diet** *D*<sup>2</sup> : *X*2*<sup>j</sup> , j* = 1*,* 2*,* 3*,* 4 :

$$X\_{21} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad X\_{22} = \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix}, \; X\_{23} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \; X\_{24} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.$$

**Diet** *D*<sup>3</sup> : *X*3*<sup>j</sup> , j* = 1*,* 2*,* 3*,* 4*,* 5*,* 6 :

$$\begin{aligned} X\_{31} &= \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}, \ X\_{32} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}, \ X\_{33} = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \\\ X\_{34} &= \begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix}, \ X\_{35} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}, \ X\_{36} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}. \end{aligned}$$

(1): Perform an ANOVA test on the first component consisting of weight measurements; (2): Carry out a MANOVA test on the first two components, weight and waist measurements; (3): Do a MANOVA test on all the three variables, weight, waist and thigh measurements.

**Solution 13.3.1.** We first compute the vectors $X\_{1.},\ \bar{X}\_1,\ X\_{2.},\ \bar{X}\_2,\ X\_{3.},\ \bar{X}\_3,\ X\_{..}$ and $\bar{X}$:

$$\begin{aligned} X\_{1.} &= \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}, \ \bar{X}\_{1} = \frac{X\_{1.}}{n\_{1}} = \frac{1}{5} \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \ X\_{2.} = \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix}, \ \bar{X}\_{2} = \frac{X\_{2.}}{n\_{2}} = \frac{1}{4} \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \\ X\_{3.} &= \begin{bmatrix} 6 \\ 12 \\ 6 \end{bmatrix}, \ \bar{X}\_{3} = \frac{X\_{3.}}{6} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \ X\_{..} = \begin{bmatrix} 15 \\ 16 \\ 6 \end{bmatrix}, \ \bar{X} = \frac{X\_{..}}{n\_{.}} = \frac{1}{15} \begin{bmatrix} 15 \\ 16 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 16/15 \\ 6/15 \end{bmatrix}. \end{aligned}$$

**Problem (1): ANOVA on the first component** *x*1**.** The first components of the observations are *x*1*ij* . The first components under diet *D*<sup>1</sup> are

$$[x\_{111}, x\_{112}, x\_{113}, x\_{114}, x\_{115}] = [2, 4, -1, -1, 1] \text{ with } x\_{11.} = 5;$$

the first components of observations under diet *D*<sup>2</sup> are

$$[\mathbf{x}\_{121}, \mathbf{x}\_{122}, \mathbf{x}\_{123}, \mathbf{x}\_{124}] = [1, 3, -1, 1] \text{ with } \mathbf{x}\_{12.} = 4;$$

and the first components under diet *D*<sup>3</sup> are

$$[\mathbf{x}\_{131}, \mathbf{x}\_{132}, \mathbf{x}\_{133}, \mathbf{x}\_{134}, \mathbf{x}\_{135}, \mathbf{x}\_{136}] = [\mathbf{2}, 1, -1, \mathbf{2}, \mathbf{2}, \mathbf{0}] \text{ with } \mathbf{x}\_{13.} = 6.$$

Hence, the total on the first component is $x\_{1..} = 15$, and $\bar{x}\_1 = \frac{x\_{1..}}{n\_\cdot} = \frac{15}{15} = 1$. The first component model is the following:

$$x\_{1ij} = \mu + \alpha\_i + e\_{1ij}, \; j = 1, \ldots, n\_i, \; i = 1, \ldots, k.$$

Note again that estimators and estimates will be denoted by a hat. As previously mentioned, the same symbols will be used for the variables and the observations on those variables in order to avoid using too many symbols; however, the notations will be clear from the context. If the discussion pertains to distributions, then variables are involved, and if we are referring to numbers, then we are dealing with observations.

The least squares estimates are $\hat{\mu} = \frac{x\_{1..}}{n\_\cdot} = 1$, $\hat{\alpha}\_1 = \frac{x\_{11.}}{5} = \frac{5}{5} = 1$, $\hat{\alpha}\_2 = \frac{x\_{12.}}{4} = \frac{4}{4} = 1$, $\hat{\alpha}\_3 = \frac{x\_{13.}}{6} = \frac{6}{6} = 1$. The first component hypothesis is $\alpha\_1 = \alpha\_2 = \alpha\_3 = 0$. The total sum of squares is

$$\begin{split} \sum\_{ij} (x\_{1ij} - \bar{x}\_1)^2 &= \sum\_{ij} x\_{1ij}^2 - \frac{x\_{1..}^2}{n\_\cdot} \\ &= (2-1)^2 + (4-1)^2 + (-1-1)^2 + (-1-1)^2 + (1-1)^2 \\ &\quad + (1-1)^2 + (3-1)^2 + (-1-1)^2 + (1-1)^2 \\ &\quad + (2-1)^2 + (1-1)^2 + (-1-1)^2 + (2-1)^2 + (2-1)^2 + (0-1)^2 \\ &= 34. \end{split}$$

The sum of squares due to the *αi*'s is available from

$$\begin{split} \sum\_{i} n\_{i} \left( \frac{x\_{1i.}}{n\_{i}} - \frac{x\_{1..}}{n\_{.}} \right)^{2} &= \sum\_{i} \frac{x\_{1i.}^{2}}{n\_{i}} - \frac{x\_{1..}^{2}}{n\_{.}} \\ &= 5 \left( \frac{5}{5} - \frac{15}{15} \right)^{2} + 4 \left( \frac{4}{4} - \frac{15}{15} \right)^{2} + 6 \left( \frac{6}{6} - \frac{15}{15} \right)^{2} = 0. \end{split}$$

Hence the following table:

#### ANOVA Table

| Variation due to | d.f. | Sum of squares | Mean squares |
| --- | --- | --- | --- |
| the *αi*'s | 2 | 0 | 0 |
| the residuals | 12 | 34 | 34/12 |
| total | 14 | 34 | |

Since the sum of squares due to the *αi*'s is null, the hypothesis is not rejected at any level.
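The ANOVA computations above can be reproduced in a few lines of code (a sketch using NumPy; the data are the first components from Example 13.3.1):

```python
# One-way ANOVA on the first component: total, between-diet, and residual
# sums of squares, using the weight-change data of Example 13.3.1.
import numpy as np

groups = [np.array([2., 4., -1., -1., 1.]),     # diet D1
          np.array([1., 3., -1., 1.]),          # diet D2
          np.array([2., 1., -1., 2., 2., 0.])]  # diet D3

grand = np.concatenate(groups)
ss_total = ((grand - grand.mean()) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in groups)
ss_residual = ss_total - ss_between
print(ss_total, ss_between, ss_residual)   # 34.0 0.0 34.0
```

All three group means equal the grand mean 1, so the between-diet sum of squares vanishes exactly, as found above.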

**Problem (2): MANOVA on the first two components.** We are still using the notation $X\_{ij}$ for the two- and three-component cases since our general notation does not depend on the number of components in the vector concerned; as well, we can make use of the computations pertaining to the first component in Problem (1). The relevant quantities computed from the data on the first two components are the following:

$$\begin{aligned} \text{Diet } D\_1: &\ \begin{bmatrix} 2 & 4 & -1 & -1 & 1 \\ 3 & -2 & -2 & 1 & 0 \end{bmatrix}, \ X\_{1.} = \begin{bmatrix} x\_{11.} \\ x\_{21.} \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}, \ \bar{X}\_1 = \frac{1}{5} \begin{bmatrix} 5 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \\ \text{Diet } D\_2: &\ \begin{bmatrix} 1 & 3 & -1 & 1 \\ 2 & -1 & 2 & 1 \end{bmatrix}, \ X\_{2.} = \begin{bmatrix} x\_{12.} \\ x\_{22.} \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}, \ \bar{X}\_2 = \frac{1}{4} \begin{bmatrix} 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \\ \text{Diet } D\_3: &\ \begin{bmatrix} 2 & 1 & -1 & 2 & 2 & 0 \\ 2 & 3 & 2 & 4 & 0 & 1 \end{bmatrix}, \ X\_{3.} = \begin{bmatrix} 6 \\ 12 \end{bmatrix}, \ \bar{X}\_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \end{aligned}$$

In this case, the grand total, denoted by $X\_{..}$, and the grand average, denoted by $\bar{X}$, are the following:

$$X\_{\dots} = \begin{bmatrix} 15 \\ 16 \end{bmatrix}, \ \bar{X} = \frac{1}{15} \begin{bmatrix} 15 \\ 16 \end{bmatrix} = \begin{bmatrix} 1 \\ 16/15 \end{bmatrix}.$$

Note that the total sum of squares and cross products matrix can be written as follows:

$$\begin{aligned} \sum\_{ij} (\mathbf{X}\_{ij} - \bar{\mathbf{X}})(\mathbf{X}\_{ij} - \bar{\mathbf{X}})' &= \sum\_{j=1}^{5} (\mathbf{X}\_{1j} - \bar{\mathbf{X}})(\mathbf{X}\_{1j} - \bar{\mathbf{X}})' + \sum\_{j=1}^{4} (\mathbf{X}\_{2j} - \bar{\mathbf{X}})(\mathbf{X}\_{2j} - \bar{\mathbf{X}})' \\ &+ \sum\_{j=1}^{6} (\mathbf{X}\_{3j} - \bar{\mathbf{X}})(\mathbf{X}\_{3j} - \bar{\mathbf{X}})'. \end{aligned}$$

Then,

$$\begin{aligned} \sum\_{j=1}^{5} (X\_{1j} - \bar{X})(X\_{1j} - \bar{X})' &= \begin{bmatrix} 1 & 29/15 \\ 29/15 & 29^2/15^2 \end{bmatrix} + \begin{bmatrix} 9 & -138/15 \\ -138/15 & 46^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 4 & 92/15 \\ 92/15 & 46^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & 2/15 \\ 2/15 & 1/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 16^2/15^2 \end{bmatrix} \\ &= \begin{bmatrix} 18 & -1 \\ -1 & 5330/15^2 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{4} (X\_{2j} - \bar{X})(X\_{2j} - \bar{X})' &= \begin{bmatrix} 0 & 0 \\ 0 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -62/15 \\ -62/15 & 31^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 4 & -28/15 \\ -28/15 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1/15^2 \end{bmatrix} = \begin{bmatrix} 8 & -6 \\ -6 & 1354/15^2 \end{bmatrix}; \end{aligned}$$

$$\begin{split} \sum\_{j=1}^{6} (X\_{3j} - \bar{X})(X\_{3j} - \bar{X})' &= \begin{bmatrix} 1 & 14/15 \\ 14/15 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 29^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -28/15 \\ -28/15 & 14^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 1 & 44/15 \\ 44/15 & 44^2/15^2 \end{bmatrix} + \begin{bmatrix} 1 & -16/15 \\ -16/15 & 16^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 1 & 1/15 \\ 1/15 & 1/15^2 \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 1 & 3426/15^2 \end{bmatrix}; \end{split}$$

$$\sum\_{ij} (X\_{ij} - \bar{X})(X\_{ij} - \bar{X})' = \begin{bmatrix} 18 & -1 \\ -1 & 5330/15^2 \end{bmatrix} + \begin{bmatrix} 8 & -6 \\ -6 & 1354/15^2 \end{bmatrix} + \begin{bmatrix} 8 & 1 \\ 1 & 3426/15^2 \end{bmatrix}$$

$$= \begin{bmatrix} 34 & -6 \\ -6 & 674/15 \end{bmatrix} \text{ and } \begin{vmatrix} 34 & -6 \\ -6 & 674/15 \end{vmatrix} = 1491.73.$$

Now, consider the residual sum of squares and cross products matrix:

$$\begin{split} \sum\_{ij} (X\_{ij} - \frac{X\_i}{n\_i})(X\_{ij} - \frac{X\_i}{n\_i})' &= \sum\_{j=1}^{5} (X\_{1j} - \frac{X\_1}{n\_1})(X\_{1j} - \frac{X\_1}{n\_1})' \\ &+ \sum\_{j=1}^{4} (X\_{2j} - \frac{X\_2}{n\_2})(X\_{2j} - \frac{X\_2}{n\_2})' + \sum\_{j=1}^{6} (X\_{3j} - \frac{X\_3}{n\_3})(X\_{3j} - \frac{X\_3}{n\_3})'. \end{split}$$

That is,

$$\begin{aligned} \sum\_{j=1}^{5} (X\_{1j} - \frac{X\_1}{n\_1})(X\_{1j} - \frac{X\_1}{n\_1})' &= \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix} + \begin{bmatrix} 9 & -6 \\ -6 & 4 \end{bmatrix} + \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} \\ &+ \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 18 & -1 \\ -1 & 18 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{4} (X\_{2j} - \frac{X\_2}{n\_2})(X\_{2j} - \frac{X\_2}{n\_2})' &= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 4 & -4 \\ -4 & 4 \end{bmatrix} + \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} \\ &+ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 8 & -6 \\ -6 & 6 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{6} (X\_{3j} - \frac{X\_3}{n\_3})(X\_{3j} - \frac{X\_3}{n\_3})' &= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} \\ &+ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} + \begin{bmatrix} 1 & -2 \\ -2 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 1 & 10 \end{bmatrix}. \end{aligned}$$

Hence,

$$\begin{split} \sum\_{ij} (X\_{ij} - \frac{X\_i}{n\_i})(X\_{ij} - \frac{X\_i}{n\_i})' &= \begin{bmatrix} 18 & -1 \\ -1 & 18 \end{bmatrix} + \begin{bmatrix} 8 & -6 \\ -6 & 6 \end{bmatrix} + \begin{bmatrix} 8 & 1 \\ 1 & 10 \end{bmatrix} \\ &= \begin{bmatrix} 34 & -6 \\ -6 & 34 \end{bmatrix} \text{ and } \begin{vmatrix} 34 & -6 \\ -6 & 34 \end{vmatrix} = 1120. \end{split}$$

Therefore, the observed *w* is given by

$$w = \frac{1120}{1491.73} = 0.7508, \ \sqrt{w} = 0.8665.$$

This is the case *p* = 2*, k* = 3, that is, our special Case (8). Then, the observed value of

$$t\_2 = \frac{1 - \sqrt{w}}{\sqrt{w}} = \frac{0.1335}{0.8665} = 0.1540,$$

and

$$\frac{n\_\cdot - 1 - p}{p}\, t\_2 = \frac{15 - 1 - 2}{2}\, (0.1540) = 0.9244.$$

Our *F*-statistic is $F\_{2p,2(n\_\cdot-p-1)} = F\_{4,24}$. Let us test the hypothesis $A\_1 = A\_2 = A\_3 = O$ at the 5% significance level, that is, $\alpha = 0.05$. Since the observed value $0.9244 < 2.78 \approx F\_{4,24,0.05}$, which is available from *F* tables, we do not reject the hypothesis.

#### **Verification of the calculations**

Denoting the total sum of squares and cross products matrix by *St* , the residual sum of squares and cross products matrix by *Sr* and the sum of squares and cross products matrix due to the hypothesis or due to the effects *Ai*'s by *Sh*, we should have *St* = *Sr* + *Sh* where

$$\mathbf{S}\_{\mathbf{t}} = \begin{bmatrix} 34 & -6 \\ -6 & 674/15 \end{bmatrix} \text{ and } \mathbf{S}\_{\mathbf{r}} = \begin{bmatrix} 34 & -6 \\ -6 & 34 \end{bmatrix}$$

as previously determined. Let us compute

$$S\_h = \sum\_{i=1}^k n\_i \left(\frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_.}\right) \left(\frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_.}\right)'.$$

For the first two components, we already have the following:

$$\begin{aligned} \frac{X\_{1.}}{n\_{1}} - \frac{X\_{..}}{n\_{.}} &= \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 16/15 \end{bmatrix} = \begin{bmatrix} 0 \\ -16/15 \end{bmatrix}, \ n\_{1} = 5 \\\ \frac{X\_{2.}}{n\_{2}} - \frac{X\_{..}}{n\_{.}} &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 16/15 \end{bmatrix} = \begin{bmatrix} 0 \\ -1/15 \end{bmatrix}, \ n\_{2} = 4 \\\ \frac{X\_{3.}}{n\_{3}} - \frac{X\_{..}}{n\_{.}} &= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 16/15 \end{bmatrix} = \begin{bmatrix} 0 \\ 14/15 \end{bmatrix}, \ n\_{3} = 6. \end{aligned}$$

Hence,

$$\begin{split} S\_h &= 5 \begin{bmatrix} 0 & 0 \\ 0 & 16^2/15^2 \end{bmatrix} + 4 \begin{bmatrix} 0 & 0 \\ 0 & 1/15^2 \end{bmatrix} + 6 \begin{bmatrix} 0 & 0 \\ 0 & 14^2/15^2 \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 \\ 0 & 2460/15^2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 164/15 \end{bmatrix}. \end{split}$$

As $34 + \frac{164}{15} = \frac{674}{15}$, we indeed have $S\_t = S\_r + S\_h$, that is,

$$
\begin{bmatrix} 34 & -6 \\ -6 & 674/15 \end{bmatrix} = \begin{bmatrix} 34 & -6 \\ -6 & 34 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 164/15 \end{bmatrix}.
$$

Thus, the result is verified.
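The whole of Problem (2) can also be checked mechanically (a sketch using NumPy; the data are as given in Example 13.3.1):

```python
# MANOVA on the first two components: build S_t, S_r, S_h and w = |S_r|/|S_t|.
import numpy as np

D1 = np.array([[2, 3], [4, -2], [-1, -2], [-1, 1], [1, 0]], dtype=float)
D2 = np.array([[1, 2], [3, -1], [-1, 2], [1, 1]], dtype=float)
D3 = np.array([[2, 2], [1, 3], [-1, 2], [2, 4], [2, 0], [0, 1]], dtype=float)
groups = [D1, D2, D3]

X = np.vstack(groups)
xbar = X.mean(axis=0)                    # grand average (1, 16/15)

S_t = sum(np.outer(r - xbar, r - xbar) for r in X)
S_r = sum(np.outer(r - g.mean(axis=0), r - g.mean(axis=0))
          for g in groups for r in g)
S_h = sum(len(g) * np.outer(g.mean(axis=0) - xbar, g.mean(axis=0) - xbar)
          for g in groups)

assert np.allclose(S_t, S_r + S_h)       # the verification S_t = S_r + S_h
w = np.linalg.det(S_r) / np.linalg.det(S_t)
print(round(w, 4))                       # 0.7508
```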

**Problem (3): Data on all the three variables.** In this case, we have $p = 3$, $k = 3$. We will first use $X\_{1.},\ \bar{X}\_1,\ X\_{2.},\ \bar{X}\_2,\ X\_{3.},\ \bar{X}\_3,\ X\_{..}$ and $\bar{X}$, which have already been evaluated, to compute the residual sum of squares and cross products matrix. Since all the matrices are symmetric, for convenience, we will only display the diagonal elements and those above the diagonal. As in the case of two components, we compute the following, making use of the calculations already done for the 2-component case (the notation remaining the same since our general notation does not involve *p*):

$$\begin{aligned} \sum\_{j=1}^{5} \Big(X\_{1j} - \frac{X\_1}{n\_1}\Big)\Big(X\_{1j} - \frac{X\_1}{n\_1}\Big)' &= \begin{bmatrix} 1 & 3 & 1 \\ & 9 & 3 \\ & & 1 \end{bmatrix} + \begin{bmatrix} 9 & -6 & -3 \\ & 4 & 2 \\ & & 1 \end{bmatrix} + \begin{bmatrix} 4 & 4 & -2 \\ & 4 & -2 \\ & & 1 \end{bmatrix} \\ &\quad + \begin{bmatrix} 4 & -2 & 2 \\ & 1 & -1 \\ & & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ & 0 & 0 \\ & & 0 \end{bmatrix} = \begin{bmatrix} 18 & -1 & -2 \\ & 18 & 2 \\ & & 4 \end{bmatrix}; \\ \sum\_{j=1}^{4} \Big(X\_{2j} - \frac{X\_2}{n\_2}\Big)\Big(X\_{2j} - \frac{X\_2}{n\_2}\Big)' &= \begin{bmatrix} 0 & 0 & 0 \\ & 1 & 2 \\ & & 4 \end{bmatrix} + \begin{bmatrix} 4 & -4 & -4 \\ & 4 & 4 \\ & & 4 \end{bmatrix} + \begin{bmatrix} 4 & -2 & -2 \\ & 1 & 1 \\ & & 1 \end{bmatrix} \\ &\quad + \begin{bmatrix} 0 & 0 & 0 \\ & 0 & 0 \\ & & 1 \end{bmatrix} = \begin{bmatrix} 8 & -6 & -6 \\ & 6 & 7 \\ & & 10 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{6} \Big(X\_{3j} - \frac{X\_3}{n\_3}\Big)\Big(X\_{3j} - \frac{X\_3}{n\_3}\Big)' &= \begin{bmatrix} 1 & 0 & -2 \\ & 0 & 0 \\ & & 4 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ & 1 & 0 \\ & & 0 \end{bmatrix} + \begin{bmatrix} 4 & 0 & -2 \\ & 0 & 0 \\ & & 1 \end{bmatrix} \\ &\quad + \begin{bmatrix} 1 & 2 & 1 \\ & 4 & 2 \\ & & 1 \end{bmatrix} + \begin{bmatrix} 1 & -2 & -1 \\ & 4 & 2 \\ & & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 & -1 \\ & 1 & -1 \\ & & 1 \end{bmatrix} \\ &= \begin{bmatrix} 8 & 1 & -5 \\ & 10 & 3 \\ & & 8 \end{bmatrix}. \end{aligned}$$

Then,

$$\begin{aligned} \sum\_{ij} (X\_{ij} - \frac{X\_i}{n\_i})(X\_{ij} - \frac{X\_i}{n\_i})' &= \begin{bmatrix} 18 & -1 & -2 \\ & 18 & 2 \\ & & 4 \end{bmatrix} + \begin{bmatrix} 8 & -6 & -6 \\ & 6 & 7 \\ & & 10 \end{bmatrix} + \begin{bmatrix} 8 & 1 & -5 \\ & 10 & 3 \\ & & 8 \end{bmatrix} \\ &= \begin{bmatrix} 34 & -6 & -13 \\ & 34 & 12 \\ & & 22 \end{bmatrix} \end{aligned}$$

whose determinant is equal to

$$34 \begin{vmatrix} 34 & 12 \\ 12 & 22 \end{vmatrix} + 6 \begin{vmatrix} -6 & 12 \\ -13 & 22 \end{vmatrix} - 13 \begin{vmatrix} -6 & 34 \\ -13 & 12 \end{vmatrix} = 15870.$$

The total sum of squares and cross products matrix is the following:

$$\begin{aligned} \sum\_{ij} (\mathbf{X}\_{ij} - \bar{\mathbf{X}})(\mathbf{X}\_{ij} - \bar{\mathbf{X}})' &= \sum\_{j=1}^{5} (\mathbf{X}\_{1j} - \bar{\mathbf{X}})(\mathbf{X}\_{1j} - \bar{\mathbf{X}})' + \sum\_{j=1}^{4} (\mathbf{X}\_{2j} - \bar{\mathbf{X}})(\mathbf{X}\_{2j} - \bar{\mathbf{X}})' \\ &+ \sum\_{j=1}^{6} (\mathbf{X}\_{3j} - \bar{\mathbf{X}})(\mathbf{X}\_{3j} - \bar{\mathbf{X}})', \end{aligned}$$

with

$$\begin{aligned} \sum\_{j=1}^{5} (X\_{1j} - \bar{X})(X\_{1j} - \bar{X})' &= \begin{bmatrix} 1 & 29/15 & 9/15 \\ & 29^2/15^2 & (29 \times 9)/15^2 \\ & & 9^2/15^2 \end{bmatrix} + \begin{bmatrix} 9 & -(3 \times 46)/15 & -(3 \times 21)/15 \\ & 46^2/15^2 & (46 \times 21)/15^2 \\ & & 21^2/15^2 \end{bmatrix} \\ &\quad + \begin{bmatrix} 4 & 92/15 & -18/15 \\ & 46^2/15^2 & -(46 \times 9)/15^2 \\ & & 9^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & 2/15 & 42/15 \\ & 1/15^2 & 21/15^2 \\ & & 21^2/15^2 \end{bmatrix} \\ &\quad + \begin{bmatrix} 0 & 0 & 0 \\ & 16^2/15^2 & 96/15^2 \\ & & 36/15^2 \end{bmatrix} = \begin{bmatrix} 18 & -1 & -2 \\ & 5330/15^2 & 930/15^2 \\ & & 1080/15^2 \end{bmatrix}, \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{4} (X\_{2j} - \bar{X})(X\_{2j} - \bar{X})' &= \begin{bmatrix} 0 & 0 & 0 \\ 14^2/15^2 & (14 \times 24)/15^2 \\ & 24^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 4 & -62/15 & -72/15 \\ 31^2/15^2 & (31 \times 36)/15^2 \\ & 36^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -28/15 & -18/15 \\ & 14^2/15^2 & (14 \times 9)/15^2 \\ & & 9^2/15^2 \end{bmatrix} \\ &+ \begin{bmatrix} 0 & 0 & 0 \\ & 1/15^2 & 21/15^2 \\ & & 21^2/15^2 \end{bmatrix} = \begin{bmatrix} 8 & -6 & -6 \\ & 1354/15^2 & 1599/15^2 \\ & & 2394/15^2 \end{bmatrix}, \end{aligned}$$

$$\begin{aligned} \sum\_{j=1}^{6} (X\_{3j} - \bar{X})(X\_{3j} - \bar{X})' &= \begin{bmatrix} 1 & 14/15 & -21/15 \\ & 14^2/15^2 & -(14 \times 21)/15^2 \\ & & 21^2/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ & 29^2/15^2 & (29 \times 9)/15^2 \\ & & 9^2/15^2 \end{bmatrix} \\ &\quad + \begin{bmatrix} 4 & -28/15 & -48/15 \\ & 14^2/15^2 & (14 \times 24)/15^2 \\ & & 24^2/15^2 \end{bmatrix} + \begin{bmatrix} 1 & 44/15 & 24/15 \\ & 44^2/15^2 & (44 \times 24)/15^2 \\ & & 24^2/15^2 \end{bmatrix} \\ &\quad + \begin{bmatrix} 1 & -16/15 & -6/15 \\ & 16^2/15^2 & 96/15^2 \\ & & 6^2/15^2 \end{bmatrix} + \begin{bmatrix} 1 & 1/15 & -24/15 \\ & 1/15^2 & -24/15^2 \\ & & 24^2/15^2 \end{bmatrix} \\ &= \begin{bmatrix} 8 & 1 & -5 \\ & 3426/15^2 & 1431/15^2 \\ & & 2286/15^2 \end{bmatrix}. \end{aligned}$$

Hence the total sum of squares and cross products matrix is

$$\begin{aligned} \sum\_{ij} (X\_{ij} - \bar{X})(X\_{ij} - \bar{X})' &= \begin{bmatrix} 18 & -1 & -2 \\ & 5330/15^2 & 930/15^2 \\ & & 1080/15^2 \end{bmatrix} + \begin{bmatrix} 8 & -6 & -6 \\ & 1354/15^2 & 1599/15^2 \\ & & 2394/15^2 \end{bmatrix} \\ &\quad + \begin{bmatrix} 8 & 1 & -5 \\ & 3426/15^2 & 1431/15^2 \\ & & 2286/15^2 \end{bmatrix} = \begin{bmatrix} 34 & -6 & -13 \\ -6 & 674/15 & 264/15 \\ -13 & 264/15 & 384/15 \end{bmatrix}, \\ \text{and its determinant} &= 34 \begin{vmatrix} 674/15 & 264/15 \\ 264/15 & 384/15 \end{vmatrix} + 6 \begin{vmatrix} -6 & 264/15 \\ -13 & 384/15 \end{vmatrix} \\ &\quad - 13 \begin{vmatrix} -6 & 674/15 \\ -13 & 264/15 \end{vmatrix} = 342126/15. \end{aligned}$$

Then, the observed value of

$$w = \frac{15870 \times 15}{342126} = 0.6958, \ \sqrt{w} = 0.8341.$$

Since $p = 3$ and $k = 3$, an exact distribution is available from our special Case (8) for $t\_2 = \frac{1-\sqrt{w}}{\sqrt{w}}$, whose observed value is $t\_2 = 0.1989$. Then,

$$\frac{n\_\cdot - 1 - p}{p} t\_2 = \frac{15 - 1 - 3}{3} t\_2 = \frac{11}{3} t\_2 \sim F\_{2p, 2(n\_\cdot - 1 - p)} = F\_{6, 22}.$$

The critical value obtained from an *F* table at the 5% significance level is $F\_{6,22,.05} \approx 2.55$. Since the observed value of $F\_{6,22}$ is $\frac{11}{3}(0.1989) = 0.7293 < 2.55$, the hypothesis $A\_1 = A\_2 = A\_3 = O$ is not rejected. It can also be verified that $S\_t = S\_r + S\_h$.
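The three-variable figures of Problem (3) and the identity $S\_t = S\_r + S\_h$ can likewise be confirmed numerically (a sketch; the data vectors are those of Example 13.3.1):

```python
# MANOVA on all three components: verify S_t = S_r + S_h, then compute
# w = |S_r|/|S_t| and the Case (8) statistic (distributed as F_{6,22}).
import math
import numpy as np

D1 = np.array([[2, 3, 1], [4, -2, -1], [-1, -2, 1], [-1, 1, -1], [1, 0, 0]], float)
D2 = np.array([[1, 2, 2], [3, -1, -2], [-1, 2, 1], [1, 1, -1]], float)
D3 = np.array([[2, 2, -1], [1, 3, 1], [-1, 2, 2], [2, 4, 2], [2, 0, 0], [0, 1, 2]], float)
groups = [D1, D2, D3]

X = np.vstack(groups)
xbar = X.mean(axis=0)
S_t = sum(np.outer(r - xbar, r - xbar) for r in X)
S_r = sum(np.outer(r - g.mean(0), r - g.mean(0)) for g in groups for r in g)
S_h = sum(len(g) * np.outer(g.mean(0) - xbar, g.mean(0) - xbar) for g in groups)
assert np.allclose(S_t, S_r + S_h)       # S_t = S_r + S_h

w = np.linalg.det(S_r) / np.linalg.det(S_t)
t2 = (1 - math.sqrt(w)) / math.sqrt(w)
F_obs = (15 - 1 - 3) / 3 * t2            # Case (8) with n. = 15, p = 3, k = 3
print(round(w, 4))                       # 0.6958
```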

#### **13.3.4. Asymptotic distribution of the** *λ***-criterion**

We can obtain an asymptotic real chisquare distribution as $n\_\cdot \to \infty$. To this end, consider the general *h*-th moment of *λ*, that is, $E[\lambda^h]$ from (13.3.8):

$$\begin{aligned} E[\lambda^h] &= C\_{p,k} \prod\_{j=1}^p \left[ \Gamma\left(\frac{n\_.-k}{2} - \frac{j-1}{2} + \frac{n\_.}{2}h\right) \Big/ \Gamma\left(\frac{n\_.-1}{2} - \frac{j-1}{2} + \frac{n\_.}{2}h\right) \right] \\ &= C\_{p,k} \prod\_{j=1}^p \left[ \Gamma\left(\frac{n\_.}{2}(1+h) - \frac{j-1}{2} - \frac{k}{2}\right) / \Gamma\left(\frac{n\_.}{2}(1+h) - \frac{j-1}{2} - \frac{1}{2}\right) \right]. \end{aligned}$$

Let us expand all the gamma functions in *<sup>E</sup>*[*λh*] by using the first term in the asymptotic expansion of a gamma function or by making use of Stirling's approximation formula, namely,

$$
\Gamma(z+\delta) \approx \sqrt{(2\pi)} z^{z+\delta-\frac{1}{2}} e^{-z} \tag{13.3.13}
$$

for $|z| \to \infty$ when *δ* is a bounded quantity. Taking $\frac{n\_\cdot}{2} \to \infty$ in the constant part and $\frac{n\_\cdot}{2}(1+h) \to \infty$ in the part containing *h*, we have

$$\begin{split} \Gamma\Big(\frac{n\_\cdot}{2}(1+h) - \frac{j-1}{2} - \frac{k}{2}\Big) \Big/ \Gamma\Big(\frac{n\_\cdot}{2}(1+h) - \frac{j-1}{2} - \frac{1}{2}\Big) \\ \approx \Big\{\sqrt{2\pi}\,\Big[\frac{n\_\cdot}{2}(1+h)\Big]^{\frac{n\_\cdot}{2}(1+h) - \frac{j-1}{2} - \frac{k}{2} - \frac{1}{2}}\, {\rm e}^{-\frac{n\_\cdot}{2}(1+h)} \\ \Big/ \sqrt{2\pi}\,\Big[\frac{n\_\cdot}{2}(1+h)\Big]^{\frac{n\_\cdot}{2}(1+h) - \frac{j-1}{2} - \frac{1}{2} - \frac{1}{2}}\, {\rm e}^{-\frac{n\_\cdot}{2}(1+h)}\Big\} \\ = \Big(\frac{n\_\cdot}{2}\Big)^{-(\frac{k-1}{2})}(1+h)^{-(\frac{k-1}{2})}. \end{split}$$

The factor $(\frac{n_\cdot}{2})^{-\frac{k-1}{2}}$ is canceled by the corresponding factor coming from the constant part. Then, taking the product over $j = 1,\ldots,p$, we have

$$
E[\lambda^h] \to (1+h)^{-p(k-1)/2} \text{ or } E[\lambda^{-2h}] \to (1-2h)^{-p(k-1)/2} \text{ for } 1-2h > 0,
$$

the latter being the moment generating function (mgf) of $-2\ln\lambda$, namely, that of a real scalar chisquare with $p(k-1)$ degrees of freedom. Hence, we have the following result:

**Theorem 13.3.1.** *Letting $\lambda$ be the likelihood ratio criterion for testing the hypothesis $H_o: A_1 = A_2 = \cdots = A_k = O$, the asymptotic distribution of $-2\ln\lambda$ is a real chisquare random variable having $p(k-1)$ degrees of freedom as $n_\cdot \to \infty$, that is,*

$$-2\ln\lambda \to \chi^2\_{p(k-1)}\ as\ n\_\cdot \to \infty. \tag{13.3.14}$$

Observe that we only require the sum of the sample sizes $n_1 + \cdots + n_k = n_\cdot$ to go to infinity, and not that the individual $n_j$'s be large. This chisquare approximation can be utilized for testing the hypothesis for large values of $n_\cdot$; we then reject $H_o$ for small values of $\lambda$, which means for large values of $-2\ln\lambda$, that is, when the observed $-2\ln\lambda \ge \chi^2_{p(k-1),\alpha}$, where $\chi^2_{p(k-1),\alpha}$ denotes the upper $100\,\alpha\%$ percentage point of the chisquare distribution.
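As a numerical illustration of this large-sample procedure, the following sketch (Python with scipy assumed; the function name and the sample values of $\lambda$, $p$, $k$ are illustrative, not taken from the text) compares the observed $-2\ln\lambda$ with the chisquare percentage point:

```python
import math
from scipy.stats import chi2

def asymptotic_lrc_test(lam, p, k, alpha=0.05):
    # Theorem 13.3.1: -2 ln(lambda) ~ chisquare with p(k-1) df as n. -> infinity
    stat = -2.0 * math.log(lam)
    crit = chi2.ppf(1.0 - alpha, p * (k - 1))
    return stat, crit, stat >= crit  # reject Ho when stat >= critical value

# illustrative values only: an observed lambda of 0.4 with p = 3, k = 3
stat, crit, reject = asymptotic_lrc_test(lam=0.4, p=3, k=3)
```

Small values of $\lambda$ translate into large values of the statistic, so the rejection region is the upper tail of the chisquare distribution.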

#### **13.3.5. MANOVA and testing the equality of population mean values**

In a one-way classification model, we have the following for the *p*-variate case:

$$X\_{ij} = M + A\_i + E\_{ij} \text{ or } X\_{ij} = M\_i + E\_{ij}, \text{ with } M\_i = M + A\_i,\tag{13.3.15}$$

for $j = 1,\ldots,n_i$, $i = 1,\ldots,k$. When the error vector is assumed to have a null expected value, that is, $E[E_{ij}] = O$ for all $i$ and $j$, we have $E[X_{ij}] = M_i$ for all $i$ and $j$. Thus, this assumption, in conjunction with the hypothesis $A_1 = A_2 = \cdots = A_k = O$, implies that $M_1 = M_2 = \cdots = M_k$; that is, the test is equivalent to testing the equality of the population mean value vectors in $k$ independent populations sharing a common covariance matrix $\Sigma > O$. We have already tackled this problem in Chap. 6, under the assumptions that $\Sigma$ is known and that it is unknown, when the populations are Gaussian, that is, $X_{ij} \sim N_p(M_i, \Sigma),\ \Sigma > O$. Thus, the hypothesis made in a one-way classification MANOVA setting and the hypothesis of equality of mean value vectors in MANOVA are one and the same. In the scalar case too, the ANOVA of one-way classification data coincides with testing the equality of population mean values in $k$ independent univariate populations. In the ANOVA case, we compare the sum of squares attributable to the hypothesis with the residual sum of squares. If the hypothesis really holds, the sum of squares due to the hypothesis, that is, due to the $\alpha_j$'s (the deviations from the general effect due to the $j$-th treatment), must be zero; hence, for large values of the sum of squares due to the $\alpha_j$'s relative to the residual sum of squares, we reject the hypothesis. In MANOVA, we compare two sums of squares and cross products matrices, namely,

$$U = \sum\_{ij} \left[ \frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_{.}} \right] \left[ \frac{X\_{i.}}{n\_i} - \frac{X\_{..}}{n\_{.}} \right]' \text{ and } V = \sum\_{ij} \left[ X\_{ij} - \frac{X\_{i.}}{n\_i} \right] \left[ X\_{ij} - \frac{X\_{i.}}{n\_i} \right]'.$$

We have the following distributional properties:

$$\begin{aligned} T\_1 &= (U+V)^{-\frac{1}{2}}U(U+V)^{-\frac{1}{2}} \sim \text{real matrix-variate type-1 beta with parameters } \big(\tfrac{k-1}{2},\tfrac{n\_.-k}{2}\big);\\ T\_2 &= (U+V)^{-\frac{1}{2}}V(U+V)^{-\frac{1}{2}} \sim \text{real matrix-variate type-1 beta with parameters } \big(\tfrac{n\_.-k}{2},\tfrac{k-1}{2}\big);\\ T\_3 &= V^{-\frac{1}{2}}UV^{-\frac{1}{2}} \sim \text{real matrix-variate type-2 beta with parameters } \big(\tfrac{k-1}{2},\tfrac{n\_.-k}{2}\big);\\ T\_4 &= U^{-\frac{1}{2}}VU^{-\frac{1}{2}} \sim \text{real matrix-variate type-2 beta with parameters } \big(\tfrac{n\_.-k}{2},\tfrac{k-1}{2}\big). \end{aligned} \tag{13.3.16}$$

The likelihood ratio criterion is

$$\lambda = \frac{|V|}{|U+V|} = |T\_2| = \frac{1}{|T\_3 + I|} = \frac{1}{\prod\_{j=1}^p (1 + \eta\_j)}\tag{13.3.17}$$

where the $\eta_j$'s are the eigenvalues of $T_3$. We reject $H_o$ for small values of $\lambda$, that is, for large values of $\prod_{j=1}^p(1+\eta_j)$. The basic objective in MANOVA consists of comparing $U$ and $V$, the matrices due to the presence of treatment effects and due to the residuals, respectively. We can carry out this comparison by using the type-1 beta matrices $T_1$ and $T_2$ or the type-2 beta matrices $T_3$ and $T_4$, or by making use of the eigenvalues of these matrices. In the type-1 beta case, the eigenvalues will lie between 0 and 1, whereas in the type-2 beta case, the eigenvalues will simply be positive. We may also note that the eigenvalues of $T_1$ and those of its nonsymmetric forms $U(U+V)^{-1}$ or $(U+V)^{-1}U$ are identical. Similarly, the eigenvalues of the symmetric form $T_2$ and those of $V(U+V)^{-1}$ or $(U+V)^{-1}V$ are one and the same. As well, the eigenvalues of the symmetric form $T_3$ and of the nonsymmetric forms $UV^{-1}$ or $V^{-1}U$ are the same. Again, the eigenvalues of the symmetric form $T_4$ and those of its nonsymmetric forms $U^{-1}V$ or $VU^{-1}$ are the same. Several researchers have constructed tests based on the matrices $T_1, T_2, T_3, T_4$, their nonsymmetric forms, or their eigenvalues. Some of the well-known test statistics are the following:

$$\begin{aligned} \text{Lawley-Hotelling trace } &= \text{tr}(T\_3)\\ \text{Roy's largest root } &= \text{the largest eigenvalue of } T\_2\\ \text{Pillai's trace } &= \text{tr}(T\_1) \\ \text{Wilks' lambda } &= |T\_2| = \text{ the likelihood ratio statistic.} \end{aligned}$$

For example, when the hypothesis is true, we expect the eigenvalues of *T*<sup>3</sup> to be small and hence we may reject the hypothesis when its smallest eigenvalue is large or the trace of *T*<sup>3</sup> is large. If we are using *T*4, then when the hypothesis is true, we expect *T*<sup>4</sup> to be large in the sense that the eigenvalues will be large, and therefore we may reject the hypothesis for small values of its largest eigenvalue or its trace. If we are utilizing *T*1, we are actually comparing the contribution attributable to the treatments to the total variation. We expect this to be small under the hypothesis and hence, we may reject the hypothesis for large values of its smallest eigenvalue or its trace. If we are using *T*2, we are comparing the residual part to the total variation. If the hypothesis is true, then we can expect a substantial contribution from the residual part so that we may reject the hypothesis for small values of the largest eigenvalue or the trace in this case. These are the main ideas in connection with constructing statistics for testing the hypothesis on the basis of the eigenvalues of the matrices *T*1*, T*2*, T*<sup>3</sup> and *T*4.
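Since all four statistics are functions of the eigenvalues $\eta_j$ of $V^{-1}U$ (which coincide with those of $T_3$), they can be computed together. The following is a minimal sketch assuming numpy, using the eigenvalue relations implied above: $\eta_j/(1+\eta_j)$ for $T_1$ and $1/(1+\eta_j)$ for $T_2$; the function name is illustrative.

```python
import numpy as np

def manova_statistics(U, V):
    """Wilks, Pillai, Lawley-Hotelling and Roy statistics from the MANOVA
    matrices U (treatments) and V (residuals), via the eigenvalues eta_j
    of V^{-1} U, which coincide with those of T3."""
    eta = np.linalg.eigvals(np.linalg.solve(V, U)).real
    wilks = float(np.prod(1.0 / (1.0 + eta)))   # |V|/|U+V|, see (13.3.17)
    pillai = float(np.sum(eta / (1.0 + eta)))   # tr(T1)
    lawley_hotelling = float(np.sum(eta))       # tr(T3)
    roy = float(np.max(1.0 / (1.0 + eta)))      # largest eigenvalue of T2
    return wilks, pillai, lawley_hotelling, roy
```

For instance, with $U = \mathrm{diag}(1,3)$ and $V = I_2$, the eigenvalues are $\eta = (1, 3)$, so Wilks' lambda is $\frac{1}{2}\cdot\frac{1}{4} = 0.125$.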

#### **13.3.6. When** *Ho* **is rejected**

When $H_o: A_1 = \cdots = A_k = O$ is rejected, it is plausible that some of the differences are non-null, that is, $A_i - A_j \ne O$ for some $i$ and $j$, $i \ne j$. We may then test individual hypotheses of the type $H_{o1}: A_i = A_j$ for $i \ne j$. There are $k(k-1)/2$ such differences. This type of test is equivalent to testing the equality of the mean value vectors in two independent $p$-variate Gaussian populations having the same covariance matrix $\Sigma > O$. This has already been discussed in Chap. 6 for the cases $\Sigma$ known and $\Sigma$ unknown. In this instance, we can use Special Case (7) where, for $k = 2$, the statistic $t_1$ is real scalar type-2 beta distributed with the parameters $(\frac{p}{2}, \frac{n_\cdot-1-p}{2})$, so that

$$\frac{n\_\cdot - 1 - p}{p}\, t\_1 \sim F\_{p,\, n\_\cdot - 1 - p} \tag{13.3.18}$$

where $n_\cdot = n_i + n_j$ for some specific $i$ and $j$. We can make use of (13.3.18) for testing individual hypotheses. By utilizing Special Case (8) for $k = 3$, we can also test a hypothesis of the type $A_i = A_j = A_m$ for distinct $i, j, m$. Instead of comparing the results of all the $k(k-1)/2$ individual hypotheses, we may examine the estimates of the $A_i$'s, namely, $\hat{A}_i = \frac{X_{i.}}{n_i} - \frac{X_{..}}{n_\cdot},\ i = 1,\ldots,k$. Consider the norms $\|\frac{X_{i.}}{n_i} - \frac{X_{j.}}{n_j}\|,\ i \ne j$ (the Euclidean norm may be taken for convenience). Start with the individual test corresponding to the maximum of these norms. If this test does not result in a rejection, it is likely that tests on all other differences will not either. If it is rejected, we then take the next largest difference and continue testing.
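An individual pairwise test based on (13.3.18) can be sketched as follows (scipy assumed; the function name and inputs are illustrative): given $w = |V|/|U+V|$ computed from the two samples being compared, form $t_1 = (1-w)/w$ and refer the scaled statistic to the F distribution.

```python
from scipy.stats import f

def pairwise_f_test(w, p, n_dot, alpha=0.05):
    """F test of (13.3.18): w = |V|/|U+V| for the two samples, n. = ni + nj."""
    t1 = (1.0 - w) / w                   # real scalar type-2 beta variable
    stat = (n_dot - 1 - p) / p * t1      # ~ F with (p, n. - 1 - p) df
    crit = f.ppf(1.0 - alpha, p, n_dot - 1 - p)
    return stat, crit, stat >= crit
```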

**Note 13.3.1.** Usually, before initiating a MANOVA, the assumption that the covariance matrices associated with the $k$ populations or treatments are equal is tested. It may happen that the error variables $E_{1j},\ j = 1,\ldots,n_1,$ have the common covariance matrix $\Sigma_1$, the $E_{2j},\ j = 1,\ldots,n_2,$ have the common covariance matrix $\Sigma_2$, and so on, where not all the $\Sigma_j$'s are equal. In this instance, we may first test the hypothesis $H_o: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k$. This test has already been described in Chap. 6. If this hypothesis is not rejected, we may carry out the MANOVA analysis of the data. If it is rejected, then some of the $\Sigma_j$'s may not be equal. In this case, we test individual hypotheses of the type $\Sigma_i = \Sigma_j$ for specific $i$ and $j$, $i \ne j$. We include all the treatments for which the individual hypotheses are not rejected and exclude the data on the treatments whose $\Sigma_j$'s appear to differ from those already selected. We then continue with the MANOVA analysis of the data on the retained treatments, that is, those whose $\Sigma_j$'s are equal in the sense that the corresponding tests of equality of covariance matrices did not reject the hypotheses.

**Example 13.3.2.** For the sake of illustration, test the hypothesis *Ho* : *A*<sup>1</sup> = *A*<sup>2</sup> with the data provided in Example 13.3.1.

**Solution 13.3.2.** We can utilize some of the computations made in the solution to Example 13.3.1. Here, $n_1 = 5,\ n_2 = 4$ and $n_\cdot = n_1 + n_2 = 9$. We disregard the third sample. The residual sum of squares and cross products matrix in the present case is available from Solution 13.3.1 by omitting the matrix corresponding to the third sample. Then,

$$\begin{aligned} \sum\_{i=1}^{2} \sum\_{j=1}^{n\_i} \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right) \left( X\_{ij} - \frac{X\_{i.}}{n\_i} \right)' &= \begin{bmatrix} 18 & -1 & -2 \\ & 18 & 2 \\ & & 4 \end{bmatrix} + \begin{bmatrix} 8 & -6 & -6 \\ & 6 & 7 \\ & & 10 \end{bmatrix} \\ &= \begin{bmatrix} 26 & -7 & -8 \\ -7 & 24 & 9 \\ -8 & 9 & 14 \end{bmatrix} \end{aligned}$$

whose determinant is

$$26\begin{vmatrix}24 & 9\\9 & 14\end{vmatrix} + 7\begin{vmatrix}-7 & 9\\-8 & 14\end{vmatrix} - 8\begin{vmatrix}-7 & 24\\-8 & 9\end{vmatrix} = 5416.$$
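This cofactor expansion can be double-checked numerically (numpy assumed):

```python
import numpy as np

# the residual sum of squares and cross products matrix obtained above
A = np.array([[26.0, -7.0, -8.0],
              [-7.0, 24.0,  9.0],
              [-8.0,  9.0, 14.0]])
det_A = np.linalg.det(A)  # determinant equals 5416
```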

Let us compute $\sum_{i=1}^{2}\sum_{j=1}^{n_i}\big(X_{ij} - \frac{X_{..}}{n_\cdot}\big)\big(X_{ij} - \frac{X_{..}}{n_\cdot}\big)'$:

$$\begin{aligned} \sum\_{j=1}^{5} \left( X\_{1j} - \frac{X\_{..}}{n\_{.}} \right) \left( X\_{1j} - \frac{X\_{..}}{n\_{.}} \right)' &= \begin{bmatrix} 1 & 23/9 & 5/9 \\ 23^{2}/9^{2} & 115/9^{2} \\ 5^{2}/9^{2} \end{bmatrix} + \begin{bmatrix} 9 & -66/9 & -39/9 \\ 22^{2}/9^{2} & 286/9^{2} \\ 13^{2}/9^{2} \end{bmatrix} \\ &+ \begin{bmatrix} 4 & 22/9 & -10/9 \\ 22^{2}/9^{2} & -110/9^{2} \\ 5^{2}/9^{2} \end{bmatrix} + \begin{bmatrix} 4 & -10/9 & 26/9 \\ 5^{2}/9^{2} & -65/9^{2} \\ 13^{2}/9^{2} \end{bmatrix} \\ &+ \begin{bmatrix} 0 & 0 & 0 \\ 4^{2}/9^{2} & 16/9^{2} \\ 4^{2}/9^{2} & \end{bmatrix} = \begin{bmatrix} 18 & -31/9 & -18/9 \\ 1538/9^{2} & 242/9^{2} \\ 404/9^{2} \end{bmatrix}; \end{aligned}$$

$$\begin{split} \sum\_{j=1}^{4} \left( X\_{2j} - \frac{X\_{\cdot \cdot}}{n\_{\cdot \cdot}} \right) \left( X\_{2j} - \frac{X\_{\cdot \cdot}}{n\_{\cdot \cdot}} \right)' &= \begin{bmatrix} 0 & 0 & 0 \\ 14^{2}/9^{2} & 14^{2}/9^{2} \\ & 14^{2}/9^{2} \end{bmatrix} + \begin{bmatrix} 4 & -26/9 & -44/9 \\ 13^{2}/9^{2} & (13 \times 22)/9^{2} \\ & 22^{2}/9^{2} \end{bmatrix} \\ &+ \begin{bmatrix} 4 & -28/9 & -10/9 \\ & 14^{2}/9^{2} & 70/9^{2} \\ & & 5^{2}/9^{2} \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ & 5^{2}/9^{2} & -65/9^{2} \\ & & 13^{2}/9^{2} \end{bmatrix} \\ &= \begin{bmatrix} 8 & -6 & -6 \\ & 586/9^{2} & 487/9^{2} \\ & & 874/9^{2} \end{bmatrix}. \end{split}$$

Hence the sum

$$\begin{aligned} \sum\_{i=1}^{2} \sum\_{j=1}^{n\_i} \left( X\_{ij} - \frac{X\_{..}}{n\_{.}} \right) \left( X\_{ij} - \frac{X\_{..}}{n\_{.}} \right)' &= \begin{bmatrix} 18 & -31/9 & -18/9 \\ & 1538/9^2 & 242/9^2 \\ & & 404/9^2 \end{bmatrix} + \begin{bmatrix} 8 & -6 & -6 \\ & 586/9^2 & 487/9^2 \\ & & 874/9^2 \end{bmatrix} \\ &= \begin{bmatrix} 26 & -85/9 & -8 \\ -85/9 & 236/9 & 9 \\ -8 & 9 & 142/9 \end{bmatrix} = U + V \end{aligned}$$

whose determinant is

$$26\begin{vmatrix}236/9 & 9\\9 & 142/9 \end{vmatrix} + \frac{85}{9} \begin{vmatrix} -85/9 & 9\\-8 & 142/9 \end{vmatrix} - 8\begin{vmatrix} -85/9 & 236/9\\-8 & 9 \end{vmatrix} = 8380.055.$$

So, the observed values are as follows:

$$\begin{aligned} w &= \frac{|V|}{|U+V|} = \frac{5416}{8380.055} = 0.6463, \\ t\_1 &= \frac{1-w}{w} = \frac{0.3537}{0.6463} = 0.5413 \\ \frac{n\_. - 1 - p}{p} t\_1 &= \frac{5}{3} t\_1 = \frac{5}{3} (0.5413) = 0.9022, \end{aligned}$$

and $F_{p,n_\cdot-1-p} = F_{3,5}$. Let us test the hypothesis at the 5% significance level. The critical value obtained from F-tables is $F_{3,5,0.05} = 9.01$. Since the observed value of $F$ is $0.9022 < 9.01$, the hypothesis is not rejected. This result was expected since the hypothesis $A_1 = A_2 = A_3$ had not been rejected. This example was mainly presented to illustrate the steps involved.

#### **13.4. MANOVA for Two-Way Classification Data**

As was done previously for the one-way classification, we will revisit the real scalar variable case first. Thus, we consider the case of two sets of treatments, instead of the single set analyzed in Sect. 13.3. In an agricultural experiment, suppose that we are considering *r* fertilizers as the first set of treatments, say *F*1*,...,Fr,* along with a set of *s* different varieties of corn, *V*1*,...,Vs,* as the second set of treatments. A randomized block experiment belongs to this category. In this case, *r* blocks of land, which are homogeneous with respect to all factors that may affect the yield of corn, such as precipitation, fertility of the soil, exposure to sunlight, drainage, and so on, are selected. Fertilizers *F*1*,...,Fr* are applied to these *r* blocks at random, the first block receiving any one of *F*1*,...,Fr,* and so on. Each block is divided into *s* equivalent plots, all the plots being of the same size, shape, and so on. Then, the *s* varieties of corn are applied to each block at random, with one variety to each plot. Such an experiment is called a randomized block experiment. This experiment is then replicated *t* times. This replication is done so that possible interaction between fertilizers and varieties of corn could be tested. If the randomized block experiment is carried out only once, no interaction can be tested from such data because each plot will have only one observation. Interaction between the *i*-th fertilizer and *j* -th variety is a joint effect for the *(Fi, Vj )* combination, that is, the effect of *Fi* on the yield varies with the variety of corn. For instance, an interaction will be present if the effect of *F*<sup>1</sup> is different when combined with *V*<sup>1</sup> or *V*2. In other words, there are individual effects and joint effects, a joint effect being referred to as an interaction between the two sets of treatments. 
As an example, consider one set of treatments consisting of *r* different methods of teaching and a second set of treatments that could be *s* levels of previous exposure of the students to the subject matter.

#### **13.4.1. The model in a two-way classification**

The additive, fixed effect, two-way classification or two-way layout model with interaction is the following:

$$x\_{ijk} = \mu + \alpha\_i + \beta\_j + \gamma\_{ij} + e\_{ijk}, \; i = 1, \ldots, r, \; j = 1, \ldots, s, \; k = 1, \ldots, t,\tag{13.4.1}$$

where $\mu$ is a general effect, $\alpha_i$ is the deviation from the general effect due to the $i$-th treatment of the first set, $\beta_j$ is the deviation from the general effect due to the $j$-th treatment of the second set, and $\gamma_{ij}$ is the interaction effect, that is, the joint effect of the first and second sets of treatments. In a randomized block experiment, the treatments belonging to the first set are called "blocks" or "rows" and the treatments belonging to the second set are called "treatments" or "columns"; thus, the two sets correspond to rows, say $R_1,\ldots,R_r$, and columns, say $C_1,\ldots,C_s$. Then, $\gamma_{ij}$ is the deviation from the general effect due to the combination $(R_i, C_j)$. The random component $e_{ijk}$ is the sum total of the contributions coming from all unknown factors, and $x_{ijk}$ is the observation resulting from the effect of the combination of treatments $(R_i, C_j)$ at the $k$-th replication or $k$-th identical repetition of the experiment. In an agricultural setting, the observation may be the yield of corn whereas, in a teaching experiment, the observation may be the grade obtained by the "$(i,j,k)$"-th student. In a fixed effect model, all the parameters $\mu, \alpha_1,\ldots,\alpha_r, \beta_1,\ldots,\beta_s$ are assumed to be unknown constants. In a random effect model, $\alpha_1,\ldots,\alpha_r$ or $\beta_1,\ldots,\beta_s$ or both sets are assumed to be random variables. We assume that $E[e_{ijk}] = 0$ and $\text{Var}(e_{ijk}) = \sigma^2 > 0$ for all $i, j, k$, where $E(\cdot)$ denotes the expected value of $(\cdot)$. In the present discussion, we will only consider the fixed effect model. Under this model, the data are called two-way classification data or two-way layout data because they can be classified according to the two sets of treatments, "rows" and "columns". Since we are not making any assumption about the distribution of $e_{ijk}$, and thereby that of $x_{ijk}$, we will apply the method of least squares to estimate the parameters.

#### **13.4.2. Estimation of parameters in a two-way classification**

The error sum of squares is

$$\sum\_{ijk} e\_{ijk}^2 = \sum\_{ijk} (x\_{ijk} - \mu - \alpha\_i - \beta\_j - \gamma\_{ij})^2.$$

Our first objective consists of isolating the sum of squares due to interaction and testing the hypothesis of no interaction, that is, $H_o: \gamma_{ij} = 0$ for all $i$ and $j$. If $\gamma_{ij} \ne 0$, part of the effect of the $i$-th row $R_i$ is mixed up with the interaction and, similarly, part of the effect of the $j$-th column $C_j$ is intermingled with $\gamma_{ij}$, so that no hypothesis can be tested on the $\alpha_i$'s and $\beta_j$'s unless $\gamma_{ij}$ is zero or negligibly small, or the hypothesis $\gamma_{ij} = 0$ is not rejected. As well, on noting that in $[\mu + \alpha_i + \beta_j + \gamma_{ij}]$ the subscripts appear either not at all, one at a time, or both together, we may write $\mu + \alpha_i + \beta_j + \gamma_{ij} = m_{ij}$. Thus,

$$\begin{aligned} \sum\_{ijk} e\_{ijk}^2 &= \sum\_{ijk} (x\_{ijk} - m\_{ij})^2 \Rightarrow \frac{\partial}{\partial m\_{ij}} \Big[\sum\_{ijk} e\_{ijk}^2\Big] = 0\\ \Rightarrow \sum\_k (x\_{ijk} - m\_{ij}) &= 0 \Rightarrow x\_{ij.} - t \,\hat{m}\_{ij} = 0 \text{ or } \hat{m}\_{ij} = \frac{x\_{ij.}}{t}. \end{aligned}$$

We employ the standard notation in this area, namely that a summation over a subscript is denoted by a dot. Then, the least squares minimum under the general model or the residual sum of squares, denoted by *s*2, is given by

$$s^2 = \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{ij.}}{t} \right)^2. \tag{13.4.2}$$

Now, consider the hypothesis *Ho* : *γij* = 0 for all *i* and *j* . Under this *Ho,* the model becomes

$$\mathbf{x}\_{ijk} = \mu + \alpha\_i + \beta\_j + e\_{ijk} \text{ or } \sum\_{ijk} e\_{ijk}^2 = \sum\_{ijk} (\mathbf{x}\_{ijk} - \mu - \alpha\_i - \beta\_j)^2.$$

We differentiate this partially with respect to *μ* and *αi* for a specific *i*, and to *βj* for a specific *j* , and then equate the results to zero and solve to obtain estimates for *μ, αi* and *βj* . Since we have taken *αi, βj* and *γij* as deviations from the general effect *μ*, we may let *α.* = *α*<sup>1</sup> +···+ *αr* = 0*, β.* = *β*<sup>1</sup> +···+ *βs* = 0 and *γi.* = 0, for each *i* and *γ.j* = 0 for each *j* , without any loss of generality. Then,

$$\frac{\partial}{\partial \mu} \Big[\sum\_{ijk} e\_{ijk}^2\Big] = 0 \Rightarrow \Big(\sum\_{ijk} x\_{ijk}\Big) - rst\mu - st\alpha\_{\cdot} - rt\beta\_{\cdot} = 0 \Rightarrow \hat{\mu} = \frac{x\_{...}}{rst}$$

$$\frac{\partial}{\partial \alpha\_i} \Big[\sum\_{ijk} e\_{ijk}^2\Big] = 0 \Rightarrow \sum\_{jk} [x\_{ijk} - \mu - \alpha\_i - \beta\_j] = 0 \Rightarrow x\_{i..} - st\mu - st\alpha\_i - t\beta\_{\cdot} = 0 \Rightarrow \hat{\alpha}\_i = \frac{x\_{i..}}{st} - \hat{\mu}$$

$$\frac{\partial}{\partial \beta\_j} \Big[\sum\_{ijk} e\_{ijk}^2\Big] = 0 \Rightarrow \sum\_{ik} [x\_{ijk} - \mu - \alpha\_i - \beta\_j] = 0 \Rightarrow x\_{.j.} - rt\mu - t\alpha\_{\cdot} - rt\beta\_j = 0 \Rightarrow \hat{\beta}\_j = \frac{x\_{.j.}}{rt} - \hat{\mu}.$$

Hence, the least squares minimum under the hypothesis $H_o$, denoted by $s_0^2$, is

$$\begin{split} s\_0^2 &= \sum\_{ijk} \left[ \left( x\_{ijk} - \frac{x\_{...}}{rst} \right) - \left( \frac{x\_{i..}}{st} - \frac{x\_{...}}{rst} \right) - \left( \frac{x\_{.j.}}{rt} - \frac{x\_{...}}{rst} \right) \right]^2 \\ &= \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{...}}{rst} \right)^2 - st \sum\_{i} \left( \frac{x\_{i..}}{st} - \frac{x\_{...}}{rst} \right)^2 - rt \sum\_{j} \left( \frac{x\_{.j.}}{rt} - \frac{x\_{...}}{rst} \right)^2, \end{split}$$

the simplifications resulting from the properties of summations with respect to subscripts. Thus, the sum of squares due to the hypothesis $H_o: \gamma_{ij} = 0$ for all $i$ and $j$, or the interaction sum of squares, denoted by $s_\gamma^2$, is the following:

$$\begin{split} s\_{\gamma}^2 = s\_0^2 - s^2 &= \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{...}}{rst} \right)^2 - st \sum\_{i} \left( \frac{x\_{i..}}{st} - \frac{x\_{...}}{rst} \right)^2 \\ &\quad - rt \sum\_{j} \left( \frac{x\_{.j.}}{rt} - \frac{x\_{...}}{rst} \right)^2 - \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{ij.}}{t} \right)^2, \end{split}$$

and since

$$\sum\_{ijk} \left( \mathbf{x}\_{ijk} - \frac{\mathbf{x}\_{ij.}}{t} \right)^2 = \sum\_{ijk} \left( \mathbf{x}\_{ijk} - \frac{\mathbf{x}\_{...}}{r \mathbf{s} t} \right)^2 - t \sum\_{ij} \left( \frac{\mathbf{x}\_{ij.}}{t} - \frac{\mathbf{x}\_{...}}{r \mathbf{s} t} \right)^2,$$

the sum of squares due to the hypothesis or attributable to the *γij* 's, that is, due to interaction is

$$s\_{\gamma}^2 = t \sum\_{ij} \left( \frac{x\_{ij.}}{t} - \frac{x\_{...}}{rst} \right)^2 - st \sum\_{i} \left( \frac{x\_{i..}}{st} - \frac{x\_{...}}{rst} \right)^2 - rt \sum\_{j} \left( \frac{x\_{.j.}}{rt} - \frac{x\_{...}}{rst} \right)^2. \tag{13.4.3}$$

If the hypothesis $\gamma_{ij} = 0$ is not rejected, the effects of the $\gamma_{ij}$'s are deemed insignificant; then, under the hypothesis $\alpha_i = 0,\ i = 1,\ldots,r$, the sum of squares due to the $\alpha_i$'s, or sum of squares due to the rows, denoted by $s_r^2$, is

$$s\_r^2 = \sum\_{ijk} \left( \frac{\mathbf{x}\_{i\dots}}{st} - \frac{\mathbf{x}\_{\dots}}{rst} \right)^2 = st \sum\_{i=1}^r \left( \frac{\mathbf{x}\_{i\dots}}{st} - \frac{\mathbf{x}\_{\dots}}{rst} \right)^2. \tag{13.4.4}$$

Similarly, the sum of squares attributable to the $\beta_j$'s, or due to the columns, denoted by $s_c^2$, is

$$s\_c^2 = rt\sum\_{j=1}^{s} \left(\frac{x\_{.j.}}{rt} - \frac{x\_{...}}{rst}\right)^2. \tag{13.4.5}$$

Observe that the sum of squares due to the rows plus the sum of squares due to the columns, once added to the interaction sum of squares, equals the subtotal sum of squares, denoted by $s_{rc}^2 = t\sum_{ij}\big(\frac{x_{ij.}}{t} - \frac{x_{...}}{rst}\big)^2$; in other words, this subtotal sum of squares is partitioned into the sums of squares due to the rows, the columns and the interaction. This is equivalent to an ANOVA on the subtotals $\sum_k x_{ijk}$, that is, an ANOVA on a two-way classification with a single observation per cell. As has been pointed out, in that case we cannot test for interaction; moreover, this subtotal sum of squares plus the residual sum of squares equals the grand total sum of squares. If we assume a normal distribution for the error terms, that is, $e_{ijk} \overset{iid}{\sim} N_1(0, \sigma^2),\ \sigma^2 > 0$, for all $i, j, k$, then, under the hypothesis $H_o: \gamma_{ij} = 0$, it can be shown that

$$\frac{s\_{\gamma}^2}{\sigma^2} \sim \chi\_\nu^2, \quad \nu = (rs - 1) - (r - 1) - (s - 1) = (r - 1)(s - 1), \tag{13.4.6}$$

and the residual variation $s^2$ has the following distribution whether $H_o$ holds or not:

$$\frac{s^2}{\sigma^2} \sim \chi^2\_{\nu\_1}, \quad \nu\_1 = rst - 1 - (rs - 1) = rs(t - 1), \tag{13.4.7}$$

where *s*<sup>2</sup> *<sup>γ</sup>* and *<sup>s</sup>*<sup>2</sup> are independently distributed. Then, under the hypothesis *γij* <sup>=</sup> 0 for all *i* and *j* or when this hypothesis is not rejected, it can be established that

$$\frac{s\_r^2}{\sigma^2} \sim \chi\_{r-1}^2, \ \frac{s\_c^2}{\sigma^2} \sim \chi\_{s-1}^2 \tag{13.4.8}$$

and *s*<sup>2</sup> *<sup>r</sup>* and *s*<sup>2</sup> as well as *s*<sup>2</sup> *<sup>c</sup>* and *<sup>s</sup>*<sup>2</sup> are independently distributed whenever *Ho* : *γij* <sup>=</sup> 0 is not rejected. Hence, under the hypothesis,

$$\frac{s\_{\gamma}^2/[(r-1)(s-1)]}{s^2/[rs(t-1)]} \sim F\_{\nu,\nu\_1}, \ \nu = (r-1)(s-1), \ \nu\_1 = rs(t-1). \tag{13.4.9}$$

The total sum of squares is $\sum_{ijk}\big(x_{ijk} - \frac{x_{...}}{rst}\big)^2$. Thus, the first decomposition, constituting the first part of the ANOVA in this two-way classification scheme, is the following:

Total variation = Variation due to the subtotals + Residual variation,

the second stage being

Variation due to the subtotals = Variation due to the rows

$$+\text{Variation due to the columns }+\text{Variation due to interaction.}$$

and the resulting ANOVA table is the following:


ANOVA Table for the Two-Way Classification

| Variation due to | df | SS | MS |
| --- | --- | --- | --- |
| rows | $r-1$ | $s_r^2$ | $s_r^2/(r-1)$ |
| columns | $s-1$ | $s_c^2$ | $s_c^2/(s-1)$ |
| interaction | $(r-1)(s-1)$ | $s_\gamma^2$ | $s_\gamma^2/[(r-1)(s-1)]$ |
| residuals | $rs(t-1)$ | $s^2$ | $s^2/[rs(t-1)]$ |
| total | $rst-1$ | $\sum_{ijk}\big(x_{ijk}-\frac{x_{...}}{rst}\big)^2$ | |

where *df* designates the number of degrees of freedom, *SS* means sum of squares, and *MS* stands for mean squares; the expression for the residual sum of squares is given in (13.4.2), that for the interaction in (13.4.3), that for the rows in (13.4.4) and that for the columns in (13.4.5). Note that we test the hypotheses on the $\alpha_i$'s and $\beta_j$'s, or row effects and column effects, only if the hypothesis $\gamma_{ij} = 0$ is not rejected; otherwise, there is no point in testing hypotheses on the $\alpha_i$'s and $\beta_j$'s because they are confounded with the $\gamma_{ij}$'s.
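The sums of squares (13.4.2)-(13.4.5) and the total can be assembled numerically. The following is a minimal sketch assuming numpy, with the data stored as an $r \times s \times t$ array; the function name is illustrative.

```python
import numpy as np

def twoway_anova_ss(x):
    """Sums of squares for the two-way layout with replicates; x has shape (r, s, t)."""
    r, s, t = x.shape
    grand = x.mean()                      # x... / (rst)
    cell = x.mean(axis=2)                 # x_ij. / t
    row = x.mean(axis=(1, 2))             # x_i.. / (st)
    col = x.mean(axis=(0, 2))             # x_.j. / (rt)
    s_res = ((x - cell[:, :, None]) ** 2).sum()              # (13.4.2)
    s_row = s * t * ((row - grand) ** 2).sum()               # (13.4.4)
    s_col = r * t * ((col - grand) ** 2).sum()               # (13.4.5)
    s_int = t * ((cell - grand) ** 2).sum() - s_row - s_col  # (13.4.3)
    s_tot = ((x - grand) ** 2).sum()
    return s_row, s_col, s_int, s_res, s_tot
```

By construction, the partition $s_{tot}^2 = s_r^2 + s_c^2 + s_\gamma^2 + s^2$ described above holds up to floating-point error.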

#### **13.5. Multivariate Extension of the Two-Way Layout**

Instead of a single real scalar variable, we now consider a $p \times 1$ vector of real scalar variables. The multivariate two-way classification fixed effect model is the following:

$$X\_{ijk} = M + A\_i + B\_j + \Gamma\_{ij} + E\_{ijk},\tag{13.5.1}$$

for $i = 1,\ldots,r,\ j = 1,\ldots,s,\ k = 1,\ldots,t$, where $M, A_i, B_j, \Gamma_{ij}$ and $E_{ijk}$ are all $p \times 1$ vectors. In this case, $M$ is a general effect, $A_i$ is the deviation from the general effect due to the $i$-th row, $B_j$ is the deviation from the general effect due to the $j$-th column, $\Gamma_{ij}$ is the deviation from the general effect due to the interaction between the rows and the columns, and $E_{ijk}$ is the vector of the random or error components. For convenience, the two sets of treatments are referred to as rows and columns, the first set corresponding to the rows and the second, to the columns. In a two-way layout, two sets of treatments are tested. As in the scalar case of Sect. 13.4, we can assume, without any loss of generality, that $\sum_i A_i = A_1 + \cdots + A_r = A_\cdot = O$, $B_\cdot = O$, $\sum_{i=1}^r \Gamma_{ij} = \Gamma_{.j} = O$ and $\sum_{j=1}^s \Gamma_{ij} = \Gamma_{i.} = O$. At this juncture, the procedures parallel those developed in Sect. 13.4 for the real scalar variable case. Instead of sums of squares, we now have sums of squares and cross products matrices. As before, we may write $M_{ij} = M + A_i + B_j + \Gamma_{ij}$. Then, the trace of the sum of squares and cross products error matrix $\sum_{ijk} E_{ijk}E_{ijk}'$ is minimized. Using the vector derivative operator, we have

$$\begin{aligned} \frac{\partial}{\partial \boldsymbol{M}\_{ij}} \text{tr} \Big[ \sum\_{ijk} E\_{ijk} E\_{ijk}' \Big] = O & \Rightarrow \sum\_k (X\_{ijk} - M\_{ij}) = O, \\ \Rightarrow \hat{\boldsymbol{M}}\_{ij} &= \frac{1}{t} X\_{ij.} \,, \end{aligned}$$

so that the residual sum of squares and cross products matrix, denoted by *Sres*, is

$$S\_{res} = \sum\_{ijk} \left( X\_{ijk} - \frac{X\_{ij.}}{t} \right) \left( X\_{ijk} - \frac{X\_{ij.}}{t} \right)'. \tag{13.5.2}$$

All other derivations are analogous to those provided in the real scalar case. The sum of squares and cross products matrix due to interaction, denoted by *Sint* is the following:

$$\begin{aligned} S\_{int} &= t\sum\_{ij} \left(\frac{X\_{ij.}}{t} - \frac{X\_{...}}{rst}\right) \left(\frac{X\_{ij.}}{t} - \frac{X\_{...}}{rst}\right)' \\ &\quad - st\sum\_{i} \left(\frac{X\_{i..}}{st} - \frac{X\_{...}}{rst}\right) \left(\frac{X\_{i..}}{st} - \frac{X\_{...}}{rst}\right)' \\ &\quad - rt\sum\_{j} \left(\frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst}\right) \left(\frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst}\right)'. \end{aligned} \tag{13.5.3}$$

The sum of squares and cross products matrices due to the rows and columns are respectively given by

$$S\_{row} = st\sum\_{i=1}^{r} \left(\frac{X\_{i\dots}}{st} - \frac{X\_{\dots}}{rst}\right) \left(\frac{X\_{i\dots}}{st} - \frac{X\_{\dots}}{rst}\right)',\tag{13.5.4}$$

$$S\_{col} = rt\sum\_{j=1}^{s} \left(\frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst}\right) \left(\frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst}\right)'.\tag{13.5.5}$$

The sum of squares and cross products matrix for the subtotal is denoted by *Ssub* = *Srow* + *Scol* + *Sint* . The total sum of squares and cross products matrix, denoted by *Stot* , is the following:

$$S\_{tot} = \sum\_{ijk} \left( X\_{ijk} - \frac{X\_{\dots}}{rst} \right) \left( X\_{ijk} - \frac{X\_{\dots}}{rst} \right)'. \tag{13.5.6}$$

We may now construct the MANOVA table. The following abbreviations are used: df stands for degrees of freedom of the corresponding Wishart matrix, SSP means the sum of squares and cross products matrix, MS stands for mean squares and is equal to SSP/df, and *Srow*, *Scol*, *Sres* and *Stot* are respectively specified in (13.5.4), (13.5.5), (13.5.2) and (13.5.6).


MANOVA Table for a Two-Way Layout
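The SSP matrices entering this table can be computed directly from formulas (13.5.2)–(13.5.6). The following pure-Python sketch (the helper names are ours, not the book's) assembles them for an *r* × *s* × *t* layout of *p*-vectors; the algebraic identity *Stot* = *Srow* + *Scol* + *Sint* + *Sres*, which holds for any balanced layout, serves as a built-in check.

```python
# Sketch: SSP matrices of a balanced two-way MANOVA layout, per (13.5.2)-(13.5.6).
# X[i][j][k] is the p-vector observation in cell (i, j), replicate k.

def outer(u, v):
    # outer product u v' of two p-vectors, as a p x p list of lists
    return [[ui * vj for vj in v] for ui in u]

def madd(A, B):
    # elementwise sum of two p x p matrices
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def ssp_two_way(X):
    r, s, t, p = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    zero = [[0.0] * p for _ in range(p)]
    # cell, row, column and grand means (balanced layout)
    cell = [[[sum(X[i][j][k][a] for k in range(t)) / t for a in range(p)]
             for j in range(s)] for i in range(r)]
    row = [[sum(cell[i][j][a] for j in range(s)) / s for a in range(p)] for i in range(r)]
    col = [[sum(cell[i][j][a] for i in range(r)) / r for a in range(p)] for j in range(s)]
    grand = [sum(row[i][a] for i in range(r)) / r for a in range(p)]
    Srow, Scol, Sint, Sres, Stot = zero, zero, zero, zero, zero
    for i in range(r):
        d = [row[i][a] - grand[a] for a in range(p)]
        Srow = madd(Srow, [[s * t * x for x in rw] for rw in outer(d, d)])
    for j in range(s):
        d = [col[j][a] - grand[a] for a in range(p)]
        Scol = madd(Scol, [[r * t * x for x in rw] for rw in outer(d, d)])
    for i in range(r):
        for j in range(s):
            d = [cell[i][j][a] - row[i][a] - col[j][a] + grand[a] for a in range(p)]
            Sint = madd(Sint, [[t * x for x in rw] for rw in outer(d, d)])
            for k in range(t):
                e = [X[i][j][k][a] - cell[i][j][a] for a in range(p)]
                Sres = madd(Sres, outer(e, e))
                g = [X[i][j][k][a] - grand[a] for a in range(p)]
                Stot = madd(Stot, outer(g, g))
    return Srow, Scol, Sint, Sres, Stot
```

On any balanced data set, `madd(madd(Srow, Scol), Sint)` reproduces *Ssub*, and adding *Sres* reproduces *Stot*, mirroring the verification carried out in Example 13.5.1 below.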

#### **13.5.1. Likelihood ratio test for multivariate two-way layout**

Under the assumption that the error or random components *Eijk iid*∼ *Np(O, Σ), Σ > O*, for all *i, j* and *k*, the exponent of the multivariate normal density, excluding the factor −1*/*2, is obtained as follows:

$$\begin{split} E\_{ijk}' \Sigma^{-1} E\_{ijk} &= (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij})' \Sigma^{-1} (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij}) \\ &= \text{tr} \big[ \Sigma^{-1} (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij}) (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij})' \big] \Rightarrow \\ \sum\_{ijk} E\_{ijk}' \Sigma^{-1} E\_{ijk} &= \text{tr} \Big[ \Sigma^{-1} \Big[ \sum\_{ijk} (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij}) (X\_{ijk} - M - A\_i - B\_j - \Gamma\_{ij})' \Big] \Big]. \end{split}$$

Thus, the joint density of all the *Xijk*'s, denoted by *L*, is

$$\begin{split} L &= \frac{1}{(2\pi)^{\frac{prst}{2}} |\boldsymbol{\Sigma}|^{\frac{rst}{2}}} \\ &\times \mathrm{e}^{-\frac{1}{2} \text{tr} \{ \boldsymbol{\Sigma}^{-1} \sum\_{ijk} (X\_{ijk} - \boldsymbol{M} - \boldsymbol{A}\_{i} - \boldsymbol{B}\_{j} - \boldsymbol{\Gamma}\_{ij})(X\_{ijk} - \boldsymbol{M} - \boldsymbol{A}\_{i} - \boldsymbol{B}\_{j} - \boldsymbol{\Gamma}\_{ij})' \}}. \end{split}$$

The maximum likelihood estimates of *M, Ai, Bj* and *Γij* are the same as the least squares estimates; hence, the maximum likelihood estimator (MLE) of *Σ* is 1*/(rst)* times the least squares minimum, that is, the residual sum of squares and cross products matrix *S* or *Sres* (in the present notation), where

$$S\_{res} = \sum\_{ijk} (X\_{ijk} - \hat{M} - \hat{A}\_i - \hat{B}\_j - \hat{\Gamma}\_{ij})(X\_{ijk} - \hat{M} - \hat{A}\_i - \hat{B}\_j - \hat{\Gamma}\_{ij})'$$

$$= \sum\_{ijk} \left(X\_{ijk} - \frac{X\_{ij.}}{t}\right) \left(X\_{ijk} - \frac{X\_{ij.}}{t}\right)'.\tag{13.5.7}$$

This is the sample sum of squares and cross products matrix under the general model, and its determinant raised to the power *rst/*2 is the quantity appearing in the numerator of the likelihood ratio criterion *λ*. Consider the hypothesis *Ho* : *Γij* = *O* for all *i* and *j*. Then, under this hypothesis, the corresponding minimized sum of squares and cross products matrix is *S*0, where

$$S\_0 = \sum\_{ijk} \left[ \left( X\_{ijk} - \frac{X\_{\dots}}{rst} \right) - \left( \frac{X\_{i\dots}}{st} - \frac{X\_{\dots}}{rst} \right) - \left( \frac{X\_{\cdot j\dots}}{rt} - \frac{X\_{\dots}}{rst} \right) \right]$$

$$\times \left[ \left( X\_{ijk} - \frac{X\_{\dots}}{rst} \right) - \left( \frac{X\_{i\dots}}{st} - \frac{X\_{\dots}}{rst} \right) - \left( \frac{X\_{\cdot j\dots}}{rt} - \frac{X\_{\dots}}{rst} \right) \right]'$$

and $|S\_0|^{\frac{rst}{2}}$ is the quantity appearing in the denominator of *λ*. However, *S*0 − *Sres* = *Sint* is the sum of squares and cross products matrix due to the interaction terms *Γij*'s, that is, due to the hypothesis, so that *S*0 = *Sres* + *Sint*. Therefore, *λ* is given by

$$\lambda = \frac{|S\_{res}|^{\frac{rst}{2}}}{|S\_{res} + S\_{int}|^{\frac{rst}{2}}}. \tag{13.5.8}$$

Letting $w = \lambda^{\frac{2}{rst}}$,

$$w = \frac{|S\_{res}|}{|S\_{res} + S\_{int}|}.\tag{13.5.9}$$

It follows from results derived in Chap. 5 that *Sres* ∼ *Wp(rs(t* − 1*), Σ), Sint* ∼ *Wp((r* − 1*)(s* − 1*), Σ)* under the hypothesis and *Sres* and *Sint* are independently distributed and hence, under *Ho*,

$$W = (S\_{res} + S\_{int})^{-\frac{1}{2}} S\_{res} (S\_{res} + S\_{int})^{-\frac{1}{2}} \sim \text{ real } p\text{-variate type-1 beta random variable}$$

with the parameters $(\frac{rs(t-1)}{2}, \frac{(r-1)(s-1)}{2})$. As well,

$$W\_1 = S\_{res}^{-\frac{1}{2}} S\_{int} S\_{res}^{-\frac{1}{2}} \sim \text{ real } p\text{-variate type-2 beta random variable}$$

with the parameters $(\frac{(r-1)(s-1)}{2}, \frac{rs(t-1)}{2})$. Under *Ho*, the arbitrary *h*-th moments of *w* and *λ*, which are readily obtained from those of a real matrix-variate type-1 beta variable, are

$$E[w^h] = \left\{ \prod\_{j=1}^p \frac{\Gamma(\nu\_1 + \nu\_2 - \frac{j-1}{2})}{\Gamma(\nu\_1 - \frac{j-1}{2})} \right\} \left\{ \prod\_{j=1}^p \frac{\Gamma(\nu\_1 + h - \frac{j-1}{2})}{\Gamma(\nu\_1 + \nu\_2 + h - \frac{j-1}{2})} \right\} \tag{13.5.10}$$

$$E[\lambda^h] = \left\{ \prod\_{j=1}^p \frac{\Gamma(\nu\_1 + \nu\_2 - \frac{j-1}{2})}{\Gamma(\nu\_1 - \frac{j-1}{2})} \right\} \left\{ \prod\_{j=1}^p \frac{\Gamma(\nu\_1 + \frac{rst}{2}h - \frac{j-1}{2})}{\Gamma(\nu\_1 + \nu\_2 + \frac{rst}{2}h - \frac{j-1}{2})} \right\} \tag{13.5.11}$$

where $\nu\_1 = \frac{rs(t-1)}{2}$ and $\nu\_2 = \frac{(r-1)(s-1)}{2}$. Note that we reject the null hypothesis *Ho* : *Γij* = *O*, *i* = 1*, . . . , r*, *j* = 1*, . . . , s*, for small values of *w* and *λ*. As explained in Sect. 13.3, the exact general density of *w* corresponding to (13.5.10) can be expressed in terms of a G-function, and the exact general density of *λ* corresponding to (13.5.11) can be written in terms of an H-function. For the theory and applications of the G-function and the H-function, the reader may respectively refer to Mathai (1993) and Mathai et al. (2010).
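The moment expression (13.5.10) is straightforward to evaluate numerically through log-gamma functions, which avoids overflow for large arguments. The following sketch (the function name is ours, and the parameter values are merely illustrative) also verifies that *h* = 0 yields *E*[*w*0] = 1:

```python
# Numerical sketch of the moment formula (13.5.10) via log-gamma functions.
import math

def log_moment_w(h, p, r, s, t):
    v1 = r * s * (t - 1) / 2        # nu_1
    v2 = (r - 1) * (s - 1) / 2      # nu_2
    total = 0.0
    for j in range(1, p + 1):
        c = (j - 1) / 2
        total += math.lgamma(v1 + v2 - c) - math.lgamma(v1 - c)
        total += math.lgamma(v1 + h - c) - math.lgamma(v1 + v2 + h - c)
    return total  # = ln E[w^h]

# h = 0 must give E[w^0] = 1, i.e. a zero log-moment
assert abs(log_moment_w(0.0, p=3, r=2, s=3, t=4)) < 1e-9
```

Since 0 < *w* ≤ 1, the log-moment is necessarily negative for *h* > 0, which provides another quick sanity check.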

#### **13.5.2. Asymptotic distribution of** *λ* **in the MANOVA two-way layout**

Consider the arbitrary *h*-th moment specified in (13.5.11). On expanding all the gamma functions, for large values of *rst* in the constant part and for large values of *rst(*1 + *h)* in the functional part, by applying Stirling's formula or, equivalently, the first term in the asymptotic expansion of a gamma function given in (13.3.13), it can be verified that the *h*-th moment of *λ* behaves asymptotically as follows:

$$E[\lambda^h] \to (1+h)^{-\frac{p(r-1)(s-1)}{2}} \Rightarrow -2\ln\lambda \to \chi^2\_{p(r-1)(s-1)} \text{ as } rst \to \infty. \tag{13.5.12}$$

Thus, for large values of *rst*, one can utilize this real scalar chisquare approximation for testing the hypothesis *Ho* : *Γij* = *O* for all *i* and *j*. We can also work out the exact distribution of *w*, whose *h*-th moment is given in (13.5.10), for many special values of *r, s, t, p*. Observe that

$$E[w^h] = C \prod\_{j=1}^p \frac{\Gamma(\frac{rs(t-1)}{2} - \frac{j-1}{2} + h)}{\Gamma(\frac{rs(t-1)}{2} + \frac{(r-1)(s-1)}{2} - \frac{j-1}{2} + h)}\tag{13.5.13}$$

where *C* is the normalizing constant such that *E*[*wh*] = 1 when *h* = 0. Thus, when *(r* − 1*)(s* − 1*)/*2 is a positive integer, that is, when *r* or *s* is odd, the gamma functions cancel out, leaving a number of factors in the denominator which can be written as a sum by applying the partial fractions technique. For small values of *p*, the exact density will then be expressible as a sum involving only a few terms. For larger values of *p*, there will be repeated factors in the denominator, which complicates matters.
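As a numerical companion to the asymptotic result (13.5.12): when the degrees of freedom *p(r* − 1*)(s* − 1*)* are even, say 2*m*, the chisquare survival function has the elementary closed form $\mathrm{e}^{-x/2}\sum\_{k=0}^{m-1}(x/2)^k/k!$, so the approximate test can be sketched without any special-function library (the function name is ours):

```python
# Chisquare tail probability for even degrees of freedom df = 2m,
# using the closed form exp(-x/2) * sum_{k<m} (x/2)^k / k!.
import math

def chisq_sf_even_df(x, df):
    assert df % 2 == 0 and df > 0
    m = df // 2
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= (x / 2) / k     # (x/2)^k / k!, built up iteratively
        total += term
    return math.exp(-x / 2) * total

# sanity check: the usual 5% critical value for 6 df is about 12.59
p_at_critical = chisq_sf_even_df(12.59, 6)
```

One would reject *Ho* : *Γij* = *O* at level *α* whenever `chisq_sf_even_df(-2 ln lambda, p*(r-1)*(s-1))` falls below *α*; for odd degrees of freedom a general incomplete-gamma routine would be needed instead.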

#### **13.5.3. Exact densities of** *w* **in some special cases**

We will consider several special cases of the *h*-th moment of *w* as given in (13.5.13). **Case (1):** *p* = 1**.** In this case, the *h*-th moment becomes

$$E[w^h] = C\_1 \frac{\Gamma(\frac{rs(t-1)}{2} + h)}{\Gamma(\frac{rs(t-1)}{2} + \frac{(r-1)(s-1)}{2} + h)}$$

where *C*1 is the associated normalizing constant. This is the *h*-th moment of a real scalar type-1 beta random variable with the parameters $(\frac{rs(t-1)}{2}, \frac{(r-1)(s-1)}{2})$. Hence $y = \frac{1-w}{w}$ is a real scalar type-2 beta with the parameters $(\frac{(r-1)(s-1)}{2}, \frac{rs(t-1)}{2})$, and

$$\frac{rs(t-1)}{(r-1)(s-1)} \text{ y } \sim F\_{(r-1)(s-1),rs(t-1)} \cdot$$

Accordingly, the test can be carried out by using this *F*-statistic. One would reject the null hypothesis *Ho* : *Γij* = *O* if the observed *F* ≥ *F(r*−1*)(s*−1*),rs(t*−1*),α*, where *F(r*−1*)(s*−1*),rs(t*−1*),α* is the upper 100*α*% percentile of this *F*-density. For example, for *r* = 2*, s* = 3*, t* = 3 and *α* = 0*.*05, we have *F*2*,*12*,*0*.*05 = 3*.*89 from *F*-tables, so that *Ho* would be rejected if the observed value of *F*2*,*12 ≥ 3*.*89 at the specified significance level.

**Case (2):** *p* = 2**.** In this case, we have ratios of gamma functions whose arguments differ by 1*/*2. Combining the gamma functions in the numerator and in the denominator by using the duplication formula and proceeding as in Sect. 13.3 for the one-way layout, we obtain, for the statistic $t\_1 = \frac{1-\sqrt{w}}{\sqrt{w}}$,

$$\frac{rs(t-1)}{(r-1)(s-1)} \, t\_1 \sim F\_{2(r-1)(s-1), 2(rs(t-1)-1)} \,,$$

so that the decision can be made as in Case (1).

**Case (3):** *(r* − 1*)(s* − 1*)* = 1 ⇒ *r* = 2*, s* = 2*.* In this case, all the gamma functions in (13.5.13) cancel out except the last one in the numerator and the first one in the denominator. This gamma ratio is that of a real scalar type-1 beta random variable with the parameters $(\frac{rs(t-1)+1-p}{2}, \frac{p}{2})$, and hence $y = \frac{1-w}{w}$ is a real scalar type-2 beta, so that

$$\frac{rs(t-1)+1-p}{p} \text{ y } \sim F\_{p,rs(t-1)+1-p},$$

and a decision can be made by making use of this *F*-distribution as in Case (1).

**Case (4):** *(r* − 1*)(s* − 1*)* = 2**.** In this case,

$$E[w^h] = C\_1 \prod\_{j=1}^p \frac{1}{\frac{rs(t-1)}{2} - \frac{j-1}{2} + h}$$

with the corresponding normalizing constant *C*1. This product of *p* factors can be expressed as a sum by using partial fractions. That is,

$$E[w^h] = C\_1 \sum\_{j=0}^{p-1} \frac{b\_j}{a + h - \frac{j}{2}} \tag{i}$$

where

$$b\_j = \lim\_{a+h \to \frac{j}{2}} \Big[(a+h)(a+h-\tfrac{1}{2}) \cdots (a+h-\tfrac{j-1}{2})(a+h-\tfrac{j+1}{2}) \cdots (a+h-\tfrac{p-1}{2})\Big]^{-1},\tag{ii}$$

$$a = \frac{rs(t-1)}{2}.$$

Thus, the density of *w*, denoted by *fw(w)*, which is available from *(i)* and *(ii)*, is the following:

$$f\_w(w) = C\_1 \sum\_{j=0}^{p-1} b\_j w^{a - \frac{j}{2} - 1}, \ 0 \le w \le 1,$$

and zero elsewhere. Some additional special cases could be worked out but the expressions would become complicated. For large values of *rst*, one can apply the asymptotic chisquare result given in (13.5.12) for testing the hypothesis *Ho* : *Γij* = *O*.
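The partial-fraction construction of Case (4) can be checked numerically: the coefficients *bj* follow from (ii), the normalizing constant satisfies $C\_1 = \prod\_{j=0}^{p-1}(a - j/2)$ from the condition *E*[*w*0] = 1, and the resulting density must integrate to one over (0, 1). A sketch (function name ours, parameter values illustrative, e.g. *r* = 2*, s* = 3*, t* = 4 gives *a* = 9):

```python
# Case (4) exact density of w via partial fractions, (r-1)(s-1) = 2:
# E[w^h] = C_1 * prod_{j=0}^{p-1} 1/(a + h - j/2),  a = rs(t-1)/2,
# f_w(w) = C_1 * sum_j b_j * w^(a - j/2 - 1) on (0, 1).

def case4_density(p, a):
    # b_j = 1 / prod_{i != j} ((j - i)/2): residue of the product at a + h = j/2
    b = []
    for j in range(p):
        prod = 1.0
        for i in range(p):
            if i != j:
                prod *= (j - i) / 2.0
        b.append(1.0 / prod)
    C1 = 1.0
    for j in range(p):
        C1 *= (a - j / 2.0)     # normalizing constant from E[w^0] = 1
    return lambda w: C1 * sum(bj * w ** (a - j / 2.0 - 1.0)
                              for j, bj in enumerate(b))

f = case4_density(p=3, a=9.0)
# midpoint-rule integral of the density over (0, 1); should be close to 1
total = sum(f((k + 0.5) / 20000) for k in range(20000)) / 20000
```

The alternating signs of the *bj* are expected; the sum itself remains nonnegative on (0, 1).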

**Example 13.5.1.** An experiment is conducted among heart patients to stabilize their systolic pressure, diastolic pressure and heart rate (pulse) around the standard numbers, which are 120, 80 and 60, respectively. A random sample of 24 patients who may be considered homogeneous with respect to all factors of variation, such as age, weight group, race, gender, dietary habits, and so on, is selected. These 24 individuals are randomly divided into two groups of equal size. One group of 12 subjects is given the medication combination Med-1 and the other 12 are administered the medication combination Med-2. Then, the Med-1 group is randomly divided into three subgroups of 4 subjects, which are assigned the exercise routines Ex-1, Ex-2 and Ex-3. Similarly, the Med-2 group is divided at random into three subgroups of 4 individuals who are respectively subjected to the exercise routines Ex-1, Ex-2 and Ex-3. After one week, the following observations are made: *x*1 = current reading on systolic pressure minus 120, *x*2 = current reading on diastolic pressure minus 80, *x*3 = current reading on heart rate minus 60. The structure of the two-way data layout is as follows:


Let *Xijk* be the *k*-th vector in the *i*-th row (*i*-th medication) and *j*-th column (*j*-th exercise routine). For convenience, the data are presented in matrix form:

$$\begin{aligned} A\_{11} &= [X\_{111}, X\_{112}, X\_{113}, X\_{114}], & A\_{12} &= [X\_{121}, X\_{122}, X\_{123}, X\_{124}], \\ A\_{13} &= [X\_{131}, X\_{132}, X\_{133}, X\_{134}], & A\_{21} &= [X\_{211}, X\_{212}, X\_{213}, X\_{214}], \\ A\_{22} &= [X\_{221}, X\_{222}, X\_{223}, X\_{224}], & A\_{23} &= [X\_{231}, X\_{232}, X\_{233}, X\_{234}]; \end{aligned}$$

$$A\_{11} = \begin{bmatrix} -2 & 3 & 2 & 5 \\ 1 & -1 & 1 & -1 \\ 2 & -1 & -1 & 0 \end{bmatrix}, \ A\_{12} = \begin{bmatrix} 1 & 4 & -1 & 4 \\ -2 & -2 & -3 & 3 \\ 3 & -2 & -1 & 0 \end{bmatrix}, \ A\_{13} = \begin{bmatrix} 4 & -3 & 3 & 4 \\ 2 & -3 & 2 & 3 \\ 1 & -1 & 1 & -1 \end{bmatrix},$$

$$A\_{21} = \begin{bmatrix} 2 & 0 & -2 & 0 \\ 1 & 4 & 1 & 2 \\ 2 & 1 & -1 & -2 \end{bmatrix}, \ A\_{22} = \begin{bmatrix} 3 & -1 & -1 & 3 \\ 4 & 4 & 0 & 0 \\ 0 & 1 & -1 & 4 \end{bmatrix}, \ A\_{23} = \begin{bmatrix} -2 & -1 & 0 & -1 \\ 1 & 4 & 0 & 3 \\ -2 & 0 & -2 & 0 \end{bmatrix}.$$

(1) Perform a two-way ANOVA on the first component, namely, *x*1, the current reading on systolic pressure minus 120; (2) Carry out a MANOVA on the full data.

**Solution 13.5.1.** We need the following quantities:

$$X\_{11.} = \begin{bmatrix} 8 \\ 0 \\ 0 \end{bmatrix}, \ X\_{12.} = \begin{bmatrix} 8 \\ -4 \\ 0 \end{bmatrix}, \ X\_{13.} = \begin{bmatrix} 8 \\ 4 \\ 0 \end{bmatrix} \Rightarrow X\_{1..} = \begin{bmatrix} 24 \\ 0 \\ 0 \end{bmatrix},$$

$$X\_{21.} = \begin{bmatrix} 0 \\ 8 \\ 0 \end{bmatrix}, \ X\_{22.} = \begin{bmatrix} 4 \\ 8 \\ 4 \end{bmatrix}, \ X\_{23.} = \begin{bmatrix} -4 \\ 8 \\ -4 \end{bmatrix} \Rightarrow X\_{2..} = \begin{bmatrix} 0 \\ 24 \\ 0 \end{bmatrix},$$

$$X\_{.1.} = \begin{bmatrix} 8 \\ 8 \\ 0 \end{bmatrix}, \ X\_{.2.} = \begin{bmatrix} 12 \\ 4 \\ 4 \end{bmatrix}, \ X\_{.3.} = \begin{bmatrix} 4 \\ 12 \\ -4 \end{bmatrix} \Rightarrow X\_{...} = \begin{bmatrix} 24 \\ 24 \\ 0 \end{bmatrix},$$

$$\bar{X} = \frac{X\_{...}}{rst} = \frac{1}{24} \begin{bmatrix} 24 \\ 24 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \ r = 2, \ s = 3, \ t = 4.$$

By using the first elements in all these vectors, we will carry out a two-way ANOVA and answer the first question. Since these are all observations on real scalar variables, we will utilize lower-case letters to indicate scalar quantities. Thus, we have the following values:

$$\begin{split} s\_{tot}^2 &= \sum\_{ijk} (x\_{ijk} - \bar{x})^2 = (-2 - 1)^2 + (3 - 1)^2 + \dots + (-1 - 1)^2 = 136, \\ s\_{row}^2 &= 12 \sum\_{i=1}^2 \left(\frac{x\_{i..}}{12} - 1\right)^2 = 12[(2 - 1)^2 + (0 - 1)^2] = 24, \\ s\_{col}^2 &= 8 \sum\_{j=1}^3 \left(\frac{x\_{.j.}}{8} - 1\right)^2 = 8\left[(1 - 1)^2 + \left(\frac{3}{2} - 1\right)^2 + \left(\frac{1}{2} - 1\right)^2\right] = 4, \\ s\_{int}^2 &= 4 \sum\_{ij} \left(\frac{x\_{ij.}}{4} - \frac{x\_{i..}}{12} - \frac{x\_{.j.}}{8} + 1\right)^2 \\ &= 4\left[\left(\frac{8}{4} - \frac{24}{12} - \frac{8}{8} + 1\right)^2 + \dots + \left(-\frac{4}{4} - 0 - \frac{4}{8} + 1\right)^2\right] = 4, \\ s\_{sub}^2 &= 4\sum\_{ij} \left(\frac{x\_{ij.}}{4} - \frac{x\_{...}}{24}\right)^2 = 4\left[\left(\frac{8}{4} - 1\right)^2 + \dots + \left(-\frac{4}{4} - 1\right)^2\right] = 32, \end{split}$$

$$\begin{aligned} s\_{res}^2 &= \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{ij.}}{4} \right)^2 = (-2 - 2)^2 + \dots + (-1 + 1)^2 = 104, \\ s\_{tot}^2 &= \sum\_{ijk} \left( x\_{ijk} - \frac{x\_{...}}{24} \right)^2 = (-2 - 1)^2 + (3 - 1)^2 + \dots + (-1 - 1)^2 = 136. \end{aligned}$$

All quantities have been calculated separately in order to verify the computations. We could have obtained the interaction sum of squares from the subtotal sum of squares minus the sum of squares due to rows and columns. Similarly, we could have obtained the residual sum of squares from the total sum of squares minus the subtotal sum of squares. We will set up the ANOVA table, where, as usual, *df* stands for degrees of freedom, *SS* means sum of squares and *MS* denotes mean squares:



### ANOVA Table for a Two-Way Layout with Interaction

For testing the hypothesis of no interaction, the *F*-value at the 5% significance level is *F*2*,*18*,*0*.*05 ≈ 3*.*55. The observed value of this *F*2*,*18 being 2*/*5*.*78 ≈ 0*.*35 *<* 3*.*55, the hypothesis of no interaction is not rejected. Thus, we can test for the significance of the row and column effects. Consider the hypothesis *α*1 = *α*2 = 0. Under this hypothesis and the no-interaction hypothesis, the *F*-ratio for the row sum of squares is 24*/*5*.*78 ≈ 4*.*15 *<* 4*.*41 = *F*1*,*18*,*0*.*05, the tabulated critical value of *F*1*,*18 at *α* = 0*.*05. Therefore, this hypothesis is not rejected. Now, consider the hypothesis *β*1 = *β*2 = *β*3 = 0. Since under this hypothesis and the hypothesis of no interaction, the *F*-ratio for the column sum of squares is 2*/*5*.*78 ≈ 0*.*35 *<* 3*.*55 = *F*2*,*18*,*0*.*05, it is not rejected either. Thus, the data show no significant interaction between exercise routine and medication, and no significant effect of the exercise routines or of the two medication combinations in bringing the systolic pressures closer to the standard value of 120.
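The scalar sums of squares above can be reproduced with a short script; the data are the first components (*x*1) of the cells of Example 13.5.1, and the variable names are ours:

```python
# First components of the Example 13.5.1 data (systolic reading minus 120),
# arranged as x[i][j][k]; a check of the scalar two-way ANOVA sums of squares.
x = [[[-2, 3, 2, 5], [1, 4, -1, 4], [4, -3, 3, 4]],
     [[2, 0, -2, 0], [3, -1, -1, 3], [-2, -1, 0, -1]]]
r, s, t = 2, 3, 4
grand = sum(sum(sum(c) for c in row) for row in x) / (r * s * t)   # = 1
cell = [[sum(x[i][j]) / t for j in range(s)] for i in range(r)]    # cell means
rowm = [sum(cell[i]) / s for i in range(r)]                        # row means
colm = [sum(cell[i][j] for i in range(r)) / r for j in range(s)]   # column means
s_tot = sum((x[i][j][k] - grand) ** 2
            for i in range(r) for j in range(s) for k in range(t))
s_row = s * t * sum((m - grand) ** 2 for m in rowm)
s_col = r * t * sum((m - grand) ** 2 for m in colm)
s_int = t * sum((cell[i][j] - rowm[i] - colm[j] + grand) ** 2
                for i in range(r) for j in range(s))
s_res = sum((x[i][j][k] - cell[i][j]) ** 2
            for i in range(r) for j in range(s) for k in range(t))
```

Running this reproduces the values 136, 24, 4, 4 and 104 obtained above, and *s_row* + *s_col* + *s_int* + *s_res* = *s_tot* as it must.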

We now carry out the computations needed to perform a MANOVA on the full data. We employ our standard notation by denoting vectors and matrices by capital letters. The sum of squares and cross products matrices for the rows and columns are the following, respectively denoted by *Srow* and *Scol* :

$$\begin{split} S\_{row} &= st\sum\_{i=1}^{2} \left( \frac{X\_{i..}}{st} - \frac{X\_{...}}{rst} \right) \left( \frac{X\_{i..}}{st} - \frac{X\_{...}}{rst} \right)' \\ &= 12 \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} [1, -1, 0] + \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} [-1, 1, 0] \right\} = \begin{bmatrix} 24 & -24 & 0 \\ -24 & 24 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \end{split}$$

$$\begin{split} S\_{col} &= rt\sum\_{j=1}^{3} \left( \frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst} \right) \left( \frac{X\_{.j.}}{rt} - \frac{X\_{...}}{rst} \right)' \\ &= 8\left\{ O + \frac{1}{4} \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix} + \frac{1}{4} \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \right\} = \begin{bmatrix} 4 & -4 & 4 \\ -4 & 4 & -4 \\ 4 & -4 & 4 \end{bmatrix}, \end{split}$$

$$\begin{split} S\_{int} &= t \sum\_{ij} \left( \frac{X\_{ij.}}{t} - \frac{X\_{i..}}{st} - \frac{X\_{.j.}}{rt} + \frac{X\_{...}}{rst} \right) \left( \frac{X\_{ij.}}{t} - \frac{X\_{i..}}{st} - \frac{X\_{.j.}}{rt} + \frac{X\_{...}}{rst} \right)' \\ &= 4 \left\{ O + \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} + \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} + O \\ &\quad + \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} + \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \right\} = \begin{bmatrix} 4 & 4 & 4 \\ 4 & 4 & 4 \\ 4 & 4 & 4 \end{bmatrix}, \end{split}$$

$$\begin{split} S\_{sub} &= t \sum\_{ij} \left( \frac{X\_{ij.}}{t} - \bar{X} \right) \left( \frac{X\_{ij.}}{t} - \bar{X} \right)' \\ &= 4\left\{ \begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \dots + \begin{bmatrix} 4 & -2 & 2 \\ -2 & 1 & -1 \\ 2 & -1 & 1 \end{bmatrix} \right\} = \begin{bmatrix} 32 & -24 & 8 \\ -24 & 32 & 0 \\ 8 & 0 & 8 \end{bmatrix}. \end{split}$$

We can verify the computations done so far as follows. The sum of squares and cross product matrices ought to be such that *Srow* + *Scol* + *Sint* = *Ssub*. These are

$$\begin{aligned} S\_{row} + S\_{col} + S\_{int} &= \begin{bmatrix} 24 & -24 & 0 \\ -24 & 24 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 4 & -4 & 4 \\ -4 & 4 & -4 \\ 4 & -4 & 4 \end{bmatrix} + \begin{bmatrix} 4 & 4 & 4 \\ 4 & 4 & 4 \\ 4 & 4 & 4 \end{bmatrix} \\ &= \begin{bmatrix} 32 & -24 & 8 \\ -24 & 32 & 0 \\ 8 & 0 & 8 \end{bmatrix} = S\_{sub} \cdot \end{aligned}$$

Hence the result is verified. Now, the total and residual sums of squares and cross product matrices are

$$\begin{aligned} S\_{tot} &= \sum\_{ijk} (\mathbf{X}\_{ijk} - \bar{\mathbf{X}})(\mathbf{X}\_{ijk} - \bar{\mathbf{X}})' \\ &= \begin{bmatrix} 9 & 0 & -6 \\ 0 & 0 & 0 \\ -6 & 0 & 4 \end{bmatrix} + \dots + \begin{bmatrix} 4 & -4 & 0 \\ -4 & 4 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 136 & 11 & 16 \\ 11 & 112 & 6 \\ 16 & 6 & 60 \end{bmatrix} \end{aligned}$$

and

$$\begin{split} S\_{res} &= \sum\_{ijk} \left( X\_{ijk} - \frac{X\_{ij.}}{t} \right) \left( X\_{ijk} - \frac{X\_{ij.}}{t} \right)' \\ &= \begin{bmatrix} 16 & -4 & -8 \\ -4 & 1 & 2 \\ -8 & 2 & 4 \end{bmatrix} + \dots + \begin{bmatrix} 2 & 3 & 0 \\ 3 & 10 & 2 \\ 0 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 104 & 35 & 8 \\ 35 & 80 & 6 \\ 8 & 6 & 52 \end{bmatrix}. \end{split}$$

Then,

$$S\_{res} + S\_{int} = \begin{bmatrix} 104 & 35 & 8 \\ 35 & 80 & 6 \\ 8 & 6 & 52 \end{bmatrix} + \begin{bmatrix} 4 & 4 & 4 \\ 4 & 4 & 4 \\ 4 & 4 & 4 \end{bmatrix} = \begin{bmatrix} 108 & 39 & 12 \\ 39 & 84 & 10 \\ 12 & 10 & 56 \end{bmatrix}.$$

The above results are included in the following MANOVA table where *df* means degrees of freedom, *SSP* denotes a sum of squares and cross products matrix and *MS* is equal to *SSP* divided by the corresponding degrees of freedom:

MANOVA Table for a Two-Way Layout with Interaction


Then, the *λ*-criterion is

$$\lambda = \frac{|S\_{res}|^{\frac{rst}{2}}}{|S\_{res} + S\_{int}|^{\frac{rst}{2}}} \implies w = \frac{|S\_{res}|}{|S\_{res} + S\_{int}|}.$$

The determinants are as follows:

$$|S\_{res}| = 104 \begin{vmatrix} 80 & 6 \\ 6 & 52 \end{vmatrix} - 35 \begin{vmatrix} 35 & 6 \\ 8 & 52 \end{vmatrix} + 8 \begin{vmatrix} 35 & 80 \\ 8 & 6 \end{vmatrix} = 363436$$

$$|S\_{res} + S\_{int}| = 108 \begin{vmatrix} 84 & 10 \\ 10 & 56 \end{vmatrix} - 39 \begin{vmatrix} 39 & 10 \\ 12 & 56 \end{vmatrix} + 12 \begin{vmatrix} 39 & 84 \\ 12 & 10 \end{vmatrix} = 409320.$$

Therefore,

$$w = \frac{363436}{409320} = 0.888 \Rightarrow \ln w = -0.118783 \Rightarrow -2\ln \lambda = 24(0.118783) = 2.8508.$$
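The determinants and the test statistic can be checked numerically; the matrices are those displayed above for Example 13.5.1, and the helper name `det3` is ours:

```python
# Check of the determinants and of -2 ln(lambda) for Example 13.5.1.
import math

def det3(A):
    # cofactor expansion along the first row of a 3 x 3 matrix
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

S_res = [[104, 35, 8], [35, 80, 6], [8, 6, 52]]
S_res_int = [[108, 39, 12], [39, 84, 10], [12, 10, 56]]   # S_res + S_int
w = det3(S_res) / det3(S_res_int)
minus_2_ln_lambda = -24 * math.log(w)   # rst = 24, lambda = w^(rst/2)
```

This reproduces |*Sres*| = 363436, |*Sres* + *Sint*| = 409320 and −2 ln *λ* ≈ 2*.*85 (the text's 2.8508 comes from rounding *w* to 0.888 first).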

We have explicit simple representations of the exact densities for the special cases *p* = 1*, p* = 2*, t* = 2*, t* = 3. However, our situation being *p* = 3*, t* = 4, they do not apply. A chisquare approximation is available for large values of *rst*, but our *rst* is only equal to 24. In this instance, $-2\ln\lambda \to \chi^2\_{p(r-1)(s-1)} = \chi^2\_{6}$ as *rst* → ∞. However, since the observed value −2 ln *λ* = 2*.*8508 happens to be much smaller than the critical value resulting from the asymptotic distribution, which is *χ*2 6*,*0*.*05 = 12*.*59 in this case, we can still safely decide not to reject the hypothesis *Ho* : *Γij* = *O* for all *i* and *j*, and go ahead and test for the main row and column effects, that is, the main effects of the medication combinations Med-1 and Med-2 and the main effects of the exercise routines Ex-1, Ex-2 and Ex-3. For testing the row effect, our hypothesis is *A*1 = *A*2 = *O* and for testing the column effect, it is *B*1 = *B*2 = *B*3 = *O*, given that *Γij* = *O* for all *i* and *j*. The corresponding likelihood ratio criteria are, respectively,

$$
\lambda\_1 = \left(\frac{|S\_{res}|}{|S\_{res} + S\_{row}|}\right)^{\frac{rst}{2}} \text{ and } \lambda\_2 = \left(\frac{|S\_{res}|}{|S\_{res} + S\_{col}|}\right)^{\frac{rst}{2}},
$$

and we may utilize $w\_j = \lambda\_j^{\frac{2}{rst}}$, *j* = 1*,* 2. From previous calculations, we have

$$\begin{aligned} S\_{res} + S\_{row} &= \begin{bmatrix} 104 & 35 & 8 \\ 35 & 80 & 6 \\ 8 & 6 & 52 \end{bmatrix} + \begin{bmatrix} 24 & -24 & 0 \\ -24 & 24 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 128 & 11 & 8 \\ 11 & 104 & 6 \\ 8 & 6 & 52 \end{bmatrix}, \\\ S\_{res} + S\_{col} &= \begin{bmatrix} 104 & 35 & 8 \\ 35 & 80 & 6 \\ 8 & 6 & 52 \end{bmatrix} + \begin{bmatrix} 4 & -4 & 4 \\ -4 & 4 & -4 \\ 4 & -4 & 4 \end{bmatrix} = \begin{bmatrix} 108 & 31 & 12 \\ 31 & 84 & 2 \\ 12 & 2 & 56 \end{bmatrix}. \end{aligned}$$

The required determinants are as follows:

$$\begin{aligned} |S\_{res}| &= 363436, \ |S\_{res} + S\_{row}| = 675724, \ |S\_{res} + S\_{col}| = 443176 \Rightarrow\\ w\_1 &= \frac{363436}{675724} = 0.5378468 \text{ and } w\_2 = \frac{363436}{443176} = 0.8200714, \\ -2\ln\lambda\_1 &= -24\ln 0.5378468 = 24(0.62018) = 14.88, \\ \chi^2\_{p(r-1), \alpha} &= \chi^2\_{3, 0.05} = 7.81 < 14.88; \\ -2\ln\lambda\_2 &= -24\ln 0.8200714 = 24(0.19836) = 4.76, \\ \chi^2\_{p(s-1), \alpha} &= \chi^2\_{6, 0.05} = 12.59 > 4.76. \end{aligned} \tag{i}$$

When *rst* → ∞, $-2\ln\lambda\_1 \to \chi^2\_{p(r-1)}$ and $-2\ln\lambda\_2 \to \chi^2\_{p(s-1)}$, referring to Exercises 13.9 and 13.10, respectively. These results follow from the asymptotic expansion provided in Sect. 13.5.2. Even though *rst* = 24 is not that large, we may use these chisquare approximations for making decisions, as the exact densities of *w*1 and *w*2 do not fall into the special cases previously discussed. When making use of the likelihood ratio criterion, we reject the hypotheses *A*1 = *A*2 = *O* and *B*1 = *B*2 = *B*3 = *O* for small values of *λ*1 and *λ*2, respectively, which translates into large values of the approximate chisquare statistics. It is seen from *(i)* that the observed value of −2 ln *λ*1 is larger than the tabulated critical value, and hence we reject the hypothesis *A*1 = *A*2 = *O* at the 5% level. However, the hypothesis *B*1 = *B*2 = *B*3 = *O* is not rejected since the observed value is less than the critical value. We may conclude that the present data do not show any evidence of interaction between the exercise routines and the medication combinations, that the exercise routine does not contribute significantly to bringing the subjects' initial readings closer to the standard values *(*120*,* 80*,* 60*)*, whereas there is a possibility that the medication combinations Med-1 and Med-2 are effective in significantly causing the subjects' initial readings to approach the standard values.
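The row- and column-effect decisions can be sketched from the determinants computed in *(i)* (variable names are ours; the chisquare critical values 7.81 and 12.59 are the standard 5% points for 3 and 6 degrees of freedom):

```python
# Row- and column-effect tests of Example 13.5.1 via the asymptotic
# chisquare criteria, using the determinants obtained above.
import math

d_res, d_row, d_col = 363436, 675724, 443176
w1, w2 = d_res / d_row, d_res / d_col
stat1 = -24 * math.log(w1)    # -2 ln(lambda_1), compare with chi-square, 3 df
stat2 = -24 * math.log(w2)    # -2 ln(lambda_2), compare with chi-square, 6 df
reject_rows = stat1 > 7.81    # 5% critical value, p(r-1) = 3 df
reject_cols = stat2 > 12.59   # 5% critical value, p(s-1) = 6 df
```

This reproduces the statistics 14.88 and 4.76, hence rejection of the row-effect hypothesis and non-rejection of the column-effect hypothesis at the 5% level.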

**Note 13.5.1.** It may be noticed from the MANOVA table that the second stage analysis will involve one observation per cell in a two-way layout, that is, the *(i, j )*-th cell will contain only one observation vector *Xij.* for the second stage analysis. Thus, *Ssub* = *Sint* +*Srow* +*Scol* (the corresponding sum of squares in the real scalar case), and in this analysis with a single observation per cell, *Sint* acts as the residual sum of squares and cross products matrix (the residual sum of squares in the real scalar case). Accordingly, "interaction" cannot be tested when there is only a single observation per cell.

#### **Exercises**

**13.1.** In the ANOVA table obtained in Example 13.5.1, prove that (1) the sum of squares due to interaction and the residual sum of squares, (2) the sum of squares due to rows and the residual sum of squares, (3) the sum of squares due to columns and the residual sum of squares, are independently distributed under the normality assumption for the error variables, that is, $e\_{ijk} \overset{iid}{\sim} N\_1(0, \sigma^2)$, *σ*2 *>* 0.

**13.2.** In the MANOVA table obtained in Example 13.5.1, prove that (1) *Sint* and *Sres*, (2) *Srow* and *Sres*, (3) *Scol* and *Sres*, are independently distributed Wishart matrices when *Eijk iid* ∼ *Np(O, Σ), Σ > O*.

**13.3.** In a one-way layout, the following are the data on four treatments. (1) Carry out a complete ANOVA on the first component (including individual comparisons if the hypothesis of equality of the treatment effects is rejected). (2) Perform a full MANOVA on the full data.

$$
\begin{aligned}
\text{Treatment-1} \quad \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \text{Treatment-2} \quad \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \\\text{Treatment-3} \quad \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -3 \\ 3 \end{bmatrix}.
\end{aligned}
$$

**13.4.** Carry out a full one-way MANOVA on the following data:

$$\begin{aligned} \text{Treatment-1} \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \text{Treatment-2} \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 6 \\ 5 \\ 4 \end{bmatrix}, \begin{bmatrix} 5 \\ 6 \\ 5 \end{bmatrix}, \\ \text{Treatment-3} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 5 \\ 6 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 6 \\ 2 \end{bmatrix} \end{aligned}$$

**13.5.** The following are the data on a two-way layout where *Aij* denotes the data on the *i*-th row and *j* -th column cell. (1) Perform a complete ANOVA on the first component. (2) Carry out a full MANOVA on the full data. (3) Verify that *Srow* + *Scol* + *Sint* = *Ssub* and *Ssub* + *Sres* = *Stot* , (4) Evaluate the exact density of *w*.

$$A\_{11} = \begin{bmatrix} 2 & 1 & 2 & -1 \\ 1 & 2 & 1 & 0 \\ -1 & 3 & 1 & 4 \end{bmatrix}, \ A\_{12} = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 4 & 4 & 1 & 1 \\ 6 & 3 & -1 & 0 \end{bmatrix}, \ A\_{13} = \begin{bmatrix} 1 & -1 & 1 & -1 \\ -2 & 1 & 2 & 1 \\ 1 & -2 & 1 & -2 \end{bmatrix},$$

$$A\_{21} = \begin{bmatrix} 3 & 4 & 2 & 3 \\ 2 & -1 & 2 & 3 \\ 3 & 5 & 2 & 2 \end{bmatrix}, \ A\_{22} = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \ A\_{23} = \begin{bmatrix} 2 & 3 & 3 & 0 \\ 3 & 2 & -1 & 2 \\ 3 & -3 & 2 & 4 \end{bmatrix}.$$

**13.6.** Carry out a complete MANOVA on the following data where *Aij* indicates the data in the *i*-th row and *j* -th column cell.

$$\begin{aligned} A\_{11} &= \begin{array}{ccccccccc} 1 & -1 & 1 & 1 & 1 \\ 3 & 2 & 1 & 4 & 2 \end{array}, \ A\_{12} = \begin{array}{ccccc} 4 & 3 & 4 & 2 & 2 \\ 5 & 6 & 7 & 2 & 1 \end{array}, \ A\_{13} = \begin{array}{ccccc} 5 & 3 & 5 & 4 & 5 \\ 5 & 4 & 2 & 4 & 5 \end{array}, \\\ A\_{21} &= \begin{array}{ccccc} 0 & 1 & 1 & 0 & -2 \\ 2 & 2 & 0 & -1 & -1 \end{array}, \ A\_{22} = \begin{array}{ccccc} 6 & 7 & 6 & 4 & 5 \\ 4 & 5 & 2 & 3 & 4 \end{array}, \ A\_{23} = \begin{array}{ccccc} 1 & 0 & 1 & -1 & -1 \\ 1 & -1 & 2 & 1 & 2 \end{array}. \end{aligned}$$

**13.7.** Under the hypothesis *A*1 = ··· = *Ar* = *O*, prove that $U\_1 = (S\_{res} + S\_{row})^{-\frac{1}{2}} S\_{res} (S\_{res} + S\_{row})^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable with the parameters $(\frac{rs(t-1)}{2}, \frac{r-1}{2})$ for *r* ≥ *p*, *rs(t* − 1*)* ≥ *p*, when the hypothesis *Γij* = *O* for all *i* and *j* is not rejected or assuming that *Γij* = *O*. The determinant of *U*1 appears in the likelihood ratio criterion in this case.

**13.8.** Under the hypothesis $B_1 = \cdots = B_s = O$ when the hypothesis $\Gamma_{ij}=O$ is not rejected, or assuming that $\Gamma_{ij}=O$, prove that $U_2 = (S_{res}+S_{col})^{-\frac{1}{2}}\,S_{res}\,(S_{res}+S_{col})^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable with the parameters $(\frac{rs(t-1)}{2}, \frac{s-1}{2})$ for $s\ge p$, $rs(t-1)\ge p$. The determinant of $U_2$ appears in the likelihood ratio criterion for testing the main effect $B_j = O,\ j = 1,\ldots,s$.

**13.9.** Show that when $rst\to\infty$, $-2\ln\lambda_1\to\chi^2_{p(r-1)}$, that is, $-2\ln\lambda_1$ asymptotically tends to a real scalar chisquare having $p(r-1)$ degrees of freedom, where $\lambda_1 = |U_1|$ and $U_1$ is as defined in Exercise 13.7. [Hint: Look into the general $h$-th moment of $\lambda_1$ in this case, which can be evaluated by using the density of $U_1$]. Hence for large values of $rst$, one can use this approximate chisquare distribution for testing the hypothesis $A_1 = \cdots = A_r = O$.

**13.10.** Show that when $rst\to\infty$, $-2\ln\lambda_2\to\chi^2_{p(s-1)}$, that is, $-2\ln\lambda_2$ asymptotically converges to a real scalar chisquare having $p(s-1)$ degrees of freedom, where $\lambda_2 = |U_2|$ with $U_2$ as defined in Exercise 13.8. [Hint: Look at the $h$-th moment of $\lambda_2$]. For large values of $rst$, one can utilize this approximate chisquare distribution for testing the hypothesis $B_1 = \cdots = B_s = O$.



#### **Chapter 14 Profile Analysis and Growth Curves**

#### **14.1. Introduction**

We will utilize the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p\times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^{*}$ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\mathrm{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p\times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\mathrm{d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\mathrm{d}x_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2,\ i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, $\mathrm{d}\tilde{X} = \mathrm{d}X_1\wedge\mathrm{d}X_2$.

#### **14.1.1. Profiles**

The two topics treated in this chapter are profile analysis and growth curves. We first consider profile analysis; only three types of profiles will be considered. As an example, take the amount of money spent at a specific grocery store by six families residing in a given neighborhood. Suppose that they all do their grocery shopping once a week, every Saturday. Let us consider the average amount spent in the long run on four types of items: item (1): vegetables; item (2): baked goods; item (3): meat products; item (4): cleaning supplies. Let $\mu_{ij}$ denote the expected amount or the average amount spent in the long run by the $j$-th family on the $i$-th item, $j = 1, 2, 3, 4, 5, 6$, and $i = 1, 2, 3, 4$.


The monetary values of these purchases are plotted as points in Fig. 14.1.1. In order to see the pattern, these points are joined by straight lines, which are not actually part of the graph. From the pattern or profile, we may note that families $F_1, F_2, F_3, F_4$ have parallel profiles, that the profiles of $F_1$ and $F_4$ coincide, and that $F_5$ has a constant profile or is horizontal with respect to the base line. The profile of $F_6$ does not fall into any category relative to the other profiles.

**Figure 14.1.1** Profile plot for 6 families: amounts spent weekly on 4 items

In this instance, we refer to the profiles of $F_1, F_2, F_3, F_4$ as "parallel profiles", whereas the profiles of $F_1$ and $F_4$ are "coincident profiles". $F_5$ is said to have a constant profile or "level profile". We now study these patterns in more detail, starting with parallel profiles.

#### **14.2. Parallel Profiles**

Let $\mu_{ij}$ be the expected dollar amount of the purchase of the $j$-th family on the $i$-th item. In general, if the items are denoted by the real scalar variables $x_{1j},\ldots,x_{pj}$, then $\mu_{ij} = E[x_{ij}]$, $i = 1,\ldots,p$ and $j = 1,\ldots,k$. Thus, we have $k$ $p$-variate populations. The items or variables may be quantitative or qualitative. The $p$ items may be $p$ different stimulants and the response may be the expected enhancement in the performance of a sportsman; the $p$ items could also be $p$ different animal feeds and the response could then be the expected weight increase, and so on. The $\mu_{ij}$'s, that is, the expected responses, may be quantitative, or qualitative but convertible to quantitative readings. If the items are various color combinations in a lady's dress, the response of the first observer $F_1$ with respect to a certain color combination may be "very pleasing", the response of the second observer $F_2$ may be "indifferent", etc. These assessments of color combinations can, for example, be quantified as readings on a 0 to 10 scale.

Let us start with *k* = 2, that is, two *p*-variate populations, and let the responses be

$$M\_1 = \begin{bmatrix} \mu\_{11} \\ \mu\_{21} \\ \vdots \\ \mu\_{p1} \end{bmatrix} \text{ and } M\_2 = \begin{bmatrix} \mu\_{12} \\ \mu\_{22} \\ \vdots \\ \mu\_{p2} \end{bmatrix}.$$

Then, what is the meaning of parallel profiles? We can impose the conditions $\mu_{12}-\mu_{11} = \mu_{22}-\mu_{21}$, $\mu_{22}-\mu_{21} = \mu_{32}-\mu_{31}$, etc., which is tantamount to saying that

$$
\mu\_{j-1\,2} - \mu\_{j-1\,1} = \mu\_{j2} - \mu\_{j1} \Rightarrow \mu\_{j1} - \mu\_{j-1\,1} = \mu\_{j2} - \mu\_{j-1\,2}, \text{ for } j = 2, \ldots, p. \tag{14.2.1}
$$

The conditions specified in (14.2.1) can be expressed in matrix notation. Letting *C* be the following *(p* − 1*)* × *p* matrix:

$$C = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0\\ 0 & -1 & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & -1 & 1 & 0\\ 0 & 0 & \cdots & 0 & -1 & 1 \end{bmatrix} \Rightarrow CM\_1 = \begin{bmatrix} -\mu\_{11}+\mu\_{21}\\ -\mu\_{21}+\mu\_{31}\\ \vdots\\ -\mu\_{p-1\,1}+\mu\_{p1} \end{bmatrix},\ CM\_2 = \begin{bmatrix} -\mu\_{12}+\mu\_{22}\\ -\mu\_{22}+\mu\_{32}\\ \vdots\\ -\mu\_{p-1\,2}+\mu\_{p2} \end{bmatrix}. \tag{14.2.2}$$
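The successive-difference matrix $C$ of (14.2.2) is easily generated for any $p$; the following short sketch in Python with NumPy illustrates its action (the helper name `contrast_matrix` is ours, not the book's):

```python
import numpy as np

# Illustrative helper: the (p-1) x p matrix C of (14.2.2), whose rows
# compute the successive differences -mu_i + mu_{i+1} of a mean vector.
def contrast_matrix(p: int) -> np.ndarray:
    C = np.zeros((p - 1, p))
    for i in range(p - 1):
        C[i, i] = -1.0
        C[i, i + 1] = 1.0
    return C

C = contrast_matrix(4)
M1 = np.array([3.0, 5.0, 4.0, 6.0])   # a hypothetical mean vector
print(C @ M1)                         # successive differences: 2, -1, 2
```

Two profiles are then parallel exactly when $C$ maps their mean vectors to the same difference vector.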

Then, $CM_1 = CM_2$. Therefore, "parallel profiles" means $CM_1 = CM_2$, and we can test the hypothesis $CM_1 = CM_2$ against $CM_1 \ne CM_2$ for determining whether the profiles are parallel or not. In order to avoid any possible confusion, let us denote the two vectors as $X$ and $Y$, and let $M_1 = E[X]$ and $M_2 = E[Y]$, that is,

$$X = \begin{bmatrix} x\_1 \\ x\_2 \\ \vdots \\ x\_p \end{bmatrix}, \ Y = \begin{bmatrix} y\_1 \\ y\_2 \\ \vdots \\ y\_p \end{bmatrix}, \ E[X] = M\_1 = \begin{bmatrix} \mu\_{11} \\ \mu\_{21} \\ \vdots \\ \mu\_{p1} \end{bmatrix}, \ E[Y] = M\_2 = \begin{bmatrix} \mu\_{12} \\ \mu\_{22} \\ \vdots \\ \mu\_{p2} \end{bmatrix}. \tag{14.2.3}$$

For testing hypotheses, we need to assume some distributions for $X$ and $Y$. Let $X$ and $Y$ be independently distributed real Gaussian $p$-variate variables where $X \sim N_p(M_1,\Sigma)$ and $Y \sim N_p(M_2,\Sigma),\ \Sigma > O$. Observe that the $(p-1)\times p$ matrix $C$ is of full rank $p-1$. Then, on applying a result previously obtained in Chap. 3,

$$CX \sim N\_{p-1}(CM\_1, C\Sigma C') \text{ and } \\ CY \sim N\_{p-1}(CM\_2, C\Sigma C'). \tag{14.2.4}$$

Assume that we have a simple random sample of size $n_1$ from $X$ and a simple random sample of size $n_2$ from $Y$. Let $X_1,\ldots,X_{n_1}$ be the sample values from $X$ and let $Y_1,\ldots,Y_{n_2}$ be the sample values from $Y$. Let the sample averages be $\bar{X}$ and $\bar{Y}$, the sample matrices be the boldfaced $\mathbf{X}$ and $\mathbf{Y}$, the matrices of sample means be $\bar{\mathbf{X}}$ and $\bar{\mathbf{Y}}$, the deviation matrices be $\mathbf{X}_d$ and $\mathbf{Y}_d$, and the sample sum of products matrices be $S_1 = \mathbf{X}_d\mathbf{X}_d'$ and $S_2 = \mathbf{Y}_d\mathbf{Y}_d'$. Then, these are the following:

$$\begin{aligned} \mathbf{X} &= [X\_1, \dots, X\_{n\_1}] = \begin{bmatrix} x\_{11} & x\_{12} & \cdots & x\_{1n\_1}\\ x\_{21} & x\_{22} & \cdots & x\_{2n\_1}\\ \vdots & \vdots & \ddots & \vdots\\ x\_{p1} & x\_{p2} & \cdots & x\_{pn\_1} \end{bmatrix},\\ \bar{X} &= \begin{bmatrix} \sum\_{j=1}^{n\_1} x\_{1j}/n\_1\\ \sum\_{j=1}^{n\_1} x\_{2j}/n\_1\\ \vdots\\ \sum\_{j=1}^{n\_1} x\_{pj}/n\_1 \end{bmatrix} = \begin{bmatrix} \bar{x}\_1\\ \bar{x}\_2\\ \vdots\\ \bar{x}\_p \end{bmatrix},\\ \bar{\mathbf{X}} &= [\bar{X}, \bar{X}, \dots, \bar{X}],\ \mathbf{X}\_d = \mathbf{X} - \bar{\mathbf{X}},\ S\_1 = \mathbf{X}\_d\mathbf{X}\_d', \end{aligned}$$

with similar expressions in the case of $Y$, for which the sum of squares and cross products matrix is denoted as $S_2 = \mathbf{Y}_d\mathbf{Y}_d'$. Note that $\bar{X} \sim N_p(M_1, \frac{1}{n_1}\Sigma),\ \Sigma > O \Rightarrow C\bar{X} \sim N_{p-1}(CM_1, \frac{1}{n_1}C\Sigma C')$. As well, $C\bar{Y} \sim N_{p-1}(CM_2, \frac{1}{n_2}C\Sigma C')$. The sum of squares and cross product matrices of $C\mathbf{X}$ and $C\mathbf{Y}$ are $CS_1C'$ and $CS_2C'$, respectively. Since $X$ and $Y$ are independently distributed, we have

$$\begin{aligned} \bar{X} - \bar{Y} &\sim N\_p(M\_1 - M\_2, (\frac{1}{n\_1} + \frac{1}{n\_2})\Sigma) \Rightarrow \\ C\bar{X} - C\bar{Y} &\sim N\_{p-1}(C(M\_1 - M\_2), (\frac{1}{n\_1} + \frac{1}{n\_2})C\Sigma C'). \end{aligned} \tag{14.2.5}$$

An unbiased estimator for $\Sigma$ is $\hat{\Sigma} = \frac{1}{n_1+n_2-2}(S_1+S_2)$ and then, Hotelling's $T^2$ statistic is

$$T^2 = (\frac{1}{n\_1} + \frac{1}{n\_2})^{-1} (\bar{X} - \bar{Y})' C' [(C\hat{\Sigma}C')^{-1}] C (\bar{X} - \bar{Y}).\tag{14.2.6}$$

The usual test is based on Hotelling's $T^2$ statistic. However, referring to Sects. 6.3.1 and 6.3.3, we are making use of a statistic based on a real type-2 beta random variable. Thus, for $n_1+n_2-p = (n_1+n_2-2)+1-(p-1)$, and under the hypothesis $H_{o1}: CM_1 = CM_2$,

$$w = \left[\frac{n\_1 + n\_2 - p}{p - 1}\right] (\frac{1}{n\_1} + \frac{1}{n\_2})^{-1} (\bar{X} - \bar{Y})' C' (C(S\_1 + S\_2)C')^{-1} C(\bar{X} - \bar{Y}) \sim F\_{p - 1, n\_1 + n\_2 - p}. \tag{14.2.7}$$

Accordingly, for testing the hypothesis $H_{o1}: CM_1 = CM_2$ versus $CM_1 \ne CM_2$, we reject $H_{o1}$ if the observed value of $w$ of (14.2.7) is greater than or equal to $F_{p-1,\,n_1+n_2-p,\,\alpha}$, the upper $100\,\alpha\%$ percentage point of an $F_{p-1,\,n_1+n_2-p}$ distribution.

**Example 14.2.1.** The following are independent observed samples of sizes $n_1 = 4$ and $n_2 = 5$ from trivariate Gaussian populations sharing the same covariance matrix $\Sigma > O$. Test the hypothesis $H_{o1}: CM_1 = CM_2$ at the 5% significance level:

$$\begin{aligned} \text{Observed sample-1: } X\_1 &= \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix},\ X\_2 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix},\ X\_3 = \begin{bmatrix} 1\\ -1\\ 2 \end{bmatrix},\ X\_4 = \begin{bmatrix} 1\\ 0\\ 2 \end{bmatrix};\\ \text{Observed sample-2: } Y\_1 &= \begin{bmatrix} 0\\ -1\\ -1 \end{bmatrix},\ Y\_2 = \begin{bmatrix} 1\\ 1\\ 2 \end{bmatrix},\ Y\_3 = \begin{bmatrix} -1\\ 2\\ 1 \end{bmatrix},\ Y\_4 = \begin{bmatrix} 1\\ 2\\ -2 \end{bmatrix},\ Y\_5 = \begin{bmatrix} -1\\ 1\\ 0 \end{bmatrix}. \end{aligned}$$

**Solution 14.2.1.** Let us first evaluate the $2\times 3$ matrix $C$ as well as $\bar{X},\ \bar{Y},\ \mathbf{X},\ \bar{\mathbf{X}},\ \mathbf{Y},\ \mathbf{X}_d,\ \mathbf{Y}_d,\ S_1$ and $S_2$:

$$\begin{aligned} C &= \begin{bmatrix} -1 & 1 & 0\\ 0 & -1 & 1 \end{bmatrix},\ \bar{X} = \frac{1}{4}\sum\_{j=1}^{4} X\_j = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix},\ \bar{Y} = \frac{1}{5}\sum\_{j=1}^{5} Y\_j = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix},\\ \mathbf{X} &= \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & -1 & 0\\ -1 & 1 & 2 & 2 \end{bmatrix},\ \bar{\mathbf{X}} = \begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & 0 & 0 & 0\\ 1 & 1 & 1 & 1 \end{bmatrix},\\ \mathbf{X}\_d &= \mathbf{X} - \bar{\mathbf{X}} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 1 & -1 & 0\\ -2 & 0 & 1 & 1 \end{bmatrix},\ S\_1 = \mathbf{X}\_d\mathbf{X}\_d' = \begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & -1\\ 0 & -1 & 6 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} \mathbf{Y} &= \begin{bmatrix} 0 & 1 & -1 & 1 & -1\\ -1 & 1 & 2 & 2 & 1\\ -1 & 2 & 1 & -2 & 0 \end{bmatrix},\ \mathbf{Y}\_d = \begin{bmatrix} 0 & 1 & -1 & 1 & -1\\ -2 & 0 & 1 & 1 & 0\\ -1 & 2 & 1 & -2 & 0 \end{bmatrix},\\ S\_2 &= \mathbf{Y}\_d\mathbf{Y}\_d' = \begin{bmatrix} 4 & 0 & -1\\ 0 & 6 & 1\\ -1 & 1 & 10 \end{bmatrix},\ S\_1 + S\_2 = \begin{bmatrix} 4 & 0 & -1\\ 0 & 8 & 0\\ -1 & 0 & 16 \end{bmatrix},\ C(S\_1 + S\_2)C' = \begin{bmatrix} 12 & -7\\ -7 & 24 \end{bmatrix}, \end{aligned}$$

$$[C(S\_1 + S\_2)C']^{-1} = \frac{1}{239} \left[\begin{array}{cc} 24 & 7\\ 7 & 12 \end{array}\right], \ C' [C(S\_1 + S\_2)C']^{-1} C = \frac{1}{239} \left[\begin{array}{cc} 24 & -17 & -7\\ -17 & 22 & -5\\ -7 & -5 & 12 \end{array}\right],$$

$$\bar{X} - \bar{Y} = \begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix}, \ \left(\frac{1}{n\_1} + \frac{1}{n\_2}\right)^{-1} = \frac{20}{9}.$$

Therefore,

$$\begin{split} w &= \frac{(n\_1 + n\_2 - p)}{p - 1}\left(\frac{1}{n\_1} + \frac{1}{n\_2}\right)^{-1}(\bar{X} - \bar{Y})'C'[C(S\_1 + S\_2)C']^{-1}C(\bar{X} - \bar{Y})\\ &= \frac{6}{2}\left(\frac{20}{(9)(239)}\right)[1, -1, 1]\begin{bmatrix} 24 & -17 & -7\\ -17 & 22 & -5\\ -7 & -5 & 12 \end{bmatrix}\begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix}\\ &= 3\left(\frac{20}{(9)(239)}\right)88 = 2.45. \end{split}$$

From the tabulated values for $\alpha = 0.05$,

$$F\_{p-1,\,n\_1+n\_2-p,\,\alpha} = F\_{2,6,0.05} = 5.14 > 2.45.$$

Hence, the hypothesis $H_{o1}: CM_1 = CM_2$ is not rejected at the 5% significance level.
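Solution 14.2.1 can be double-checked numerically. The following sketch recomputes $w$ of (14.2.7) with NumPy from the sample matrices $\mathbf{X}$ and $\mathbf{Y}$ displayed above (the variable names are ours):

```python
import numpy as np

# Recompute the parallelism statistic w of (14.2.7) for Example 14.2.1;
# columns of X and Y are the observation vectors from Solution 14.2.1.
X = np.array([[1, 1, 1, 1], [0, 1, -1, 0], [-1, 1, 2, 2]], dtype=float)
Y = np.array([[0, 1, -1, 1, -1], [-1, 1, 2, 2, 1], [-1, 2, 1, -2, 0]], dtype=float)
p, n1, n2 = X.shape[0], X.shape[1], Y.shape[1]
C = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])

Xd = X - X.mean(axis=1, keepdims=True)      # deviation matrix X_d
Yd = Y - Y.mean(axis=1, keepdims=True)      # deviation matrix Y_d
S1, S2 = Xd @ Xd.T, Yd @ Yd.T               # sum of products matrices
d = C @ (X.mean(axis=1) - Y.mean(axis=1))   # C(Xbar - Ybar)

w = ((n1 + n2 - p) / (p - 1)) / (1 / n1 + 1 / n2) \
    * d @ np.linalg.solve(C @ (S1 + S2) @ C.T, d)
print(round(w, 2))  # 2.45, below the tabulated F_{2,6,0.05} = 5.14
```

Using `np.linalg.solve` avoids forming $[C(S_1+S_2)C']^{-1}$ explicitly, which is the numerically preferable route.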

#### **14.3. Coincident Profiles**

With the notations employed in Sect. 14.2, two profiles are coincident if $\mu_{i1} = \mu_{i2},\ i = 1,\ldots,p$, which amounts to assuming $H_{o2}: M_1 = M_2$. If the $p\times 1$ real vectors $X$ and $Y$ are such that $X \sim N_p(M_1,\Sigma)$ and $Y \sim N_p(M_2,\Sigma),\ \Sigma > O$, and the populations are independently distributed, this is the test for the equality of the mean value vectors of two independent Gaussian populations with a common unknown covariance matrix, which was discussed in Chap. 6, where a Hotelling's $T^2$ is usually utilized for testing this hypothesis. We will use a statistic based on a real scalar type-2 beta variable, which has already been considered in Sects. 6.3.1 and 6.3.3. Given samples of sizes $n_1$ and $n_2$ from these normal populations, and letting $\bar{X}$ and $\bar{Y}$ be the sample means and $S_1$ and $S_2$ be the sample sum of squares and cross products matrices, the test statistic under the hypothesis $H_{o2}$, denoted by $w_1$, is the following:

$$w\_1 = \left[\frac{n\_1 + n\_2 - 1 - p}{p}\right] \left(\frac{1}{n\_1} + \frac{1}{n\_2}\right)^{-1} (\bar{X} - \bar{Y})' (S\_1 + S\_2)^{-1} (\bar{X} - \bar{Y}) \sim F\_{p, n\_1 + n\_2 - 1 - p} \tag{14.3.1}$$

Thus, we reject the hypothesis of equality of the population mean values, that is, the hypothesis $H_{o2}: M_1 = M_2$, whenever the observed value of $w_1$ is greater than or equal to $F_{p,\,n_1+n_2-1-p,\,\alpha}$, the upper $100\,\alpha\%$ percentage point of an $F$-distribution having $p$ and $n_1+n_2-1-p$ degrees of freedom.

**Example 14.3.1.** With the data provided in Example 14.2.1 and under the same assumptions, test the hypothesis that the two profiles are coincident, that is, test $H_{o2}: M_1 = M_2$.

**Solution 14.3.1.** From Solution 14.2.1, we have $n_1 = 4,\ n_2 = 5$,

$$
\bar{X} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \ \bar{Y} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \ S\_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 6 \end{bmatrix}, \ S\_2 = \begin{bmatrix} 4 & 0 & -1 \\ 0 & 6 & 1 \\ -1 & 1 & 10 \end{bmatrix},
$$

so that

$$S\_1 + S\_2 = \begin{bmatrix} 4 & 0 & -1\\ 0 & 8 & 0\\ -1 & 0 & 16 \end{bmatrix},\ (S\_1 + S\_2)^{-1} = \frac{1}{504}\begin{bmatrix} 128 & 0 & 8\\ 0 & 63 & 0\\ 8 & 0 & 32 \end{bmatrix},\ p = 3,\ n\_1 + n\_2 - 1 - p = 5.$$

Then, an observed value of the test statistic, again denoted by $w_1$, is the following:

$$\begin{split} w\_{1} &= \frac{(n\_{1} + n\_{2} - 1 - p)}{p} \left(\frac{1}{n\_{1}} + \frac{1}{n\_{2}}\right)^{-1} (\bar{X} - \bar{Y})' (S\_{1} + S\_{2})^{-1} (\bar{X} - \bar{Y}) \\ &= \left(\frac{5}{3}\right) \left(\frac{20}{9}\right) \frac{1}{504} [1, -1, 1] \begin{bmatrix} 128 & 0 & 8\\ 0 & 63 & 0\\ 8 & 0 & 32 \end{bmatrix} \begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix} \\ &= \left(\frac{5}{3}\right) \left(\frac{20}{9}\right) \left(\frac{239}{504}\right) = \frac{23900}{13608} = 1.76. \end{split}$$

The tabulated value of $F_{p,\,n_1+n_2-1-p,\,\alpha} = F_{3,5,0.05}$ being $5.41 > 1.76$, the hypothesis $M_1 = M_2$ against the natural alternative $M_1 \ne M_2$ is not rejected at the 5% level of significance.
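The coincidence statistic $w_1$ of (14.3.1) can likewise be checked numerically from the quantities already obtained in Solution 14.2.1; this brief Python sketch does so (variable names are ours):

```python
import numpy as np

# Recompute the coincidence statistic w1 of (14.3.1) for Example 14.3.1,
# using Xbar - Ybar, S1 and S2 from Solution 14.2.1.
n1, n2, p = 4, 5, 3
xbar_minus_ybar = np.array([1.0, -1.0, 1.0])
S1 = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, -1.0], [0.0, -1.0, 6.0]])
S2 = np.array([[4.0, 0.0, -1.0], [0.0, 6.0, 1.0], [-1.0, 1.0, 10.0]])

w1 = ((n1 + n2 - 1 - p) / p) / (1 / n1 + 1 / n2) \
     * xbar_minus_ybar @ np.linalg.solve(S1 + S2, xbar_minus_ybar)
print(round(w1, 2))  # 1.76, below the tabulated F_{3,5,0.05} = 5.41
```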

#### **14.3.1. Conditional hypothesis for coincident profiles**

If we already know that the two profiles are parallel, then the second profile is either above or below the first one, and these two profiles are equal only if the sum of the elements in $M_1$ is equal to the sum of the elements in $M_2$. Caution should be exercised in making use of this argument: we must know that the two profiles are assuredly parallel. If the hypothesis that the two profiles are parallel is not rejected, this does not necessarily imply that the two profiles are parallel; it only entails that the departure from parallelism is not significant. Accordingly, a test for coincidence that is based on the sum of the elements is not justifiable, and the test being herein discussed is not a test for coincidence but only a test of the hypothesis that the sum of the elements in $M_1$ is equal to the sum of the elements in $M_2$. Observe that this equality can hold whether the profiles are coincident or not. The sum of the elements in $M_1$ is given by $J'M_1$ and the sum of the elements in $M_2$ is given by $J'M_2$ where $J' = (1, 1, \ldots, 1)$. Under independence and normality with $X \sim N_p(M_1,\Sigma)$ and $Y \sim N_p(M_2,\Sigma),\ \Sigma > O$, we have $J'\bar{X} \sim N_1(J'M_1, \frac{1}{n_1}J'\Sigma J)$ and $J'\bar{Y} \sim N_1(J'M_2, \frac{1}{n_2}J'\Sigma J)$, and hence,

$$\begin{split} t^2 &= (n\_1 + n\_2 - 2)\left(\frac{n\_1 n\_2}{n\_1 + n\_2}\right)[J'(\bar{X} - \bar{Y}) - J'(M\_1 - M\_2)]\\ &\quad\times \left[J'(S\_1 + S\_2)J\right]^{-1}[J'(\bar{X} - \bar{Y}) - J'(M\_1 - M\_2)] \end{split} \tag{14.3.2}$$

where, under the hypothesis $H_{o3}: J'M_1 - J'M_2 = 0$, $t$ is a Student-$t$ having $n_1+n_2-2$ degrees of freedom and $t^2 \sim F_{1,\,n_1+n_2-2}$. Thus, we may utilize a Student-$t$ statistic with $n_1+n_2-2$ degrees of freedom or an $F$-statistic with 1 and $n_1+n_2-2$ degrees of freedom in order to reach a decision with respect to the hypothesis of the equality of the sum of the elements in $M_1$ and $M_2$.

**Example 14.3.2.** With the data provided in Example 14.2.1, test the hypothesis $J'M_1 = J'M_2$, that is, the equality of the sums of the elements in $M_1$ and $M_2$.

**Solution 14.3.2.** In this case, $n_1+n_2-2 = 7$ and $\frac{n_1 n_2}{n_1+n_2} = \frac{20}{9}$,

$$J'(\bar{X} - \bar{Y}) = [1, 1, 1]\begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix} = 1,\ J'(S\_1 + S\_2)J = [1, 1, 1]\begin{bmatrix} 4 & 0 & -1\\ 0 & 8 & 0\\ -1 & 0 & 16 \end{bmatrix}\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} = 26.$$

Hence, the observed value of $t^2$ in (14.3.2) is

$$t^2 = 7\left(\frac{20}{9}\right)\left(\frac{1}{26}\right) = 0.60 \text{ while } F\_{1,7,0.05} = 5.59 > 0.60.$$

Thus, the hypothesis of equality of the sum of the elements in *M*<sup>1</sup> and *M*<sup>2</sup> is not rejected at the 5% significance level.
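The arithmetic of (14.3.2) under $H_{o3}$ can be verified directly; this brief sketch recomputes $t^2$ from the quantities $J'(\bar X - \bar Y) = 1$ and $J'(S_1+S_2)J = 26$ obtained above:

```python
# Recompute t^2 of (14.3.2) for Example 14.3.2 under H_o3,
# i.e. with J'(M1 - M2) = 0.
n1, n2 = 4, 5
J_diff = 1.0    # J'(Xbar - Ybar)
J_SJ = 26.0     # J'(S1 + S2)J
t2 = (n1 + n2 - 2) * (n1 * n2 / (n1 + n2)) * J_diff**2 / J_SJ
print(round(t2, 2))  # well below the tabulated F_{1,7,0.05} = 5.59
```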

#### **14.4. Level Profiles**

A level profile occurs when the plot is horizontal with respect to the base line, that is, when all the elements in $M_1$ are equal. Thus, this hypothesis involves only one population. If there are two populations, we can consider the hypothesis that the mean value vectors $M_1$ and $M_2$ are equal, or test whether the sums of the elements in $M_1$ and in $M_2$ are equal. When the null hypothesis is not rejected, the sum in $M_1$ may still be different from the sum in $M_2$. However, if we assume the profiles to be coincident, these two sums will be equal. We may then create a conditional hypothesis to the effect that the profiles in the two populations are level, given that the profiles are coincident. This is how one can connect the two populations and test whether the profiles are level. Let the $p\times 1$ vector $X_j \sim N_p(M,\Sigma),\ \Sigma > O$, and let $X_1,\ldots,X_n$ be iid as $X_j$. In this instance, we have one real $p$-variate Gaussian population and a simple random sample of size $n$ with $E[X_j] = M$ and $\mathrm{Cov}(X_j) = \Sigma,\ j = 1,\ldots,n$. Let

$$M = \begin{bmatrix} \mu\_1 \\ \mu\_2 \\ \vdots \\ \mu\_p \end{bmatrix}, \ X\_j = \begin{bmatrix} x\_{1j} \\ x\_{2j} \\ \vdots \\ x\_{pj} \end{bmatrix}, \ \bar{X} = \begin{bmatrix} \bar{x}\_1 \\ \bar{x}\_2 \\ \vdots \\ \bar{x}\_p \end{bmatrix} = \begin{bmatrix} \sum\_{j=1}^n x\_{1j}/n \\ \sum\_{j=1}^n x\_{2j}/n \\ \vdots \\ \sum\_{j=1}^n x\_{pj}/n \end{bmatrix}.$$

The hypothesis of a "level profile" means $H_{o3}: \mu_1 = \mu_2 = \cdots = \mu_p = \mu$ for some $\mu$. The common value $\mu$ is estimated by $\hat{\mu} = \frac{1}{np}\sum_{i=1}^{p}\sum_{j=1}^{n} x_{ij}$, which is the maximum likelihood estimator/estimate. Then, under the hypothesis, $\hat{M}$ is such that $\hat{M}' = [\hat{\mu}, \hat{\mu}, \ldots, \hat{\mu}]$. Testing such a hypothesis has already been considered in Chap. 6, the test being based on the statistic

$$
u = n(\bar{X} - \hat{M})' S^{-1} (\bar{X} - \hat{M}) \sim \text{ type-2 beta} \tag{14.4.1}
$$

with the parameters $(\frac{p-1}{2}, \frac{n-p+1}{2})$, under the hypothesis $H_{o3}$, referring more specifically to the derivations related to Eq. (6.3.14). Thus, under $H_{o3}$,

$$\frac{(n-p+1)}{p-1}u \sim F\_{p-1,n-p+1} \tag{14.4.2}$$

where $F_{p-1,\,n-p+1}$ is a real scalar $F$ random variable with $p-1$ and $n-p+1$ degrees of freedom. We reject $H_{o3}$ if the observed value of the statistic specified in (14.4.2) is greater than or equal to $F_{p-1,\,n-p+1,\,\alpha}$, the upper $100\,\alpha\%$ percentage point of an $F_{p-1,\,n-p+1}$ distribution. This test has already been illustrated in Chap. 6.
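The single-population test (14.4.1)-(14.4.2) can be sketched as a short Python function; the name `level_profile_stat` is ours, and the sample is held as the columns of a $p\times n$ matrix:

```python
import numpy as np

# Sketch of the level-profile test of (14.4.1)-(14.4.2) for one
# p-variate Gaussian sample; X is a p x n matrix of column observations.
def level_profile_stat(X: np.ndarray) -> float:
    p, n = X.shape
    xbar = X.mean(axis=1)               # sample mean vector Xbar
    Xd = X - xbar[:, None]              # deviation matrix
    S = Xd @ Xd.T                       # sample sum of products matrix
    mu_hat = X.mean()                   # MLE of the common mean mu
    d = xbar - mu_hat                   # Xbar - Mhat under H_o3
    u = n * d @ np.linalg.solve(S, d)   # statistic u of (14.4.1)
    return (n - p + 1) / (p - 1) * u    # F_{p-1, n-p+1} under H_o3
```

Large values relative to $F_{p-1,\,n-p+1,\,\alpha}$ lead to the rejection of the level-profile hypothesis.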

#### **14.4.1. Level profile, two independent Gaussian populations**

We can make use of (14.4.1) and test for a level profile in each of the two populations separately. Let the populations and the simple random samples from these populations be as follows: $X_j \overset{iid}{\sim} N_p(M_1,\Sigma),\ \Sigma > O,\ j = 1,\ldots,n_1$, and $Y_j \overset{iid}{\sim} N_p(M_2,\Sigma),\ \Sigma > O,\ j = 1,\ldots,n_2$, where $\Sigma$ is identical for both of these independent Gaussian populations, and let

$$E[X\_j] = M\_1 = \begin{bmatrix} \mu\_{11} \\ \mu\_{21} \\ \vdots \\ \mu\_{p1} \end{bmatrix}, \ j = 1, \dots, n\_1; \ E[Y\_j] = M\_2 = \begin{bmatrix} \mu\_{12} \\ \mu\_{22} \\ \vdots \\ \mu\_{p2} \end{bmatrix}, \ j = 1, \dots, n\_2.$$

When population $X$ has a level profile, $\mu_{11} = \mu_{21} = \cdots = \mu_{p1} = \mu^{(1)}$, and when population $Y$ has a level profile, $\mu_{12} = \mu_{22} = \cdots = \mu_{p2} = \mu^{(2)}$; $\mu^{(1)}$ need not be equal to $\mu^{(2)}$. However, if we assume that the two populations have coincident profiles, then $\mu^{(1)} = \mu^{(2)}$. In this instance, the two populations are identical, so that we can pool the samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ and regard the combined sample as a simple random sample of size $n_1+n_2$ from the population $N_p(M,\Sigma),\ \Sigma > O$. Thus, we can apply (14.4.1) and (14.4.2) with $n$ replaced by $n_1+n_2$, in order to reach a decision. Instead of letting $\mu^{(1)} = \mu^{(2)} = \mu$ for some $\mu$, computing the estimate $\hat{\mu}$ from the combined sample and utilizing $\hat{M},\ \hat{M}' = [\hat{\mu},\ldots,\hat{\mu}]$, in (14.4.1), we can make use of the matrix $C$ as defined in Sect. 14.2 and consider the transformed vectors $CZ_j,\ Z_j \sim N_p(M,\Sigma),\ \Sigma > O,\ j = 1,\ldots,n_1+n_2$, and use $C\bar{Z},\ \bar{Z} = \frac{1}{n_1+n_2}[X_1+\cdots+X_{n_1}+Y_1+\cdots+Y_{n_2}]$, in (14.4.1). One advantage in utilizing the matrix $C$ is that $C\hat{M} = O$, so that $\hat{M}$ will vanish from (14.4.1), that is, $\bar{X} - \hat{M}$ in (14.4.1) will become $C\bar{Z}$. Then, the test statistic under the hypothesis $H_{o3}: CM = O$, again denoted by $u$, becomes the following:

$$\begin{split} u &= (n\_1 + n\_2)(C\bar{Z} - O)'(CS\_zC')^{-1}(C\bar{Z} - O)\\ &= (n\_1 + n\_2)\bar{Z}'C'(CS\_zC')^{-1}C\bar{Z} \sim \text{type-2 beta} \end{split} \tag{14.4.3}$$

with the parameters $(\frac{p-1}{2}, \frac{n_1+n_2-p+1}{2})$, under the hypothesis $H_{o3}$, where $S_z$ is the sample sum of squares and cross products matrix computed from the combined sample. Then, under $H_{o3}$,

$$\frac{(n\_1 + n\_2 - p + 1)}{p - 1} u \sim F\_{p - 1, n\_1 + n\_2 - p + 1}.\tag{14.4.4}$$

We reject the hypothesis that the profiles in the two populations are level, given that the two populations have coincident profiles, if the observed value of the $F$ distributed statistic given in (14.4.4) is greater than or equal to $F_{p-1,\,n_1+n_2-p+1,\,\alpha}$, the upper $100\,\alpha\%$ percentage point of a real scalar $F_{p-1,\,n_1+n_2-p+1}$ distribution.

**Example 14.4.1.** With the data provided in Example 14.2.1, test the hypothesis of level profile, given that the two populations have coincident profiles.

**Solution 14.4.1.** In this case, we pool the observed samples: $n_1+n_2 = 4+5 = 9$,

$$\sum\_{j=1}^{4} X\_j + \sum\_{j=1}^{5} Y\_j = \begin{bmatrix} 4 \\ 5 \\ 4 \end{bmatrix} \Rightarrow \bar{Z} = \frac{1}{9} \begin{bmatrix} 4 \\ 5 \\ 4 \end{bmatrix} .$$

Denoting the pooled sample by a boldfaced $\mathbf{Z}$, the matrix of pooled sample means by $\bar{\mathbf{Z}}$, the deviation matrix by $\mathbf{Z}_d = \mathbf{Z}-\bar{\mathbf{Z}}$ and the sample sum of products matrix by $S_z = \mathbf{Z}_d\mathbf{Z}_d'$, we have

$$\begin{aligned} \mathbf{Z} &= \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 0 & -1 & 1 & 2 & 2 & 1 \\ -1 & 1 & 2 & 2 & -1 & 2 & 1 & -2 & 0 \end{bmatrix}, \\ \mathbf{Z}\_{d} &= \frac{1}{9} \begin{bmatrix} 5 & 5 & 5 & 5 & -4 & 5 & -13 & 5 & -13 \\ -5 & 4 & -14 & -5 & -14 & 4 & 13 & 13 & 4 \\ -13 & 5 & 14 & 14 & -13 & 14 & 5 & -22 & -4 \end{bmatrix}, \end{aligned}$$

$$\begin{aligned} \mathbf{Z}\_d\mathbf{Z}\_d' &= \frac{1}{9^2}\begin{bmatrix} 504 & -180 & 99\\ -180 & 828 & -180\\ 99 & -180 & 1476 \end{bmatrix} = S\_z,\quad C = \begin{bmatrix} -1 & 1 & 0\\ 0 & -1 & 1 \end{bmatrix},\\ CS\_zC' &= \frac{1}{9^2}\begin{bmatrix} 1692 & -1287\\ -1287 & 2664 \end{bmatrix} = \frac{1}{9}\begin{bmatrix} 188 & -143\\ -143 & 296 \end{bmatrix}, \end{aligned}$$

$$\begin{split} (CS\_zC')^{-1} &= \frac{9}{35199}\begin{bmatrix} 296 & 143\\ 143 & 188 \end{bmatrix},\ C'(CS\_zC')^{-1}C = \frac{9}{35199}\begin{bmatrix} 296 & -153 & -143\\ -153 & 198 & -45\\ -143 & -45 & 188 \end{bmatrix},\\ \frac{n\_1+n\_2-p+1}{p-1}\,u &= \frac{n\_1+n\_2-p+1}{p-1}(n\_1+n\_2)\,\bar{Z}'C'(CS\_zC')^{-1}C\bar{Z}\\ &= \left(\frac{7}{2}\right)9\left(\frac{9}{35199}\right)\left(\frac{1}{9^2}\right)[4, 5, 4]\begin{bmatrix} 296 & -153 & -143\\ -153 & 198 & -45\\ -143 & -45 & 188 \end{bmatrix}\begin{bmatrix} 4\\ 5\\ 4 \end{bmatrix} = 0.0197. \end{split}$$

Since, under the null hypothesis *Ho*3, *u* ∼ *Fp*−1*,n*1+*n*2−*p*+<sup>1</sup> = *F*2*,*7, and since the tabulated value for *α* = 0*.*05 is *F*2*,*7*,*0*.*<sup>05</sup> = 4*.*74, which is larger than the observed value 0*.*0197, the hypothesis *Ho*<sup>3</sup> of level profiles in both populations, given that the population profiles are coincident, is not rejected at the 5% level of significance.
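The value of *u* obtained above can be checked numerically. The following sketch (using numpy) rebuilds the statistic from the raw pooled data of Solution 14.4.1; nothing beyond the numbers already displayed in the solution is assumed.

```python
import numpy as np

# Pooled sample of Solution 14.4.1: the 9 columns are the observation vectors.
Z = np.array([[ 1, 1,  1, 1,  0, 1, -1,  1, -1],
              [ 0, 1, -1, 0, -1, 1,  2,  2,  1],
              [-1, 1,  2, 2, -1, 2,  1, -2,  0]], dtype=float)
p, n = Z.shape                          # p = 3, n = n1 + n2 = 9
Zbar = Z.mean(axis=1)                   # pooled mean vector, (1/9)[4, 5, 4]'
Zd = Z - Zbar[:, None]                  # deviation matrix Z_d
Sz = Zd @ Zd.T                          # sample sum of products matrix S_z
C = np.array([[-1.,  1., 0.],
              [ 0., -1., 1.]])          # (p - 1) x p contrast matrix
M = C.T @ np.linalg.inv(C @ Sz @ C.T) @ C
u = (n - p + 1) / (p - 1) * n * (Zbar @ M @ Zbar)
```

Running this yields *u* = 693/35199 ≈ 0.0197, matching the value displayed above.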

#### **14.5. Generalizations to** *k* **Populations**

Let the *p* × 1 real vectors *X*1*,...,Xk* be independently distributed with *Xj* ∼ *Np(Mj , Σ), Σ > O, j* = 1*,...,k*, *Σ* being the same for all the populations. Consider the *(p* − 1*)* × *p* matrix *C*, as previously defined in Sect. 14.2, whose rank is *p* − 1. Letting *Zj* = *CXj , j* = 1*,...,k*, the *Zj* are independently distributed as *Np*−<sup>1</sup>*(CMj , CΣC*′*)*, *CΣC*′ *> O, j* = 1*,...,k*. Then, testing for "parallel profiles" in these *k* populations is equivalent to testing the hypothesis

$$H\_{o1}: CM\_1 = CM\_2 = \dots = CM\_k. \tag{14.5.1}$$

Testing for "coincident profiles" in all the *k* populations is tantamount to testing the hypothesis

$$H\_{o2}: M\_1 = M\_2 = \dots = M\_k. \tag{14.5.2}$$

Testing for "level profiles", given that the profiles are coincident, in all these *k* populations, is the same as testing the hypothesis

$$H\_{o3} : CM\_1 = CM\_2 = \dots = CM\_k = O. \tag{14.5.3}$$

Note that *Ho*1*, Ho*2*, Ho*<sup>3</sup> are all tests involving the equality of mean value vectors in multivariate populations where, in *Ho*<sup>1</sup> and *Ho*3, we have *(p* − 1*)*-variate Gaussian populations and, in *Ho*2, we have *p*-variate Gaussian populations. In all these cases, the population covariance matrices are identical but unknown. Such tests were considered in Chap. 6 and hence, further discussion is omitted. In all these tests, one point has to be kept in mind: the units of measurement must be the same in all the populations, otherwise there is no point in comparing mean value vectors. If qualitative variables are involved, they should be converted to quantitative variables using the same scale, such as changing all of them to points on a [0*,* 10] scale.

#### **14.6. Growth Curves or Repeated Measures Designs**

For instance, suppose that the growth (measured in terms of height) of a plant seedling (for instance, a variety of corn) is monitored. Let the height be 10 cm at the onset of the observation period, that is, at time *t* = 0, and let the measurements be taken weekly. After one week *(t* = 1*)*, suppose that the height is 12 cm. Then 10 = *a*0, and, for a linear function, 12 = *a*<sup>0</sup> + *a*1*t* = *a*<sup>0</sup> + *a*<sup>1</sup> at *t* = 1, so that *a*<sup>0</sup> = 10 and *a*<sup>1</sup> = 2. We now have the points *(*0*,* 10*)* and *(*1*,* 12*)*. Consider a straight line passing through these points. If the third, fourth, etc. measurements, taken at *t* = 2*, t* = 3*,* etc. weeks, fall more or less on this line, then we can say that the expected growth is linear and we can consider a model of the type *E*[*x*] = *a*<sup>0</sup> + *a*1*t* to represent the expected growth of this plant, where *E*[·] denotes the expected value. If the third, fourth, etc. points move increasingly away from the straight line and the behavior is of a parabolic type, then we can put forward the model *E*[*x*] = *a*<sup>0</sup> + *a*1*t* + *a*2*t*<sup>2</sup>. If the behavior appears to be of a general polynomial type, then we may posit the model *E*[*x*] = *a*<sup>0</sup> + *a*1*t* +···+ *a*<sub>*m*−1</sub>*t*<sup>*m*−1</sup>. However, we need not monitor the growth at one-time-unit intervals; we may monitor it at convenient times *t*1*, t*2*,....* A polynomial type model leads to the following equations:

$$\begin{aligned} E[\mathbf{x}\_1] &= a\_0 + a\_1 t\_1 + a\_2 t\_1^2 + \dots + a\_{m-1} t\_1^{m-1} \\ E[\mathbf{x}\_2] &= a\_0 + a\_1 t\_2 + a\_2 t\_2^2 + \dots + a\_{m-1} t\_2^{m-1} \\ &\vdots \\ E[\mathbf{x}\_p] &= a\_0 + a\_1 t\_p + a\_2 t\_p^2 + \dots + a\_{m-1} t\_p^{m-1} \end{aligned}$$

where *E*[*x*1]*,...,E*[*xp*] are the expected observations at times *t*1*, t*2*,...,tp,* and *a*<sup>0</sup> is the value of the expected observation at *t* = 0. Thus, in matrix notation, we have a *p*-vector of observations. Letting *X*′<sup>1</sup> = [*x*1*, x*2*,...,xp*],

$$E[X\_1] = E\begin{bmatrix} x\_1\\ x\_2\\ \vdots\\ x\_p \end{bmatrix} = \begin{bmatrix} 1 & t\_1 & t\_1^2 & \dots & t\_1^{m-1} \\ 1 & t\_2 & t\_2^2 & \dots & t\_2^{m-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t\_p & t\_p^2 & \dots & t\_p^{m-1} \end{bmatrix} \begin{bmatrix} a\_0\\ a\_1\\ \vdots\\ a\_{m-1} \end{bmatrix} \equiv T A\_1 \tag{14.6.1}$$

where *X*<sup>1</sup> is *p*×1, *T* is *p*×*m* and the parameter vector *A*<sup>1</sup> is *m*×1. In this instance, we have taken observations on the same real scalar variable *x* at *p* different occasions *t*1*,...,tp*. In this sense, it is a "repeated measures design" situation. We have monitored *x* over time in order to characterize the growth of this variable and, in this sense, it is a "growth curve" situation. This *x* may be the weight of a person under a certain weight reduction diet. The successive readings are then expected to decrease, so that it is a negative growth as well as a repeated measures design model. When *x* represents the spread area of a certain fungus, which is monitored over time, we have a growth curve model that depends upon the nature of spread. If *x* is the temperature reading of a feverish person under a certain medication, being recorded over time, we have a growth curve model with an expected negative growth. We can think of a myriad of such situations falling under repeated measures design or growth curve model.
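The design matrix *T* in (14.6.1) is simply a Vandermonde matrix in the observation times. The sketch below builds *T* and fits the growth coefficients by ordinary least squares; the times, degree, and heights are hypothetical, loosely echoing the seedling example, and are not taken from the text.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])       # hypothetical weekly observation times
m = 3                                     # fit a quadratic: E[x] = a0 + a1*t + a2*t^2
T = np.vander(t, m, increasing=True)      # p x m matrix with rows (1, t_j, t_j^2)

x = np.array([10.0, 12.0, 13.9, 16.1])    # hypothetical observed heights (cm)
a_hat, *_ = np.linalg.lstsq(T, x, rcond=None)  # least squares estimate of (a0, a1, a2)
```

With more occasions than coefficients (*p > m*), the fit is over-determined, which is exactly the setting of the growth curve model below.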

#### **14.6.1. Growth curve model**

The model specified in (14.6.1) can be expressed as follows:

$$E[X\_1] = TA\_1, \ A\_1 = \begin{bmatrix} a\_{10} \\ \vdots \\ a\_{1m-1} \end{bmatrix}, \ T = \begin{bmatrix} 1 & t\_1 & t\_1^2 & \dots & t\_1^{m-1} \\ 1 & t\_2 & t\_2^2 & \dots & t\_2^{m-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t\_p & t\_p^2 & \dots & t\_p^{m-1} \end{bmatrix},\tag{14.6.2}$$

where we have a *p*-vector *X*<sup>1</sup> with expected value *T A*1, *T* being *p* × *m* and *A*1, *m* × 1. If we take a simple random sample of size *n*<sup>1</sup> from this *p*-vector, we may denote the sample values by *X*11*, X*12*,...,X*1*n*<sup>1</sup> and say that we have a simple random sample of size *n*<sup>1</sup> from that *p*-variate population or from *n*<sup>1</sup> members belonging to a group whose underlying distribution is that of *X*1, and we may write the model as

$$E[X\_{1j}] = TA\_1, \; j = 1, \ldots, n\_1.\tag{14.6.3}$$

With reference to the weight measurements of a person subjected to a certain diet, if the diet is administered to a group of *n*<sup>1</sup> individuals who are comparable in every respect such as initial weight, age, physical condition, and so on, then the measurements on these *n*<sup>1</sup> individuals are the observations *X*11*,...,X*1*n*<sup>1</sup> . Letting the same diet be administered to another group of *n*<sup>2</sup> persons whose characteristics are similar within the group but may be different from those of the first group, their measurements will be denoted by *X*2*<sup>j</sup> , j* = 1*,...,n*2. Consider *r* such groups. If these groups are men belonging to a certain age bracket and women of the same age category, *r* = 2. If there are 7 groups of women of different age categories and 5 groups of men belonging to certain age brackets in the study, *r* will be equal to 12, and so on. Thus, there can be variations between groups but each group ought to be homogeneous. If there are *r* such groups and a simple random sample of size *ni* is available from group *i, i* = 1*, . . . , r,* we have the model

$$E[X\_{ij}] = TA\_i, \ j = 1, \ldots, n\_i, \ i = 1, \ldots, r. \tag{14.6.4}$$

We may then compare this situation with a one-way classification MANOVA (or ANOVA when a single dependent variable is being monitored) situation involving *r* groups, with the *i*-th group being of size *ni,* so that *n*<sup>1</sup> + *n*<sup>2</sup> +···+ *nr* = *n.* as per our notation with respect to the one-way layout. There is one set of treatments and we wish to assess the expected effects in these *r* groups. For the analysis of the data being so modeled, we do not explicitly require a subscript to represent the order *p* of the observation vectors (in this case, *p* corresponds to the number of occasions on which the measurements are made) and hence, our notation (14.6.4) does not involve a subscript representing the order of the vector *Xij*.

The model (14.6.4) can be generalized in various ways. As a second set of treatments, we may consider *s* different diets being administered to *r* distinct groups of *n* individuals each. "Diet" may exhibit another polynomial growth of the type *b*<sup>0</sup> + *b*1*t* +···+ *b*<sub>*m*1</sub>*t*<sup>*m*1</sup>, where *m*<sup>1</sup> need not be equal to *m* − 1. Then, within the framework of experimental designs, we have a multivariate two-way layout with an equal number of observations per cell. In this manner, we can obtain multivariate generalizations of all standard designs where the effects are polynomial growth models, possibly of different degrees, with *p*, however, remaining unchanged, *p* being the number of occasions on which the measurements are made. We can also consider another type of generalization. In the example involving different types of diets, these various types may be the same diet but at different strictness levels. In this instance, the different diets are correlated among themselves. In a two-way layout, there is no correlation within one set of treatments; interaction may be present between the two sets of treatments. When different doses of the same diet constitute the second set of treatments, correlation is obviously present within this second set of treatments. This leads to multivariate factorial designs with a growth curve model for the treatment effects. In all of the above considerations, we were only observing one item, namely weight. In another type of generalization, instead of one item, several items are measured at the same time, that is, there may be several dependent (response) variables instead of one. In this case, if *y*1*,...,yk* are the items measured on the same *p* occasions, then each *yj* may have its own distribution with covariance matrix *Σj , j* = 1*,...,k*. 
Not only that, there may also be joint dispersions, in which case, there will be covariance matrices *Σij* for all *i, j* = 1*,...,k*. Thus, several types of generalizations are possible for the basic model specified in (14.6.4).

In order to analyze the data with the model (14.6.4), the following assumptions are required:

(1) The *n.* = *n* vectors *Xij* are mutually independently distributed for all *i* and *j*, with common covariance matrix *Σ > O*.

(2) *E*[*Xij* ] = *T Ai, j* = 1*,...,ni,* for each *i*, *i* = 1*,...,r*.

(3) *Xij* ∼ *Np(T Ai, Σ), Σ > O, j* = 1*,...,ni,* for each *i*, *i* = 1*,...,r*.

The normality assumption is convenient for obtaining maximum likelihood estimates (MLE) and for testing hypotheses on the parameters *Ai, i* = 1*,...,r*. Observe that the matrix *T* is known.
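For concreteness, data satisfying assumptions (1)-(3) can be simulated directly. In the sketch below every numerical setting (times, *Σ*, the *Ai*, group sizes) is a hypothetical choice for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, r = 4, 2, 2                                # 4 occasions, linear growth, 2 groups
T = np.vander(np.arange(p, dtype=float), m, increasing=True)   # known p x m matrix T
A = [np.array([10.0, 2.0]), np.array([9.0, 1.5])]              # hypothetical A_1, A_2
Sigma = 0.5 * np.eye(p)                          # common covariance matrix, Sigma > O
ns = [5, 6]                                      # group sample sizes n_1, n_2

# X[i] is p x n_i; its columns are independent X_ij ~ N_p(T A_i, Sigma)
X = [rng.multivariate_normal(T @ A[i], Sigma, size=ns[i]).T for i in range(r)]
```

Each column of `X[i]` is one observation vector *Xij*, exactly the data layout analyzed in the next subsection.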

#### **14.6.2. Maximum likelihood estimation**

Under the normality assumption for the *Xij* 's, their joint density, denoted by *L*, is

$$L = \frac{1}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}} \mathbf{e}^{-\frac{1}{2} \sum\_{i,j} (X\_{ij} - TA\_i)' \Sigma^{-1} (X\_{ij} - TA\_i)}.\tag{14.6.5}$$

Note that

$$\sum\_{i,j} (X\_{ij} - TA\_i)' \Sigma^{-1} (X\_{ij} - TA\_i) = \sum\_{i,j} \text{tr} [\Sigma^{-1} (X\_{ij} - TA\_i)(X\_{ij} - TA\_i)'].$$

Let *Mi* = *T Ai*. Then, differentiating ln*L* with respect to *Mi* and equating the result to a null vector yields the following (for vector/matrix derivatives, refer to Chap. 1):

$$\frac{\partial}{\partial M\_i} \sum\_{i,j} (X\_{ij} - M\_i)' \Sigma^{-1} (X\_{ij} - M\_i) = O \Rightarrow$$

$$-2 \sum\_j [\Sigma^{-1} (X\_{ij} - M\_i)] = O \Rightarrow$$

$$\sum\_j (X\_{ij} - M\_i) = O \Rightarrow \hat{M}\_i = \frac{1}{n\_i} \sum\_j X\_{ij} = \bar{X}\_i. \tag{14.6.6}$$

Thus, the MLE of *Mi* is *M*ˆ*<sup>i</sup>* = *X*¯*i*. At *Mi* = *M*ˆ*i*,

$$\ln L = -\frac{n\_\cdot p}{2} \ln(2\pi) - \frac{n\_\cdot}{2} \ln|\Sigma| - \frac{1}{2} \sum\_{i,j} \text{tr} [\Sigma^{-1} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)']$$

$$= -\frac{n\_\cdot p}{2} \ln(2\pi) - \frac{n\_\cdot}{2} \ln|\Sigma| - \frac{1}{2} \text{tr} [\Sigma^{-1} (S\_1 + \dots + S\_r)]$$

where *Si* is the *i*-th group sample sum of products matrix or sum of squares and cross products matrix for *i* = 1*,...,r*, see also Eq. (13.2.6). Letting *S* = *S*<sup>1</sup> +···+ *Sr* and proceeding as in Chap. 3 by differentiating ln*L* with respect to *Σ*−1, equating the result to a null matrix and simplifying, we obtain the MLE of *Σ* as

$$
\hat{\Sigma} = \frac{1}{n\_\cdot} S = \frac{1}{n\_\cdot} (S\_1 + \dots + S\_r). \tag{14.6.7}
$$

An unbiased estimator of *Σ* is $\frac{1}{n_\cdot - r}(S_1 + \cdots + S_r) = \frac{1}{n_\cdot - r}S$. For evaluating the MLE of the parameter vector *Ai*, we can proceed as follows: after substituting $\frac{1}{n_\cdot}S$ for *Σ* in ln *L*, consider the following for a specific *i*:


$$\frac{\partial}{\partial A\_i} \ln L = \frac{\partial}{\partial A\_i} \sum\_{i,j} (X\_{ij} - TA\_i)' \left(\frac{S}{n\_.}\right)^{-1} (X\_{ij} - TA\_i) = O \Rightarrow$$

$$T' \left(\frac{S}{n\_.}\right)^{-1} \sum\_j (X\_{ij} - TA\_i) = O \Rightarrow$$

$$T' S^{-1} X\_{i.} - T' S^{-1} T n\_i A\_i = O \Rightarrow$$

$$\hat{A}\_i = (T' S^{-1} T)^{-1} T' S^{-1} \bar{X}\_i, \; i = 1, \ldots, r. \tag{14.6.8}$$
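In code, the estimator (14.6.8) is a generalized least squares step with weight matrix *S*<sup>−1</sup>. The sketch below checks it on simulated data; all numerical settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, r, ns = 4, 2, 2, [20, 25]                  # hypothetical configuration
T = np.vander(np.arange(p, dtype=float), m, increasing=True)
A_true = [np.array([10.0, 2.0]), np.array([9.0, 1.5])]   # hypothetical parameters
X = [rng.multivariate_normal(T @ A_true[i], np.eye(p), size=ns[i]).T for i in range(r)]

Xbar = [Xi.mean(axis=1) for Xi in X]
# S = S_1 + ... + S_r, the pooled sum of products matrix of within-group deviations
S = sum((Xi - Xb[:, None]) @ (Xi - Xb[:, None]).T for Xi, Xb in zip(X, Xbar))
Sinv = np.linalg.inv(S)
# A_hat_i = (T' S^{-1} T)^{-1} T' S^{-1} Xbar_i, as in (14.6.8)
A_hat = [np.linalg.solve(T.T @ Sinv @ T, T.T @ Sinv @ Xb) for Xb in Xbar]
```

With moderate group sizes, each `A_hat[i]` lands close to the corresponding `A_true[i]`, consistent with the unbiasedness established below.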

Note that Eq. (14.6.8) holds if *S*−<sup>1</sup> is replaced by *Σ*−<sup>1</sup> or by *Γ*−<sup>1</sup> for some real positive definite matrix *Γ*. Thus, the maximum of the likelihood function under the general model (14.6.4) is the following:

$$\max L = \frac{\mathrm{e}^{-\frac{n\_\cdot p}{2}}}{(2\pi)^{\frac{n\_\cdot p}{2}}\,\left|\frac{1}{n\_\cdot}\sum\_{i,j}(X\_{ij}-\bar{X}\_i)(X\_{ij}-\bar{X}\_i)'\right|^{\frac{n\_\cdot}{2}}}.\tag{14.6.9}$$

As well, observe that the matrix in the determinant of (14.6.9) is

$$\frac{1}{n\_\cdot} \sum\_{i=1}^r \left[ \sum\_{j=1}^{n\_i} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)' \right] = \frac{1}{n\_\cdot} \times \text{(the residual matrix in MANOVA)}$$

in a one-way layout, referring to Sect. 13.2. As well, this quantity is Wishart distributed with *n.* − *r* degrees of freedom and parameter matrix *Σ*, that is,

$$\sum\_{i,j} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)' \sim W\_p(n\_\cdot - r, \Sigma), \ \Sigma > O. \tag{14.6.10}$$

Now, consider the MLE of *Ai*:

$$
\hat{A}\_i = (T'S^{-1}T)^{-1}T'S^{-1}\bar{X}\_i = C\_1\bar{X}\_i,\ C\_1 = (T'S^{-1}T)^{-1}T'S^{-1}.\tag{14.6.11}
$$

Observe that *n.* cancels out in *A*ˆ*i*. In a normal population, the sample mean and the sample sum of products matrix are independently distributed. Hence, *X*¯*<sup>i</sup>* and *Si* are independently distributed, and thereby *X*¯*<sup>i</sup>* and *S* are independently distributed. Moreover, *E*[*Xij* ] = *E*[*X*¯*i*] = *T Ai* for each *i*, and we have

$$E[\hat{A}\_i] = E[(T'S^{-1}T)^{-1}T'S^{-1}]\,E[\bar{X}\_i] = E[(T'S^{-1}T)^{-1}T'S^{-1}TA\_i] = A\_i.$$

Thus, *A*ˆ*<sup>i</sup>* is an unbiased estimator for *Ai*. So far, we have not made any specification on the number *m* or the structure of *T* . We can incorporate the structure of *T* and the specification on the degree of the polynomial in *T* through a hypothesis. Let


$$H\_o: T = T\_o = \begin{bmatrix} 1 & t\_1 & t\_1^2 & \dots & t\_1^{m-1} \\ 1 & t\_2 & t\_2^2 & \dots & t\_2^{m-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t\_p & t\_p^2 & \dots & t\_p^{m-1} \end{bmatrix} \tag{14.6.12}$$

for a specified *m*; that is, *Ho* posits that the model is a polynomial *T Ai* of degree *m* − 1 for each *i* and that this model fits. In this instance, the condition imposed through *Ho* is that the degree of the polynomial is a specified number *m* − 1, since *p* is known as the order of the population vector. Then, the factors $\mathrm{e}^{-\frac{n_\cdot p}{2}}$ and $(2\pi)^{\frac{n_\cdot p}{2}}$ cancel out in the likelihood ratio criterion *λ*, leaving

$$\lambda = \frac{\max\_{H\_o} L}{\max L} = \frac{|\sum\_{i,j} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)'|^\frac{n\_\cdot}{2}}{|\sum\_{i,j} (X\_{ij} - T\_o \hat{A}\_i)(X\_{ij} - T\_o \hat{A}\_i)'|^\frac{n\_\cdot}{2}}.\tag{14.6.13}$$

Note that $X_{ij} - T_o\hat{A}_i = (X_{ij} - \bar{X}_i) + (\bar{X}_i - T_o\hat{A}_i)$ and that $\sum_{j}(X_{ij} - \bar{X}_i) = O$ for each *i*, so that the cross product terms vanish and

$$\begin{aligned} \sum\_{i,j} (X\_{ij} - T\_o \hat{A}\_i)(X\_{ij} - T\_o \hat{A}\_i)' &= \sum\_{i,j} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)' \\ &+ \sum\_{i,j} (\bar{X}\_i - T\_o \hat{A}\_i)(\bar{X}\_i - T\_o \hat{A}\_i)'. \end{aligned}$$

Since *λ*|*Ho* is a ratio of determinants, the parameter matrix *Σ* can be eliminated, and we can assume that the normal population has the identity matrix *I* as its covariance matrix. As well, [*Xij* − *X*¯*i*]=[*(Xij* − *T Ai)* − *(X*¯*<sup>i</sup>* − *T Ai)*] and [*X*¯*<sup>i</sup>* − *ToA*ˆ*i*]=[*(X*¯*<sup>i</sup>* − *ToAi)* − *(ToA*ˆ*<sup>i</sup>* − *ToAi)*]. We have already established that *A*ˆ*<sup>i</sup>* is an unbiased estimator for *Ai*. Hence, without any loss of generality, we can assume the population to be *Np(O, I )* under *Ho*, and we can consider

$$\begin{aligned} \bar{X}\_i - T\_o(T\_o' S^{-1} T\_o)^{-1} T\_o' S^{-1} \bar{X}\_i &\equiv (I - B)\bar{X}\_i \\ \bar{X}\_i \sim N\_p(O, \frac{1}{n\_i} I) &\Rightarrow \sqrt{n\_i} \bar{X}\_i \sim N\_p(O, I) \end{aligned}$$

where $B = T_o(T_o'S^{-1}T_o)^{-1}T_o'S^{-1}$. Now, since

$$B^2 = [T\_o(T\_o'S^{-1}T\_o)^{-1}T\_o'S^{-1}][T\_o(T\_o'S^{-1}T\_o)^{-1}T\_o'S^{-1}] = T\_o(T\_o'S^{-1}T\_o)^{-1}T\_o'S^{-1} = B,$$

*B* is idempotent of rank *m* where *m* is equal to the rank of *To*. For convenience, let *p*<sup>1</sup> = *p* − *m* and *p*<sup>2</sup> = *m,* so that *p*<sup>1</sup> + *p*<sup>2</sup> = *p*. Observe that *Xij* − *X*¯*<sup>i</sup>* and *X*¯*<sup>i</sup>* are independently distributed: letting $\mathbf{X}_i = (X_{i1},\ldots,X_{in_i})$ denote the *i*-th sample matrix and *J* denote the *ni* × 1 vector of unities, *J*′ = [1*,* 1*,...,* 1], the columns of $\mathbf{X}_i\frac{1}{n_i}JJ'$ are all equal to $\bar{X}_i$, while the columns of $\mathbf{X}_i[I - \frac{1}{n_i}JJ']$ are the deviations $X_{ij} - \bar{X}_i$. Observing that $[I - \frac{1}{n_i}JJ'][\frac{1}{n_i}JJ'] = O$ and that, under normality, $\sqrt{n_i}\,\bar{X}_i \sim N_p(O, I)$, it follows that $U_1 = \sum_{i,j}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)'$ and $U_2 = \sum_i n_i[(I - B)\bar{X}_i][(I - B)\bar{X}_i]'$ are independently distributed, since *U*<sup>1</sup> and *X*¯*<sup>i</sup>* are independently distributed. Accordingly,

$$w = \lambda^{\frac{2}{n\_\cdot}} = \frac{|U\_1|}{|U\_1 + U\_2|}. \tag{14.6.14}$$


Since *I* − *B* is idempotent and of rank *p* − *m* = *p*1, we can perform an orthonormal transformation on $\sqrt{n_i}\,\bar{X}_i$ and, equivalently, write *U*<sup>2</sup> as $U_2 \to \begin{bmatrix} Z & O \\ O & O \end{bmatrix}$ in distribution, where $Z = [Z_1,\ldots,Z_r][Z_1,\ldots,Z_r]'$, the $Z_j$ being iid $N_{p_1}(O, I_{p_1})$. Then, *Z* ∼ *Wp*<sup>1</sup>*(r, Ip*<sup>1</sup>*)* for *r* ≥ *p*1. Moreover, we can reduce the *p* × *p* matrix *U*<sup>1</sup> to a *p*<sup>1</sup> × *p*<sup>1</sup> submatrix in the determinant. Let us consider the following partition of *U*<sup>1</sup> where *U*<sup>11</sup> is *p*<sup>1</sup> × *p*<sup>1</sup> and *U*<sup>22</sup> is *p*<sup>2</sup> × *p*2, *p*<sup>1</sup> + *p*<sup>2</sup> = *p*:

$$U\_1 = \begin{bmatrix} U\_{11} & U\_{12} \\ U\_{21} & U\_{22} \end{bmatrix}, \ |U\_1| = |U\_{22}| \ |U\_{11} - U\_{12} U\_{22}^{-1} U\_{21}|,$$

which follows from results included in Chap. 1 on the determinants of partitioned matrices. Then, *w* of (14.6.14) is such that

$$w = \frac{|U\_{11} - U\_{12}U\_{22}^{-1}U\_{21}|}{|(Z + U\_{11}) - U\_{12}U\_{22}^{-1}U\_{21}|}.$$

Now, for fixed *U*22, make the transformation $V_{12} = U_{12}U_{22}^{-\frac{1}{2}} \Rightarrow \mathrm{d}V_{12} = |U_{22}|^{-\frac{p_1}{2}}\,\mathrm{d}U_{12}$, referring to Chap. 1. Let us take *p*<sup>1</sup> ≥ *p*2; if not, simply interchange *p*<sup>1</sup> and *p*<sup>2</sup> in the following procedure. Then, make the transformation

$$S = V\_{12} V\_{12}' \Rightarrow \mathrm{d}V\_{12} = \frac{\pi^{\frac{p\_1 p\_2}{2}}}{\Gamma\_{p\_2}(\frac{p\_1}{2})} |S|^{\frac{p\_1}{2} - \frac{p\_2 + 1}{2}} \mathrm{d}S,$$

the Jacobian being given in Sect. 4.3. Observe that *U*<sup>1</sup> has a Wishart density with *n* − *r* degrees of freedom and parameter matrix *Ip* where *n* = *n.* = *n*<sup>1</sup> +···+ *nr*. Letting the density of *U*<sup>1</sup> be denoted by *f (U*1*)*, we have

$$\begin{aligned} f(U\_1) &= c\,|U\_1|^{\frac{n-r}{2}-\frac{p+1}{2}}\,\mathrm{e}^{-\frac{1}{2}\text{tr}(U\_1)} \\ &= c\,|U\_{22}|^{\rho}\,|U\_{11} - U\_{12}U\_{22}^{-1}U\_{21}|^{\rho}\,\mathrm{e}^{-\frac{1}{2}[\text{tr}(U\_{11}) + \text{tr}(U\_{22})]}, \quad \rho = \frac{n-r}{2} - \frac{p+1}{2}, \\ &= c\,|U\_{22}|^{\rho + \frac{p\_1}{2}}\,|U\_{11} - V\_{12}V\_{12}'|^{\rho}\,\mathrm{e}^{-\frac{1}{2}[\text{tr}(U\_{11}) + \text{tr}(U\_{22})]} \\ &= c\,\frac{\pi^{\frac{p\_1 p\_2}{2}}}{\Gamma\_{p\_2}(\frac{p\_1}{2})}\,|S|^{\frac{p\_1}{2} - \frac{p\_2+1}{2}}\,|U\_{22}|^{\rho + \frac{p\_1}{2}}\,|U\_{11} - S|^{\rho}\,\mathrm{e}^{-\frac{1}{2}[\text{tr}(U\_{11}) + \text{tr}(U\_{22})]} \\ &= c\,\frac{\pi^{\frac{p\_1 p\_2}{2}}}{\Gamma\_{p\_2}(\frac{p\_1}{2})}\,|S|^{\frac{p\_1}{2} - \frac{p\_2+1}{2}}\,|U\_{22}|^{\rho + \frac{p\_1}{2}}\,|V\_{22}|^{\rho}\,\mathrm{e}^{-\frac{1}{2}[\text{tr}(S) + \text{tr}(V\_{22}) + \text{tr}(U\_{22})]}, \quad V\_{22} = U\_{11} - S = U\_{11} - U\_{12}U\_{22}^{-1}U\_{21}, \end{aligned}$$

where *c* is the normalizing constant. Now, on integrating out *U*<sup>22</sup> and *S*, we obtain the marginal density of *V*22, denoted by *f*1*(V*22*)*, as

$$f\_1(V\_{22}) = c\_2 |V\_{22}|^{\frac{n-r}{2} - \frac{p\_2}{2} - \frac{p\_1 + 1}{2}} \mathbf{e}^{-\frac{1}{2} \text{tr}(V\_{22})} \tag{14.6.15}$$

where *c*<sup>2</sup> is the normalizing constant. This shows that *V*<sup>22</sup> ∼ *Wp*<sup>1</sup> *(n* − *r* − *p*2*, Ip*<sup>1</sup> *)* for *n.* − *r* − *p*<sup>2</sup> ≥ *p*<sup>1</sup> and that *V*<sup>22</sup> and *Z* are independently distributed. Then,

$$w = \frac{|V\_{22}|}{|V\_{22} + Z|}$$

where *V*<sup>22</sup> and *Z* are *p*<sup>1</sup> × *p*<sup>1</sup> real matrices that are independently Wishart distributed with common parameter matrix *Ip*<sup>1</sup> and degrees of freedom *n.* − *r* − *p*<sup>2</sup> and *r,* respectively. Then, $w = |W|$ where $W = (V_{22} + Z)^{-\frac{1}{2}}V_{22}(V_{22} + Z)^{-\frac{1}{2}}$ is a *p*<sup>1</sup> × *p*<sup>1</sup> real matrix-variate type-1 beta distributed random variable with the parameters $(\frac{n_\cdot - r - p_2}{2}, \frac{r}{2})$. Therefore, for an arbitrary *h*, the *h*-th moment of the determinant of *W* is the following:

$$E[w^h] = E[|W|^h] = c\_3 \frac{\Gamma\_{p\_1}(\frac{n\_\cdot - r - p\_2}{2} + h)}{\Gamma\_{p\_1}(\frac{n\_\cdot - r - p\_2}{2} + \frac{r}{2} + h)}\tag{14.6.16}$$

where *c*<sup>3</sup> is the normalizing constant such that *E*[*w<sup>h</sup>*] = 1 when *h* = 0. Since the likelihood ratio criterion is $\lambda = w^{\frac{n_\cdot}{2}}$,

$$E[\lambda^h] = c\_3 \frac{\Gamma\_{p\_1}(\frac{n\_\cdot - r - p\_2}{2} + \frac{n\_\cdot h}{2})}{\Gamma\_{p\_1}(\frac{n\_\cdot - r - p\_2}{2} + \frac{r}{2} + \frac{n\_\cdot h}{2})}. \tag{14.6.17}$$

As illustrated in earlier chapters, see for example Sect. 13.3, we can express the exact density of *w* in (14.6.16) as a G-function and the exact density of *λ* in (14.6.17) as an H-function. Letting the densities of *w* and *λ* be denoted by *f*2*(w)* and *f*3*(λ)*, respectively, we have

$$f\_2(w) = c\_3 \, G\_{p\_1, p\_1}^{p\_1, 0} \left[ w \Big|\_{\frac{n\_\cdot - r - p\_2}{2} - \frac{j-1}{2}, \, j = 1, \dots, p\_1}^{\frac{n\_\cdot - p\_2}{2} - \frac{j-1}{2}, \, j = 1, \dots, p\_1} \right] \tag{14.6.18}$$

$$f\_3(\lambda) = c\_3 \, H\_{p\_1, p\_1}^{p\_1, 0} \left[ \lambda \Big| \begin{matrix} (\frac{n\_\cdot - p\_2 - j + 1}{2}, \frac{n\_\cdot}{2}), & j = 1, \dots, p\_1 \\ (\frac{n\_\cdot - r - p\_2 - j + 1}{2}, \frac{n\_\cdot}{2}), & j = 1, \dots, p\_1 \end{matrix} \right] \tag{14.6.19}$$

for 0 *< w <* 1 and 0 *< λ <* 1, and zero elsewhere, where the G- and H-functions were defined earlier, for example in Chap. 13. For theoretical results on the G- and H-functions as well as applications, the reader is, for instance, referred to Mathai (1993) and Mathai, Saxena and Haubold (2010), respectively. We can express the exact density functions *f*2*(w)* and *f*3*(λ)* in terms of elementary functions for special values of *p, m* and *n*. For moderate values of *n, p* and *m*, numerical evaluations can be obtained by making use of the G- and H-function programs in Mathematica and MAPLE. For large values of *n*, asymptotic results may be invoked.

**Note 14.6.1.** The test statistic *w* can also be expressed as

$$w = \frac{1}{\left| I + V\_{22}^{-\frac{1}{2}} Z V\_{22}^{-\frac{1}{2}} \right|} = \frac{1}{\left| I + W\_2 \right|}, \ W\_2 = V\_{22}^{-\frac{1}{2}} Z V\_{22}^{-\frac{1}{2}},$$

where *W*<sup>2</sup> is a real matrix-variate type-2 beta random variable with the parameters $(\frac{n_\cdot - r - p_2}{2}, \frac{r}{2})$. As well, *V*<sup>22</sup> ∼ *Wp*<sup>1</sup>*(n.* − *r* − *p*2*, Ip*<sup>1</sup>*), Z* ∼ *Wp*<sup>1</sup>*(r, Ip*<sup>1</sup>*)*, and *V*<sup>22</sup> and *Z* are independently distributed. While Roy's largest root test is based on the largest eigenvalue of *W*2, Hotelling's trace test relies on the trace of *W*2.

#### **14.6.3. Some special cases**

In certain particular cases, one can obtain exact densities in terms of elementary functions. Two such cases will now be considered.

**Case (1)** *p*<sup>2</sup> = *m* = *p* − 1 ⇒ *p*<sup>1</sup> = *p* − *m* = 1*, n.* − *r* − *p*<sup>2</sup> = *n.* − *r* − *p* + 1. Since *p*<sup>1</sup> = 1*,* there will be only one gamma ratio containing *h*, and

$$E[w^h] = c\_3 \frac{\Gamma(\frac{n\_\cdot - r - p + 1}{2} + h)}{\Gamma(\frac{n\_\cdot - p + 1}{2} + h)}, \; \Re(h) > -\frac{n\_\cdot - p - r + 1}{2}, \tag{i}$$

where *c*<sup>3</sup> is the normalizing constant such that the right-hand side is 1 when *h* = 0. Observe that *(i)* is the *h*-th moment of a real scalar type-1 beta random variable, and hence *w* is a real scalar type-1 beta with the parameters $(\frac{n_\cdot - r - p + 1}{2}, \frac{r}{2})$. Accordingly, $y_1 = \frac{1-w}{w}$ is a real scalar type-2 beta random variable with the parameters $(\frac{r}{2}, \frac{n_\cdot - r - p + 1}{2})$, so that

$$t\_1 = \frac{n\_\cdot - r - p + 1}{r} \mathbf{y}\_1 \sim F\_{r, n\_\cdot - r - p + 1}, \quad n\_\cdot - r - p + 1 \ge 1.$$

Thus, for large values of this *F* statistic, we reject the null hypothesis that the growth curve has the structure of *To* of degree *m* − 1 as specified in (14.6.12) and that the growth curve model fits. Equivalently, we reject *Ho* when the observed value of *t*<sup>1</sup> is greater than or equal to *Fr,n.*−*r*−*p*+1*,α*, such that *P r*{*Fr,n.*−*r*−*p*+<sup>1</sup> ≥ *Fr,n.*−*r*−*p*+1*, α*} = *α* for a preselected *α*.
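Case (1) thus reduces to an ordinary F test. A sketch of the transformation chain follows, where *n.*, *r*, *p*, the observed *w*, and the tabled critical value *F*<sub>3,24,0.05</sub> ≈ 3.01 are all hypothetical choices for illustration:

```python
# Hypothetical Case (1) setting: p1 = 1, so w is a real scalar type-1 beta variable.
n_dot, r, p = 30, 3, 4                 # pooled sample size, groups, occasions (assumed)
w = 0.62                               # hypothetical observed value of w
y1 = (1 - w) / w                       # real scalar type-2 beta (r/2, (n.-r-p+1)/2)
t1 = (n_dot - r - p + 1) / r * y1      # ~ F_{r, n.-r-p+1} = F_{3,24} under H_o
F_crit = 3.01                          # tabled F_{3,24,0.05} (from standard F tables)
reject = t1 >= F_crit                  # reject H_o for large values of t1
```

Here *t*1 ≈ 4.90 exceeds the critical value, so the hypothesized growth structure would be rejected in this hypothetical configuration.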

**Case (2)** *p*<sup>2</sup> = *m* = *p* − 2 ⇒ *p*<sup>1</sup> = 2, which indicates that the *h*-th moment will contain two gamma ratios involving *h*. More specifically,

$$E[w^h] = c\_3 \frac{\Gamma(\frac{n\_\cdot - r - p + 2}{2} + h)\Gamma(\frac{n\_\cdot - r - p + 2}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n\_\cdot - p + 2}{2} + h)\Gamma(\frac{n\_\cdot - p + 2}{2} - \frac{1}{2} + h)}, \; \Re(h) > -\frac{n\_\cdot - r - p + 1}{2}. \tag{ii}$$

Since the arguments of the gamma functions in *(ii)* differ by <sup>1</sup> <sup>2</sup> , we can combine them by applying the duplication formula for gamma functions, namely

$$
\Gamma(z)\Gamma(z+\frac{1}{2}) = \pi^{\frac{1}{2}}2^{1-2z}\Gamma(2z). \tag{iii}
$$

Consider $E[w^{\frac{h}{2}}]$, so that *h* becomes $\frac{h}{2}$. Now, by taking $z = \frac{n_\cdot - r - p + 2}{2} - \frac{1}{2} + \frac{h}{2}$ in the gamma functions containing *h* and *h* = 0 in the constant part, we have

$$E[w^{\frac{h}{2}}] = c\_{31} \frac{\Gamma(n\_\cdot - r - p + 1 + h)}{\Gamma(n\_\cdot - p + 1 + h)}$$

where *c*<sup>31</sup> is the corresponding normalizing constant. Thus, $\sqrt{w}$ ∼ real scalar type-1 beta with the parameters *(n.* − *r* − *p* + 1*, r)* for *n.* + 1 *> r* + *p*, and

$$y\_2 = \frac{1 - \sqrt{w}}{\sqrt{w}} \sim \text{type-2 beta } (r, n\_\cdot - r - p + 1), \quad t\_2 = \frac{n\_\cdot - r - p + 1}{r}\, y\_2 \sim F\_{2r, 2(n\_\cdot - r - p + 1)}.\tag{iv}$$

Accordingly, we reject *Ho* for large values of the *F* statistic given in *(iv)* or equivalently, for an observed value of *t*<sup>2</sup> that is greater than or equal to *F*2*r,*2*(n.*−*r*−*p*+<sup>1</sup>*),α,* such that *P r*{*F*2*r,*2*(n.*−*r*−*p*+1*)* ≥ *F*2*r,*2*(n.*−*r*−*p*+1*), α*} = *α* for a given significance level *α*.

#### **14.6.4. Asymptotic distribution of the likelihood ratio criterion** *λ*

Given the gamma structure appearing in (14.6.17), we can expand the gamma functions for $\frac{n_\cdot}{2}(1 + h) \to \infty$ in the part containing *h* and for $\frac{n_\cdot}{2} \to \infty$ in the constant part wherein *h* = 0*,* by using the first term in the asymptotic expansion of the gamma function or Stirling's formula, namely

$$
\Gamma(z+\delta) \approx \sqrt{(2\pi)} z^{z+\delta-\frac{1}{2}} \mathbf{e}^{-z} \tag{14.6.20}
$$

for |*z*|→∞, when *δ* is a bounded quantity. Then, on expanding each gamma function in (14.6.17) by making use of (14.6.20), we have

$$E[\lambda^h] \to (1+h)^{-\frac{1}{2}rp\_1} \Rightarrow E[\mathbf{e}^{(-2\ln\lambda)h}] \to (1-2h)^{-\frac{1}{2}rp\_1} \text{ for } 1-2h > 0,$$

which shows that whenever *n* = *n.* = *n*<sup>1</sup> +···+ *nr* → ∞*,*

$$
-2\ln\lambda \to \chi^2\_{rp\_1} \tag{14.6.21}
$$

where the *χ*<sup>2</sup> is a real scalar chisquare random variable having *rp*<sup>1</sup> = *r(p* − *m)* degrees of freedom. Hence, for large values of *n* = *n*1+···+*nr*, we reject the null hypothesis *Ho* that the model specified by *To* fits whenever the observed value of −2 ln *λ* is greater than or equal to *χ*<sup>2</sup><sub>*r(p*−*m),α*</sub>*,* with *P r*{*χ*<sup>2</sup><sub>*r(p*−*m)*</sub> ≥ *χ*<sup>2</sup><sub>*r(p*−*m), α*</sub>} = *α* for a preselected *α*.
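The limit behind (14.6.21) can be checked numerically from the exact moment expression (14.6.17): the π-factors in *Γ*<sub>*p*1</sub>(·) cancel in the ratio, so *E*[*λ<sup>h</sup>*] is computable with `math.lgamma`, and it approaches (1 + *h*)<sup>−*rp*1/2</sup> as *n.* grows. The parameter values below are hypothetical.

```python
import math

def E_lambda_h(n_dot, r, p1, p2, h):
    """Exact E[lambda^h] from (14.6.17); the pi-factors of Gamma_{p1}(.) cancel."""
    a = (n_dot - r - p2) / 2
    log_val = 0.0
    for j in range(p1):                      # product of ordinary gammas in Gamma_{p1}
        log_val += math.lgamma(a + n_dot * h / 2 - j / 2)
        log_val -= math.lgamma(a + r / 2 + n_dot * h / 2 - j / 2)
        log_val += math.lgamma(a + r / 2 - j / 2)   # normalizing constant c_3
        log_val -= math.lgamma(a - j / 2)
    return math.exp(log_val)

# hypothetical configuration: r = 3 groups, p1 = p - m = 2, p2 = m = 1
exact = E_lambda_h(n_dot=10**6, r=3, p1=2, p2=1, h=0.5)
limit = (1 + 0.5) ** (-3 * 2 / 2)            # (1 + h)^{-r p1 / 2}
```

For *n.* = 10<sup>6</sup> the exact moment and the limiting value agree to several decimal places, illustrating the asymptotic result.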

**Example 14.6.1.** A certain exercise regimen is prescribed to a random sample of 5 men and 6 women to stabilize their weights to within normal limits. Except for being of different gender, these men and women form homogeneous groups with respect to all factors that may influence weight, such as age and health conditions. The measurements are initial weight minus weight on the reading date. The weights are recorded every 5th day and a 10-day period is taken as one time unit, the observations being made for 20 days or 2 time units. Test the fit of (1) a linear model, (2) a second degree growth curve to the following data, the columns of the matrix *A* being the readings on men, denoted by *Xij* , and the columns of the matrix *B*, the readings on women, denoted by *Yij* :

$$A = \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ -1 & 0 & 1 & -1 & 1 \\ 1 & 2 & 0 & 1 & 1 \end{bmatrix} \equiv \mathbf{X}, \ B = \begin{bmatrix} -1 & 1 & 0 & -1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 & 2 \\ 0 & -1 & -1 & 1 & 1 & 0 \\ -1 & -1 & 1 & 0 & 0 & 1 \end{bmatrix} \equiv \mathbf{Y}.$$

**Solution 14.6.1.** Let the sample average vectors be denoted by $\bar{X}$ for men and $\bar{Y}$ for women, the deviation matrices, by $\mathbf{X}_d$ and $\mathbf{Y}_d$, and the sample sum of products matrices, by $S_1$ and $S_2$, respectively. These quantities are the following:

$$
\begin{split}
\bar{X} &= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \ \mathbf{X}\_d = \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ -1 & 0 & 1 & -1 & 1 \\ 0 & 1 & -1 & 0 & 0 \end{bmatrix}, \ S\_1 = \mathbf{X}\_d \mathbf{X}\_d' = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & -1 & 2 \\ 0 & -1 & 4 & -1 \\ 0 & 2 & -1 & 2 \end{bmatrix}, \\
\bar{Y} &= \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \ \mathbf{Y}\_d = \begin{bmatrix} -1 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 1 \\ 0 & -1 & -1 & 1 & 1 & 0 \\ -1 & -1 & 1 & 0 & 0 & 1 \end{bmatrix}, \ S\_2 = \mathbf{Y}\_d \mathbf{Y}\_d' = \begin{bmatrix} 4 & 1 & -1 & 0 \\ 1 & 2 & -1 & 1 \\ -1 & -1 & 4 & 0 \\ 0 & 1 & 0 & 4 \end{bmatrix}.
\end{split}
$$

Let $S = S_1 + S_2$. Consider the determinant $|S|$ and the inverse $S^{-1}$, obtained from the matrix of cofactors as that matrix divided by the determinant, since $S$ is symmetric. Thus,

$$\begin{aligned} S &= S\_1 + S\_2 = \begin{bmatrix} 6 & 1 & -1 & 0 \\ 1 & 4 & -2 & 3 \\ -1 & -2 & 8 & -1 \\ 0 & 3 & -1 & 6 \end{bmatrix}, \quad |S| = 580, \\ \text{Cof}(S) &= \begin{bmatrix} 104 & -38 & 6 & 20 \\ -38 & 276 & 48 & -130 \\ 6 & 48 & 84 & -10 \\ 20 & -130 & -10 & 160 \end{bmatrix}, \quad S^{-1} = \frac{1}{580} \begin{bmatrix} 104 & -38 & 6 & 20 \\ -38 & 276 & 48 & -130 \\ 6 & 48 & 84 & -10 \\ 20 & -130 & -10 & 160 \end{bmatrix}. \end{aligned}$$

Take the growth matrices for the linear growth and quadratic growth as *T*<sup>1</sup> and *T*2, where

$$T\_1 = \begin{bmatrix} 1 & 1/2 \\ 1 & 1 \\ 1 & 3/2 \\ 1 & 2 \end{bmatrix} \text{ and } \quad T\_2 = \begin{bmatrix} 1 & 1/2 & 1/4 \\ 1 & 1 & 1 \\ 1 & 3/2 & 9/4 \\ 1 & 2 & 4 \end{bmatrix}.$$

Now, let us compute the various quantities that are required to answer the questions:

$$\begin{aligned} &T\_{1}^{'}S^{-1} = \frac{1}{580} \begin{bmatrix} 92 & 156 & 128 & 40\\ 63 & 69 & 157 & 185 \end{bmatrix},\\ &T\_{1}^{'}S^{-1}\bar{X} = \frac{1}{580} \begin{bmatrix} 40\\ 185 \end{bmatrix}, \ T\_{1}^{'}S^{-1}\bar{Y} = \frac{1}{580} \begin{bmatrix} 156\\ 69 \end{bmatrix},\\ &T\_{1}^{'}S^{-1}T\_{1} = \frac{1}{580} \begin{bmatrix} 416 & 474\\ 474 & 706 \end{bmatrix}, \ (T\_{1}^{'}S^{-1}T\_{1})^{-1} = \frac{580}{69020} \begin{bmatrix} 706 & -474\\ -474 & 416 \end{bmatrix},\\ &T\_{1}(T\_{1}^{'}S^{-1}T\_{1})^{-1}T\_{1}^{'}S^{-1} = \frac{1}{69020} \begin{bmatrix} 26390 & 54810 & 18270 & -30450\\ 17690 & 32190 & 20590 & -1450\\ 8990 & 9570 & 22910 & 27550\\ 290 & -13050 & 25230 & 56550 \end{bmatrix}. \end{aligned}$$

We may now evaluate the estimates of the parameter vectors *A*<sup>1</sup> and *A*2, denoted by a hat:

$$\begin{split} \hat{A}\_{1} &= (T\_{1}^{\prime}S^{-1}T\_{1})^{-1}T\_{1}^{\prime}S^{-1}\bar{X} = \frac{1}{69020} \begin{bmatrix} -59450\\ 58000 \end{bmatrix} = \begin{bmatrix} -0.861345\\ 0.840336 \end{bmatrix} = \begin{bmatrix} \hat{a}\_{01} \\ \hat{a}\_{11} \end{bmatrix},\\ \hat{A}\_{2} &= (T\_{1}^{\prime}S^{-1}T\_{1})^{-1}T\_{1}^{\prime}S^{-1}\bar{Y} = \frac{1}{69020} \begin{bmatrix} 77430\\ -45240 \end{bmatrix} = \begin{bmatrix} 1.12185\\ -0.655462 \end{bmatrix} = \begin{bmatrix} \hat{a}\_{02} \\ \hat{a}\_{12} \end{bmatrix}. \end{split}$$
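These hand computations are easy to verify with a short NumPy sketch. This is an independent numerical check, not part of the original solution; the variable names are ours.

```python
import numpy as np

# observation matrices: columns are the 5 men and the 6 women
X = np.array([[1, 0, 0, -1, 0],
              [0, 1, -1, 0, 0],
              [-1, 0, 1, -1, 1],
              [1, 2, 0, 1, 1]], dtype=float)
Y = np.array([[-1, 1, 0, -1, 1, 0],
              [1, 1, 1, 0, 1, 2],
              [0, -1, -1, 1, 1, 0],
              [-1, -1, 1, 0, 0, 1]], dtype=float)

xbar, ybar = X.mean(axis=1), Y.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T \
    + (Y - ybar[:, None]) @ (Y - ybar[:, None]).T      # S1 + S2, det(S) = 580

T1 = np.array([[1, 0.5], [1, 1], [1, 1.5], [1, 2]])    # linear growth matrix
Si = np.linalg.inv(S)
M = np.linalg.inv(T1.T @ Si @ T1) @ T1.T @ Si          # (T1' S^-1 T1)^-1 T1' S^-1
A1_hat = M @ xbar   # approx (-0.861345, 0.840336)
A2_hat = M @ ybar   # approx ( 1.12185, -0.655462)
```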

If the hypothesis of linear growth was not rejected, then the models for men and women, respectively denoted by *fx(t)* and *fy(t)* would be the following:

Men: $f_x(t) = -0.861345 + 0.840336\,t$, Women: $f_y(t) = 1.12185 - 0.655462\,t$.

As well,

$$\begin{aligned} T\_1\hat{A}\_1 &= T\_1(T\_1'S^{-1}T\_1)^{-1}T\_1'S^{-1}\bar{X} = \frac{1}{69020} \begin{bmatrix} -30450\\ -1450\\ 27550\\ 56550 \end{bmatrix} = \begin{bmatrix} -0.441176\\ -0.0210084\\ 0.39916\\ 0.819328 \end{bmatrix},\\ T\_1\hat{A}\_2 &= T\_1(T\_1'S^{-1}T\_1)^{-1}T\_1'S^{-1}\bar{Y} = \frac{1}{69020} \begin{bmatrix} 54810\\ 32190\\ 9570\\ -13050 \end{bmatrix} = \begin{bmatrix} 0.794118\\ 0.466387\\ 0.138655\\ -0.189075 \end{bmatrix}. \end{aligned}$$

Let us now compute the sum of products matrix under the linear growth model. The matrix $\mathbf{X} - \mathbf{T}_1\hat{\mathbf{A}}_1$ is such that the vector $T_1\hat{A}_1$ is subtracted from each column of the observation matrix $\mathbf{X}$. Letting

$$\begin{aligned} G\_{1}G\_{1}^{'} &= \sum\_{j=1}^{n\_{1}} (X\_{1j} - T\_{1}\hat{A}\_{1})(X\_{1j} - T\_{1}\hat{A}\_{1})^{'} \text{ with} \\ G\_{1} &= \mathbf{X} - \mathbf{T}\_{1}\hat{\mathbf{A}}\_{1} = \begin{bmatrix} 1.44118 & 0.441176 & 0.441176 & -0.558824 & 0.441176 \\ 0.0210084 & 1.02101 & -0.978992 & 0.0210084 & 0.0210084 \\ -1.39916 & -0.39916 & 0.60084 & -1.39916 & 0.60084 \\ 0.180672 & 1.18067 & -0.819328 & 0.180672 & 0.180672 \end{bmatrix}, \\ G\_{1}G\_{1}^{'} &= \begin{bmatrix} 2.97318 & 0.0463421 & -0.880499 & 0.398542 \\ 0.0463421 & 2.00221 & -1.04193 & 2.01898 \\ -0.880499 & -1.04193 & 4.79664 & -1.36059 \\ 0.398542 & 2.01898 & -1.36059 & 2.16321 \end{bmatrix}. \end{aligned}$$

We now evaluate the same matrices for the second sample, denoting the deviation matrix by $G_2$ and the sum of products matrix by $G_2G_2'$, where $G_2$ is obtained by subtracting $T_1\hat{A}_2$ from each column of the observation matrix $\mathbf{Y}$:

$$\begin{split} G\_{2} &= \mathbf{Y} - \mathbf{T}\_{1}\hat{\mathbf{A}}\_{2} \\ &= \begin{bmatrix} -1.79412 & 0.205882 & -0.794118 & -1.79412 & 0.205882 & -0.794118 \\ 0.533613 & 0.533613 & 0.533613 & -0.466387 & 0.533613 & 1.53361 \\ -0.138655 & -1.13866 & -1.13866 & 0.861345 & 0.861345 & -0.138655 \\ -0.810924 & -0.810924 & 1.18908 & 0.189076 & 0.189076 & 1.18908 \end{bmatrix}, \\ G\_{2}G\_{2}' &= \begin{bmatrix} 7.78374 & -1.54251 & -0.339348 & -0.90089 \\ -1.54251 & 3.70846 & -1.44393 & 1.60536 \\ -0.339348 & -1.44393 & 4.11535 & -0.157298 \\ -0.90089 & 1.60536 & -0.157298 & 4.2145 \end{bmatrix}. \end{split}$$

Thus,

$$G\_1G\_1' + G\_2G\_2' = \begin{bmatrix} 10.7569 & -1.49617 & -1.21985 & -0.502348 \\ -1.49617 & 5.71067 & -2.48586 & 3.62434 \\ -1.21985 & -2.48586 & 8.91199 & -1.51788 \\ -0.502348 & 3.62434 & -1.51788 & 6.37771 \end{bmatrix} \text{ and }$$
  $|G\_1G\_1' + G\_2G\_2'| = 1798.49,$ 

so that

$$w = \frac{|S\_1 + S\_2|}{|G\_1 G\_1' + G\_2 G\_2'|} = \frac{580}{1798.49} = 0.322493.$$

In this case, *n.* − *r* = 11 − 2 = 9*, m* = 2*, p* = 4*, p* − *m* = *p*<sup>1</sup> = 2 and *p*<sup>2</sup> = 2. This situation is our special Case (2). Then,

$$\begin{aligned} \mathbf{y}\_2 &= \frac{1 - \sqrt{w}}{\sqrt{w}} = \frac{1 - 0.5679}{0.5679} = 0.7609, \\ \mathbf{t}\_2 &= \frac{(n\_\cdot - r - p + 1)}{r} \mathbf{y}\_2 = \frac{6}{2} \mathbf{y}\_2 = 3 \mathbf{y}\_2 = 2.2826, \end{aligned}$$

which is the observed value of an $F_{2r,\,2(n_\cdot - r - p + 1)} = F_{4,12}$ random variable under the null hypothesis. From an F-table, at the 5% level, $F_{4,12,0.05} = 3.26 > 2.2826$. In the case of the F-test, we reject the hypothesis for large observed values of the statistic. Accordingly, we do not reject $H_o$ that $m = 2$, that is, that the growth model is linear. Naturally, the null hypothesis would not be rejected either at a lower level of significance.
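The whole likelihood-ratio computation for the linear model can be replicated numerically. The following NumPy sketch is an independent check, not part of the original solution; 3.26 is the tabulated $F_{4,12,0.05}$ quoted above.

```python
import numpy as np

X = np.array([[1, 0, 0, -1, 0], [0, 1, -1, 0, 0],
              [-1, 0, 1, -1, 1], [1, 2, 0, 1, 1]], dtype=float)
Y = np.array([[-1, 1, 0, -1, 1, 0], [1, 1, 1, 0, 1, 2],
              [0, -1, -1, 1, 1, 0], [-1, -1, 1, 0, 0, 1]], dtype=float)
xbar, ybar = X.mean(axis=1), Y.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T \
    + (Y - ybar[:, None]) @ (Y - ybar[:, None]).T

T1 = np.array([[1, 0.5], [1, 1], [1, 1.5], [1, 2]])
Si = np.linalg.inv(S)
M = np.linalg.inv(T1.T @ Si @ T1) @ T1.T @ Si

# residuals about the fitted linear growth curves
G1 = X - (T1 @ (M @ xbar))[:, None]
G2 = Y - (T1 @ (M @ ybar))[:, None]
w = np.linalg.det(S) / np.linalg.det(G1 @ G1.T + G2 @ G2.T)  # approx 0.322493

y2 = (1 - np.sqrt(w)) / np.sqrt(w)
t2 = 3 * y2    # (n. - r - p + 1)/r = 6/2 = 3; approx 2.2826, below 3.26
```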

We now tackle the second part of the problem wherein $m = 3$ and $m - 1 = 2$, that is, the growth model is assumed to be a polynomial of degree 2 under the null hypothesis. Again, we take the time points to be $t_1 = \frac{1}{2}$, $t_2 = 1$, $t_3 = \frac{3}{2}$ and $t_4 = 2$, so that

$$T\_2 = \begin{bmatrix} 1 & t\_1 & t\_1^2 \\ 1 & t\_2 & t\_2^2 \\ 1 & t\_3 & t\_3^2 \\ 1 & t\_4 & t\_4^2 \end{bmatrix} = \begin{bmatrix} 1 & 1/2 & 1/4 \\ 1 & 1 & 1 \\ 1 & 3/2 & 9/4 \\ 1 & 2 & 4 \end{bmatrix}.$$

We can make use of some of the computations from the first part of the problem, that is, the calculations related to *T*1. Thus, we have

$$\begin{aligned} T\_2'S^{-1} &= \frac{1}{580} \begin{bmatrix} 92 & 156 & 128 & 40 \\ 63 & 69 & 157 & 185 \\ 81.5 & -145.5 & 198.5 & 492.5 \end{bmatrix} \\ T\_2'S^{-1}T\_2 &= \frac{1}{580} \begin{bmatrix} 416 & 474 & 627 \\ 474 & 706 & 1178 \\ 627 & 1178 & 2291.5 \end{bmatrix}, \ |580 \times T\_2'S^{-1}T\_2| = 3532200 \end{aligned}$$

$$\begin{aligned} \text{Cof}(T\_2'S^{-1}T\_2) &= \begin{bmatrix} 230115 & -347565 & 115710 \\ -347565 & 560135 & -192850 \\ 115710 & -192850 & 69020 \end{bmatrix} \\\ (T\_2'S^{-1}T\_2)^{-1} &= \frac{580}{3532200} \begin{bmatrix} 230115 & -347565 & 115710 \\ -347565 & 560135 & -192850 \\ 115710 & -192850 & 69020 \end{bmatrix} \\\ T\_2'S^{-1}\bar{X} &= \frac{1}{580} \begin{bmatrix} 40 \\ 185 \\ 492.5 \end{bmatrix}, \ T\_2'S^{-1}\bar{Y} = \frac{1}{580} \begin{bmatrix} 156 \\ 69 \\ -145.5 \end{bmatrix}. \end{aligned}$$

and

$$\begin{split} \hat{A}\_{1} &= (T\_{2}^{\prime}S^{-1}T\_{2})^{-1}T\_{2}^{\prime}S^{-1}\bar{X} \\ &= \frac{1}{3532200} \left[ \begin{array}{c} 1892250 \\ -5256250 \\ 2943500 \end{array} \right] = \left[ \begin{array}{c} 0.535714 \\ -1.4881 \\ 0.833333 \end{array} \right] = \left[ \begin{array}{c} \hat{a}\_{01} \\ \hat{a}\_{11} \\ \hat{a}\_{21} \end{array} \right], \\ \hat{A}\_{2} &= (T\_{2}^{\prime}S^{-1}T\_{2})^{-1}T\_{2}^{\prime}S^{-1}\bar{Y} \\ &= \frac{1}{3532200} \left[ \begin{array}{c} -4919850 \\ 12488850 \\ -5298300 \end{array} \right] = \left[ \begin{array}{c} -1.39286 \\ 3.53571 \\ -1.5 \end{array} \right] = \left[ \begin{array}{c} \hat{a}\_{02} \\ \hat{a}\_{12} \\ \hat{a}\_{22} \end{array} \right]. \end{split}$$

If the hypothesis of a second degree growth model was not rejected, the model for men, denoted by *fx(t)*, and that for women, denoted by *fy(t)*, would be the following:

Men: $f_x(t) = 0.535714 - 1.4881\,t + 0.833333\,t^2$, Women: $f_y(t) = -1.39286 + 3.53571\,t - 1.5\,t^2$.
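The quadratic-model estimates can be checked the same way; again, this NumPy fragment is an independent verification, not part of the original solution.

```python
import numpy as np

X = np.array([[1, 0, 0, -1, 0], [0, 1, -1, 0, 0],
              [-1, 0, 1, -1, 1], [1, 2, 0, 1, 1]], dtype=float)
Y = np.array([[-1, 1, 0, -1, 1, 0], [1, 1, 1, 0, 1, 2],
              [0, -1, -1, 1, 1, 0], [-1, -1, 1, 0, 0, 1]], dtype=float)
xbar, ybar = X.mean(axis=1), Y.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T \
    + (Y - ybar[:, None]) @ (Y - ybar[:, None]).T

# quadratic growth matrix T2 with t = 1/2, 1, 3/2, 2
T2 = np.array([[1, 0.5, 0.25], [1, 1, 1], [1, 1.5, 2.25], [1, 2, 4]])
Si = np.linalg.inv(S)
M2 = np.linalg.inv(T2.T @ Si @ T2) @ T2.T @ Si
A1_hat = M2 @ xbar   # approx ( 0.535714, -1.48810, 0.833333)
A2_hat = M2 @ ybar   # approx (-1.39286,  3.53571, -1.5)
```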

Employing notations paralleling those utilized in the first part, we have

$$\begin{aligned} T\_2\hat{A}\_1 &= T\_2(T\_2'S^{-1}T\_2)^{-1}T\_2'S^{-1}\bar{X} = \frac{1}{3532200} \begin{bmatrix} 0\\ -420500\\ 630750\\ 3153750 \end{bmatrix} = \begin{bmatrix} 0\\ -0.119048\\ 0.178571\\ 0.892857 \end{bmatrix},\\ T\_2\hat{A}\_2 &= T\_2(T\_2'S^{-1}T\_2)^{-1}T\_2'S^{-1}\bar{Y} = \frac{1}{3532200} \begin{bmatrix} 0\\ 2270700\\ 1892250\\ -1135350 \end{bmatrix} = \begin{bmatrix} 0\\ 0.642857\\ 0.535714\\ -0.321429 \end{bmatrix},\end{aligned}$$

as well as the following sample deviation matrices and sample sum of products matrices:

$$\begin{aligned} G\_1 &= \mathbf{Y} - \mathbf{T}\_2 \hat{\mathbf{A}}\_1 \\ &= \begin{bmatrix} -1 & 1 & 0 & -1 & 1 & 0 \\ 1.11905 & 1.11905 & 1.11905 & 0.119048 & 1.11905 & 2.11905 \\ -0.178571 & -1.17857 & -1.17857 & 0.821429 & 0.821429 & -0.178571 \\ -1.89286 & -1.89286 & 0.107143 & -0.892857 & -0.892857 & 0.107143 \end{bmatrix}, \\ G\_1 G\_1' &= \begin{bmatrix} 4. & 1. & -1. & 0 \\ 1. & 9.51361 & -2.19898 & -4.9949 \\ -1. & -2.19898 & 4.19133 & 0.95663 \\ 0. & -4.9949 & 0.95663 & 8.78316 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} G\_{2} &= \mathbf{Y} - \mathbf{T}\_{2}\hat{\mathbf{A}}\_{2} \\ &= \begin{bmatrix} -1 & 1 & 0 & -1 & 1 & 0 \\ 0.357143 & 0.357143 & 0.357143 & -0.642857 & 0.357143 & 1.35714 \\ -0.535714 & -1.53571 & -1.53571 & 0.464286 & 0.464286 & -0.535714 \\ -0.678571 & -0.678571 & 1.32143 & 0.321429 & 0.321429 & 1.32143 \end{bmatrix}, \\ G\_{2}G\_{2}' &= \begin{bmatrix} 4. & 1. & -1. & 0. \\ 1. & 2.76531 & -2.14796 & 1.68878 \\ -1. & -2.14796 & 5.72194 & -1.03316 \\ 0. & 1.68878 & -1.03316 & 4.6199 \end{bmatrix}; \end{aligned}$$

$$G\_1 G\_1' + G\_2 G\_2' = \begin{bmatrix} 8. & 2. & -2. & 0. \\ 2. & 12.2789 & -4.34694 & -3.30612 \\ -2. & -4.34694 & 9.91326 & -0.0765339 \\ 0. & -3.30612 & -0.0765339 & 13.4031 \end{bmatrix} \text{ and }$$

$$|G\_1 G\_1' + G\_2 G\_2'| = 9462.7797.$$

Thus, for this part, the statistic $w$, obtained from the $\lambda$-criterion, is the following:

$$w = \frac{|S\_1 + S\_2|}{|G\_1 G\_1' + G\_2 G\_2'|} = \frac{580}{9462.7797} = 0.0612928.$$

In this case, *m* = 3*, p* = 4*, r* = 2*, p*<sup>1</sup> = 1*, p*<sup>2</sup> = 3 and *n.* − *r* − *p* + 1 = 11 − 2 − 4 + 1 = 6,

$$\begin{aligned} y\_1 &= \frac{1-w}{w} = 15.3151, \quad t\_1 = \frac{n\_{\cdot}-r-p+1}{r}\, y\_1 = \frac{6}{2} \,(15.3151) = 45.945, \\ \text{and} \quad F\_{2,6,0.01} &= 10.92. \end{aligned}$$

When implementing F-tests, we reject the null hypothesis for large observed values of the statistic. Since $45.945 > F_{2,6,0.01}$, the hypothesis that a second degree polynomial fits is rejected at the 1% significance level and, of course, at any higher level. Let us now consider the asymptotic results. Since $n_\cdot = 11$ is not large, these results ought to be interpreted with some caution. In the case of the linear model $(m = 2)$, $rp_1 = 2(2) = 4$ and the tabulated critical values are $\chi^2_{4,0.05} = 9.49$ and $\chi^2_{4,0.01} = 13.28$. With this chisquare test, we reject $H_o$ for large observed values of the statistic. In the linear case, $w = 0.3225$ so that $-2\ln\lambda = -2(n_\cdot/2)\ln w = -11\ln w = -11[-1.1317] = 12.45$. Hence, according to the approximate asymptotic chisquare test, the hypothesis of a linear fit should be rejected at the 5% significance level but not at the 1% significance level. In the second degree polynomial growth curve case, $p_1 = 1$ so that $rp_1 = 2$, and the tabulated values are $\chi^2_{2,0.05} = 5.99$ and $\chi^2_{2,0.01} = 9.21$. Since $-2\ln\lambda = -11\ln w = -11\ln 0.0613 = -11[-2.79] = 30.71$, according to the approximate asymptotic chisquare test, the hypothesis of a second degree polynomial fit ought to be rejected at both the 5% and 1% levels, which corroborates the conclusion obtained with the F test. In sum, there is some evidence of a linear fit but no indication whatsoever of a quadratic fit.
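The asymptotic chisquare statistics quoted above reduce to $-2\ln\lambda = -n_\cdot\ln w$; a two-line check using the values of $w$ computed in this example:

```python
import math

n_dot = 11
w_linear, w_quadratic = 0.322493, 0.0612928
stat_linear = -n_dot * math.log(w_linear)        # approx 12.45, compare chi2_{4,0.05} = 9.49
stat_quadratic = -n_dot * math.log(w_quadratic)  # approx 30.71, compare chi2_{2,0.05} = 5.99
```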

#### **14.6.5. A general structure**

Potthoff and Roy (1964) considered a general format whereby several hypotheses can be tested at the same time by making use of the structure. When applied to the model specified in (14.6.4), the Potthoff-Roy procedure yields the following format:

$$E[X\_{11}, X\_{12}, \dots, X\_{1n\_1}] = [TA\_1, TA\_1, \dots, TA\_1] = T(A\_1, \dots, A\_1) = TA\_1J\_1' \tag{14.6.22}$$

where $J_1$ is an $n_1 \times 1$ vector of unities. The left and right-hand sides of this equation are $p \times n_1$ matrices, where $T$ is $p \times m$ and the $m \times 1$ vector $A_1$ is repeated $n_1$ times. The expected value of the sample matrix of all $r$ groups comprising $n_1 + \cdots + n_r = n_\cdot = n$ individuals, denoted by the boldfaced $\mathbf{X}$, is the following:

$$E[\mathbf{X}] = [TA\_1J\_1', \dots, TA\_rJ\_r'] = T[A\_1J\_1', \dots, A\_rJ\_r'] = TAJ\_{(r)}, \ A = [A\_1, \dots, A\_r] \tag{14.6.23}$$

where $J_{(r)}$ is an $r \times (n_1 + \cdots + n_r) = r \times n$ block-diagonal matrix: its first diagonal block is the $1 \times n_1$ row vector $J_1'$ whose elements are all unities, the next diagonal block is the $1 \times n_2$ row vector $J_2'$ of unities, and so on, up to the last diagonal block, the $1 \times n_r$ row vector $J_r'$ of unities, all the remaining elements being zeros. We can also pre-multiply $\mathbf{X}$ by a $q \times p$ constant matrix $Q$ of full rank $q \le p$. Then $QX_{ij} \sim N_q(QTA_i,\ Q\Sigma Q')$, $j = 1,\dots,n_i$, $i = 1,\dots,r$. In this instance, we are making repeated measurements at $q$ points $t_1,\dots,t_q$ instead of the $p$ points previously considered, and the model becomes the following:

$$E[Q\mathbf{X}] = QTAJ\_{(r)}.\tag{14.6.24}$$

This model enables one to test all types of hypotheses of the form

$$CAV = O\tag{14.6.25}$$

where $C$ and $V$ are given matrices, $C$ is $s \times m$, $A$ is the $m \times r$ matrix of parameters and $V$ is $r \times t$ for some $s$ and $t$. For example, let $C = I_m$ and $V$ be an $r \times (r-1)$ matrix whose first $r-1$ rows form an identity matrix and whose last row is $(-1, -1, \dots, -1)$. Then $CAV = O$ gives $A_1 - A_r = O$, $A_2 - A_r = O, \dots, A_{r-1} - A_r = O$, or equivalently, $A_1 = \cdots = A_r$. For testing the equality of the models $TA_1 = TA_2 = \cdots = TA_r$, one need not apply the above procedure; one can directly employ the multivariate one-way layout analysis with $r - 1$ degrees of freedom associated with the sum of squares and cross products matrix, owing to the general nature of the model.
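The particular contrast matrix $V$ just described is easy to build explicitly. In the sketch below (illustrative only, with an arbitrary parameter matrix $A$ of our choosing), column $i$ of $AV$ equals $A_i - A_r$:

```python
import numpy as np

m, r = 2, 4
# V: r x (r-1), an identity matrix stacked over a row of -1's
V = np.vstack([np.eye(r - 1), -np.ones((1, r - 1))])
A = np.arange(m * r, dtype=float).reshape(m, r)   # arbitrary m x r parameter matrix

AV = A @ V
# each column of AV is A_i - A_r, so CAV = O with C = I_m encodes A_1 = ... = A_r
```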

#### **Exercises 14**

**14.1.** The following are observed random samples of sizes $n_1 = 4$ and $n_2 = 5$ from two independent real Gaussian populations, $N_3(M_1, \Sigma)$ and $N_3(M_2, \Sigma)$, sharing the same covariance matrix $\Sigma > O$ and having the mean value vectors $M_1$ and $M_2$, respectively. Test the hypothesis at the 5% significance level that the mean value vectors $M_1$ and $M_2$ have (1): parallel profiles; (2): coincident profiles:

$$\text{Sample 1:} \left[ \begin{array}{cccccc} 1 & 2 & 4 & 1 \\ -1 & 1 & 2 & 2 \\ 1 & 3 & 1 & 3 \end{array} \right], \quad \text{Sample 2:} \left[ \begin{array}{cccccc} 2 & -1 & 1 & -2 & 0 \\ 1 & 1 & 4 & 1 & 3 \\ 4 & 2 & 3 & 0 & 1 \end{array} \right].$$

**14.2.** The following are observed samples of sizes 5 and 6 from two independent real Gaussian populations with the same covariance matrix. The columns of the matrices represent the observations. Test the hypothesis at the 5% level of significance that the expected values or population mean value vectors have (1): parallel profiles; (2): coincident profiles:

$$\text{Sample 1:} \quad \left[ \begin{array}{cccccc} 1 & -1 & 2 & 3 & 0 \\ 1 & 2 & 1 & 1 & 4 \\ -1 & 3 & -1 & -3 & 0 \\ 1 & 1 & 2 & -1 & 2 \end{array} \right], \quad \text{Sample 2:} \left[ \begin{array}{cccccc} 2 & 1 & 3 & -1 & 1 & 0 \\ 1 & 0 & 5 & 2 & 2 & 2 \\ -1 & 1 & 2 & -2 & 4 & -4 \\ 1 & 4 & 5 & 3 & -1 & 0 \end{array} \right],$$

**14.3.** In Exercise 14.1, determine whether the profile is level (1): in the first population; (2): in the second population; (3): in both the populations.

**14.4.** Repeat Exercise 14.3 with the sample of observations provided in Exercise 14.2.

**14.5.** Twelve cows of two breeds, all having identical characteristics, are given a particular type of feed and their weights are monitored for four weeks. The observations are the weight readings on the date minus the initial weight, the readings being made at the end of every week for four successive weeks. The columns of *A* represent successive observations on 5 cows of breed-1 and the columns of *B*, successive observations on 7 cows of breed-2. Test the hypothesis at the 5% significance level that the breed effect is a polynomial growth curve of (1): degree 1, (2): degree 2 for each breed:

$$A = \begin{bmatrix} 0 & 1 & 0 & 1 & 3 \\ 1 & 2 & 1 & 2 & 4 \\ 2 & 4 & 2 & 3 & 4 \\ 2 & 6 & 2 & 4 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 1 & 1 & 0 & 3 & 0 \\ 2 & 1 & 1 & 3 & 2 & 3 & 2 \\ 3 & 1 & 2 & 5 & 2 & 4 & 4 \\ 4 & 2 & 2 & 7 & 3 & 6 & 4 \end{bmatrix}.$$

#### **References**

A.M. Mathai (1993): *A Handbook of Generalized Special Functions for Statistical and Physical Sciences*, Oxford University Press, Oxford.

A.M. Mathai and H.J. Haubold (2017): *Probability and Statistics: A Course for Physicists and Engineers*, De Gruyter, Germany.

A.M. Mathai, R.K. Saxena and H.J. Haubold (2010): *The H-function: Theory and Applications*, Springer, New York.

R.F. Potthoff and S.N. Roy (1964): A generalized multivariate analysis of variance model useful especially for growth curve problems, *Biometrika*, **51**(3), 313–326.


#### **Chapter 15 Cluster Analysis and Correspondence Analysis**

### **15.1. Introduction**

We will employ the same notations as in the previous chapters. Lower-case letters $x, y, \dots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \dots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $\text{d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $\text{d}X = \wedge_{i=1}^{p} \wedge_{j=1}^{q} \text{d}x_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, $\text{d}\tilde{X} = \text{d}X_1 \wedge \text{d}X_2$.

#### **15.1.1. Clusters**

A cluster means a group or a cloud of items close together with reference to one or more characteristics. For instance, in a countryside, there are villages which are clusters of houses. In a city, there are clusters of high-rise buildings or clusters of apartment blocks. If we have 2-dimensional data points marked on a sheet of paper, then there may be several places where the points are grouped together in large crowds, at other places the points may be bunched together in smaller clumps and somewhere else, there may be singleton points. In a classification problem, we have a number of preassigned populations and we
want to assign a point at hand to one of those populations. This cannot be achieved in the context of cluster analysis as we do not know beforehand how many clusters there are in the data at hand or which data point belongs to which cluster. Cluster analysis is akin to pattern recognition whereas classification is a sort of taxonomy. Suppose that a new plant is to be classified as belonging to one of the known species of plants; if it does not fall into any of the known species, then we have a member from a new species. In cluster analysis, we are, in a manner of speaking, going to create various 'species'. To start with, we have only a cloud of items and we do not know how many categories or clusters there exist.

Cluster analysis techniques are widely utilized in many fields such as psychiatry, sociology, anthropology, archeology, medicine, criminology, engineering and geology, to mention only a few areas. If real scalar variables are to be classified as belonging to a certain category, one way of achieving this is to ascertain their joint dispersion or joint variation as measured in terms of scale-free covariance or correlation. Those variables that are similarly correlated may be grouped together.

We will consider the problem of cluster analysis involving *n* points *X*1*,...,Xn* where each *Xj* is a real *p*-dimensional vector, that is, we have a *p* × *n* data matrix

$$\mathbf{X} = [X\_1, X\_2, \dots, X\_n] = \begin{bmatrix} x\_{11} & x\_{12} & \dots & x\_{1n} \\ x\_{21} & x\_{22} & \dots & x\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{p1} & x\_{p2} & \dots & x\_{pn} \end{bmatrix} . \tag{15.1.1}$$

#### **15.1.2. Distance measures**

Two real *p*-vectors are close together if the "distance" between them is small. Many types of distance measures can be defined. Let *Xr* and *Xs* be two real *p*-vectors. These are the *r*-th and *s*-th members or columns in the data matrix (15.1.1). Then, the following are some distance measures:

$$d\_m(X\_r, X\_s) = \left[\sum\_{i=1}^p |\mathbf{x}\_{ir} - \mathbf{x}\_{is}|^m\right]^{\frac{1}{m}};$$

for $m = 2$, we have the Euclidean distance $d_2(X_r, X_s) = \left[\sum_{i=1}^p |x_{ir} - x_{is}|^2\right]^{\frac{1}{2}}$; denoting $d_2^2$ as $d^2$, we have

$$d^2(X\_r, X\_s) = \sum\_{i=1}^p (\mathbf{x}\_{ir} - \mathbf{x}\_{is})^2,\tag{15.1.2}$$

where the absolute value sign can be replaced by parentheses since we are dealing with real elements. We will utilize this convenient quantity $d^2$ for comparing observation vectors. There may be joint variation or covariances among the coordinates in each of the vectors, in which case $\text{Cov}(X_r) = \Sigma > O$. If all the $X_j$'s, $j = 1,\dots,n$, have the same covariance matrix, then $\text{Cov}(X_j) = \Sigma$, $j = 1,\dots,n$, and a statistician might wish to consider the generalized distance between $X_r$ and $X_s$, or its square, $d^2_{(g)}(X_r, X_s) = (X_r - X_s)'\Sigma^{-1}(X_r - X_s)$, the subscript $(g)$ designating the generalized distance. Since $\Sigma$ is unknown, we may wish to estimate it. However, if there are clusters, it may not be appropriate to make use of the entire data set of all $n$ points, since the joint variation or the covariance within each cluster is likely to be different. And as we do not know beforehand whether clusters are present, securing a proper estimate of $\Sigma$ proves problematic. As a result, this difficulty is usually circumvented by resorting to the ordinary Euclidean distance instead of the generalized distance.

Let us examine the effect of scaling a vector. If the unit of measurement in one vector is changed, what will be the effect on the squared distance? Consider the following vectors:

$$X\_1 = \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix} \text{ and } \begin{aligned} X\_2 = \begin{bmatrix} -3 \\ 2 \\ 4 \end{bmatrix} &\Rightarrow d^2(X\_1, X\_2) = (X\_1 - X\_2)^\prime (X\_1 - X\_2) \\ &= [(-1) - (-3)]^2 + [(0) - (2)]^2 + [(-2) - (4)]^2 = 44. \end{aligned}$$

The squared distances between the vectors when (1) *X*<sup>1</sup> is multiplied by 2; (2) *X*<sup>2</sup> is multiplied by 2; (3) *X*<sup>1</sup> and *X*<sup>2</sup> are each multiplied by 2, are

$$\begin{aligned} d^2(2X\_1, X\_2) &= (-2 + 3)^2 + (0 - 2)^2 + (-4 - 4)^2 = 69 \\ d^2(X\_1, 2X\_2) &= (-1 + 6)^2 + (0 - 4)^2 + (-2 - 8)^2 = 141 \\ d^2(2X\_1, 2X\_2) &= 4[(X\_1 - X\_2)'(X\_1 - X\_2)] = 4 \times 44 = 176. \end{aligned}$$

Note that the scaled distances are fully distorted, as $69 \ne 4(44)$ and $141 \ne 4(44)$. Thus, the scaling of individual vectors can fully alter the nature of the clusters when there are clusters in the original data. As well, members of the original clusters need not be members of the same clusters in the scaled data, and the number of clusters may also change. Accordingly, it is indeed inadvisable to make use of the generalized distance. Nor is re-scaling the individual vectors a good idea if we are seeking clusters. Accordingly, the recommended procedure consists of utilizing the original data without modifying them. It may also happen that the components in each $p$-vector are recorded in different units of measurement. Then, how
to eliminate the location and scale effects on the components of each vector? This can be achieved by standardizing them individually, that is, by subtracting the average value of the components from each component and dividing the result by the standard deviation of the components. Let us see what happens in the case of our numerical example. Letting $\bar{x}_1$ and $\bar{x}_2$ be the averages of the components of $X_1$ and $X_2$, and $s_1^2$ and $s_2^2$ be the associated sums of squared deviations, we have

$$\begin{aligned} \bar{x}\_1 &= \frac{1}{3}[(-1) + (0) + (-2)] = -1, \; \bar{x}\_2 = \frac{1}{3}[(-3) + (2) + (4)] = 1, \\ s\_1^2 &= \sum\_{i=1}^p (x\_{i1} - \bar{x}\_1)^2 = [(-1) - (-1)]^2 + [(0) - (-1)]^2 + [(-2) - (-1)]^2 = 2, \\ s\_2^2 &= \sum\_{i=1}^p (x\_{i2} - \bar{x}\_2)^2 = 26. \end{aligned}$$

Thus, the standardized vectors *X*<sup>1</sup> and *X*2, denoted by *Y*<sup>1</sup> and *Y*2, are the following:

$$Y\_1 = \frac{\sqrt{3}}{\sqrt{2}} \begin{bmatrix} -1 - (-1) \\ 0 - (-1) \\ -2 - (-1) \end{bmatrix} = \frac{\sqrt{3}}{\sqrt{2}} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \text{ and } \quad Y\_2 = \frac{\sqrt{3}}{\sqrt{26}} \begin{bmatrix} -4 \\ 1 \\ 3 \end{bmatrix}.$$

Then $d^2(Y_1, Y_2) = (Y_1 - Y_2)'(Y_1 - Y_2) = 7.6641$. However, $Y_1$ and $Y_2$ are very distorted and the distance between $X_1$ and $X_2$ is also modified. Hence, such procedures will change the clustering aspect as well, with new clusters possibly differing from the original ones.
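The distortions discussed above can be reproduced in a few lines. This is an illustrative NumPy sketch; the helper `z` standardizes with the divisor-$p$ standard deviation used in the text.

```python
import numpy as np

X1 = np.array([-1.0, 0.0, -2.0])
X2 = np.array([-3.0, 2.0, 4.0])

def d2(u, v):
    # squared Euclidean distance between two vectors
    return float(((u - v) ** 2).sum())

d_orig = d2(X1, X2)          # 44
d_scale1 = d2(2 * X1, X2)    # 69, not 4*44
d_scale2 = d2(X1, 2 * X2)    # 141, not 4*44
d_both = d2(2 * X1, 2 * X2)  # 176 = 4*44

def z(v):
    # center the components and divide by sqrt(sum of squared deviations / p)
    return (v - v.mean()) / v.std()

d_std = d2(z(X1), z(X2))     # approx 7.6641
```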

Let us consider the matrix of squared distances, denoted by *D*:

$$D = \begin{bmatrix} 0 & d\_{12}^2 & \dots & d\_{1n}^2 \\ d\_{21}^2 & 0 & \dots & d\_{2n}^2 \\ \vdots & \vdots & \ddots & \vdots \\ d\_{n1}^2 & d\_{n2}^2 & \dots & d\_{nn}^2 \end{bmatrix} = D'. \tag{15.1.3}$$

For example, letting

$$X\_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \ X\_2 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \text{ and } \ X\_4 = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},$$

we have $d_{12}^2 = (1-2)^2 + (0-1)^2 + (-1-3)^2 = 18$, $d_{13}^2 = 11$, $d_{14}^2 = 14$, $d_{23}^2 = 5$, $d_{24}^2 = 2$, $d_{34}^2 = 9$, so that

$$D = \begin{bmatrix} 0 & 18 & 11 & 14 \\ 18 & 0 & 5 & 2 \\ 11 & 5 & 0 & 9 \\ 14 & 2 & 9 & 0 \end{bmatrix}.$$
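The matrix of squared distances can be computed directly from the four vectors above; the following is a minimal numpy sketch (the function name `squared_distance_matrix` is our own, not the authors'):

```python
import numpy as np

def squared_distance_matrix(vectors):
    """Return the symmetric matrix of pairwise squared Euclidean distances."""
    X = np.asarray(vectors, dtype=float)
    # d_ij^2 = |X_i|^2 + |X_j|^2 - 2 X_i . X_j, computed for all pairs at once
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.round(D).astype(int)

X1, X2, X3, X4 = [1, 0, -1], [2, 1, 3], [0, 1, 2], [3, 1, 2]
D = squared_distance_matrix([X1, X2, X3, X4])
```

The off-diagonal entries reproduce the values $d\_{12}^2 = 18$, $d\_{13}^2 = 11$, and so on, computed above.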

The question of interest is the following: Given a set of *n* vectors of order *p*, how can one determine the number of clusters and then classify the vectors into these clusters?

### **15.2. Different Methods of Clustering**

The main methods are hierarchical in nature, the other ones being non-hierarchical. We will begin with non-hierarchical techniques. In this category, the most popular one involves optimization or partitioning.

#### **15.2.1. Optimization or partitioning**

With this approach, we have to come up with two numbers: *k*, a probable number of clusters, and *r*, the maximum separation between the members of each prospective cluster. Based on the distances or on the dissimilarity matrix, *D*, one should be able to determine the likely number of clusters, that is, *k*. Then, one has to find a set of *k* vectors among the *n* given vectors, which will be taken as seed members or starting members within the *k* potential clusters. Several methods have been proposed for determining this *k*, including the following:

1. Examine the closeness of the original vectors as indicated by the dissimilarity matrix *D* and, to start with, decide on an initial number for *k* and on the likely distance between members within a cluster, denoted by *r*.

2. Examine the original data points or original *p*-vectors and, based on the comparative magnitudes of the components of the observed *p*-vectors, ascertain whether there is any grouping possible and predict a value for each of *k* and *r*.

3. Evaluate the sample sum of products matrix *S* from the original data matrix. Compute the two main principal components associated with this *S*. Substitute *Xj* , the *j* -th observation vector, in the two principal components. This provides a pair of numbers or one point in a two-dimensional space. Compute *n* such points for *j* = 1*,...,n*. Plot these points. From the graph, assess the clustering pattern, the number *k* of possible clusters, estimates for *r*, the maximum distance between two members within a cluster as well as the minimum distance between the clusters.

4. Choose any number *k*, select *k* vectors at random from the set of *n* vectors; then, preselect a number *r* and use it as a measure of maximum separation between vectors.

5. Take any number *k* and select as seed vectors the first *k* vectors whose separation is at least two units among the set of *n* vectors.

6. Look at the farthest points. Select *k* of them that are separated by at least *r* units for preselected values of *k* and *r*.

If the dissimilarity matrix $D$ is utilized, then the separation number $r$ must be measured in $d\_{ij}^2$ units, whereas $r$ should be in $d\_{ij}$ units if the actual distances $d\_{ij}$ are used. After the seed vectors are selected, the remaining $n - k$ points are to be associated to these seed points to form clusters. Assign the vectors closest to each of the seed vectors and form the initial $k$ clusters of two or more vectors. For example, if there are three closest members at equal distance to a seed vector, then that cluster comprises four members, including the seed vector. Then, compute the centroids of all initial clusters. The centroid of a cluster is the simple average of the vectors included in that cluster; thus, the centroid is a $p$-vector. Then, measure the distances of all the points from each centroid, and incorporate all points within the distance $r$ from a centroid into that cluster. This process will create the second stage of $k$ clusters. Now, evaluate the centroid of each of these $k$ clusters and repeat the process of computing the distances of all points from each centroid. If a member of a cluster is found to be closer to the centroid of another cluster than to its own cluster's centroid, then transfer that vector to the cluster to which it really belongs. Rearrange all vectors in this manner, assigning each one to the cluster whose centroid is the closest. Note that the number $k$ can increase or decrease in the course of this process. Continue the procedure until no further improvement is possible. At this stage, the final $k$ is the number of clusters in the data and the final members of each cluster are set. This procedure is also called the *k-means approach*.

This k-means approach has a serious shortcoming: if one starts with a different set of seed vectors, then it is possible to end up with a different set of final clusters. On the other hand, this method has the appreciable advantage that it allows a member provisionally assigned to a cluster to be moved to another cluster where it really belongs, that is, it allows the transfer of points. The following example should clarify the procedure.
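The loop just described, assigning each point to its nearest centroid and recomputing centroids until no point changes cluster, can be sketched as follows; the function name, the fixed number of clusters, and the stopping rule based on label stability are our own choices for this illustration:

```python
import numpy as np

def k_means(points, seeds, max_iter=100):
    """Naive k-means: assign each point to the nearest centroid,
    update the centroids, and repeat until the assignment stabilizes."""
    X = np.asarray(points, dtype=float)
    centroids = np.asarray(seeds, dtype=float)
    labels = None
    for _ in range(max_iter):
        # squared distances from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point changed cluster: stop
        labels = new_labels
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)  # new centroid = simple average
    return labels, centroids
```

Starting from different seed vectors can indeed produce different final partitions, which is the shortcoming noted above.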

**Example 15.2.1.** Ten volunteers are given an exercise routine in an experiment that monitors systolic pressure, diastolic pressure and heart beat. These are measured after adhering to the exercise routine for four weeks. The data entries are systolic pressure minus

120 (SP), diastolic pressure minus 80 (DP) and heart beat minus 60 (HB), where 120, 80 and 60 are taken as the standard readings of systolic pressure, diastolic pressure and heart beat, respectively. Carry out a cluster analysis of the data. The data matrix is the following, where $(1), \ldots, (10)$ represent the data vectors $A\_1, \ldots, A\_{10}$ for the 10 volunteers; the first row represents SP, the second row DP, and the third HB:

$$\begin{bmatrix} \downarrow \rightarrow & (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) & (10) \\ \text{SP} & 0 & 1 & 1 & 2 & 3 & 4 & 6 & 8 & 5 & 10 \\ \text{DP} & 1 & 0 & -1 & 3 & 2 & 3 & 8 & 10 & 6 & 8 \\ \text{HB} & -1 & -1 & -1 & -2 & 5 & 2 & 7 & 8 & 9 & 4 \end{bmatrix}$$
The data matrix suggests the possibility of three clusters. Accordingly, we may begin with the vectors $A\_2, A\_5$ and $A\_8$ as seed vectors and take the separation width as $r = 15$ units. From $D$, we find $d\_{23}^2 = 1$, the smallest number, and hence $A\_2$ and $A\_3$ form the cluster $\{A\_2, A\_3\}$. Note that $d\_{56}^2 = 11$, so that $A\_6$ and $A\_5$ form the cluster $\{A\_5, A\_6\}$. Since $d\_{87}^2 = 9$, $A\_7$ and $A\_8$ form the cluster $\{A\_7, A\_8\}$. Now, consider the centroids. Letting $C\_{11}, C\_{21}$ and $C\_{31}$ denote the centroids, $C\_{11} = \frac{1}{2}(A\_2 + A\_3)$, $C\_{21} = \frac{1}{2}(A\_5 + A\_6)$ and $C\_{31} = \frac{1}{2}(A\_7 + A\_8)$, that is,

$$C\_{11} = \begin{bmatrix} 1 \\ -1/2 \\ -1 \end{bmatrix}, \ C\_{21} = \begin{bmatrix} 7/2 \\ 5/2 \\ 7/2 \end{bmatrix} \text{ and } C\_{31} = \begin{bmatrix} 7 \\ 9 \\ 15/2 \end{bmatrix}.$$


Let us calculate the distances of *A*1*,...,A*<sup>10</sup> from *C*11*, C*21*, C*31:

$$\begin{aligned} d^2(C\_{11}, A\_1) &= \tfrac{13}{4}, & d^2(C\_{11}, A\_2) &= \tfrac{1}{4}, & d^2(C\_{11}, A\_3) &= \tfrac{1}{4}, & d^2(C\_{11}, A\_4) &= \tfrac{57}{4}, \\ d^2(C\_{11}, A\_5) &= \tfrac{185}{4}, & d^2(C\_{11}, A\_6) &= \tfrac{121}{4}, & d^2(C\_{11}, A\_7) &= \tfrac{645}{4}, & d^2(C\_{11}, A\_8) &= \tfrac{961}{4}, \\ d^2(C\_{11}, A\_9) &= \tfrac{633}{4}, & d^2(C\_{11}, A\_{10}) &= \tfrac{713}{4}, & d^2(C\_{21}, A\_1) &= \tfrac{139}{4}, & d^2(C\_{21}, A\_2) &= \tfrac{131}{4}, \\ d^2(C\_{21}, A\_3) &= \tfrac{155}{4}, & d^2(C\_{21}, A\_4) &= \tfrac{131}{4}, & d^2(C\_{21}, A\_5) &= \tfrac{11}{4}, & d^2(C\_{21}, A\_6) &= \tfrac{11}{4}, \\ d^2(C\_{21}, A\_7) &= \tfrac{195}{4}, & d^2(C\_{21}, A\_8) &= \tfrac{387}{4}, & d^2(C\_{21}, A\_9) &= \tfrac{179}{4}, & d^2(C\_{21}, A\_{10}) &= \tfrac{291}{4}, \\ d^2(C\_{31}, A\_1) &= \tfrac{741}{4}, & d^2(C\_{31}, A\_2) &= \tfrac{757}{4}, & d^2(C\_{31}, A\_3) &= \tfrac{833}{4}, & d^2(C\_{31}, A\_4) &= \tfrac{605}{4}, \\ d^2(C\_{31}, A\_5) &= \tfrac{285}{4}, & d^2(C\_{31}, A\_6) &= \tfrac{301}{4}, & d^2(C\_{31}, A\_7) &= \tfrac{9}{4}, & d^2(C\_{31}, A\_8) &= \tfrac{9}{4}, \\ d^2(C\_{31}, A\_9) &= \tfrac{61}{4}, & d^2(C\_{31}, A\_{10}) &= \tfrac{89}{4}. \end{aligned}$$

We include all the points located within 15 units of the nearest centroid. The second set of clusters is then the following: Cluster 1: $\{A\_1, A\_2, A\_3, A\_4\}$, Cluster 2: $\{A\_5, A\_6\}$, Cluster 3: $\{A\_7, A\_8\}$. Note that $A\_9$ is quite close to Cluster 3; we may either include it in Cluster 3 or treat it as a singleton. Since the next stage of calculations does not change the composition of the clusters, we may take the final clusters as $\{A\_1, A\_2, A\_3, A\_4\}$, $\{A\_5, A\_6\}$, $\{A\_7, A\_8, A\_9\}$ and $\{A\_{10}\}$, where Cluster 4 consists of a single element. This completes the computations.
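The centroid distances listed above can be checked numerically. The sketch below enters the ten data vectors of Example 15.2.1 as rows (the columns of the data matrix **X**); the helper name `d2` is ours:

```python
import numpy as np

# A1, ..., A10 of Example 15.2.1, one observation per row
A = np.array([[0, 1, -1], [1, 0, -1], [1, -1, -1], [2, 3, -2], [3, 2, 5],
              [4, 3, 2], [6, 8, 7], [8, 10, 8], [5, 6, 9], [10, 8, 4]], dtype=float)

C11 = (A[1] + A[2]) / 2   # centroid of {A2, A3}
C21 = (A[4] + A[5]) / 2   # centroid of {A5, A6}
C31 = (A[6] + A[7]) / 2   # centroid of {A7, A8}

def d2(u, v):
    """Squared Euclidean distance between two vectors."""
    return float(((u - v) ** 2).sum())
```

For instance, `d2(C11, A[3])` returns 57/4 = 14.25, which is within the separation width $r = 15$, so $A\_4$ joins Cluster 1, while `d2(C31, A[8])` returns 61/4 = 15.25, which narrowly excludes $A\_9$.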

Let us examine the principal components of the sample sum of products matrix and plot the points to see whether any cluster can be detected. The sample matrix, denoted by **X**, and the sample average, denoted by $\bar{X}$, are the following:

$$\mathbf{X} = \begin{bmatrix} 0 & 1 & 1 & 2 & 3 & 4 & 6 & 8 & 5 & 10 \\ 1 & 0 & -1 & 3 & 2 & 3 & 8 & 10 & 6 & 8 \\ -1 & -1 & -1 & -2 & 5 & 2 & 7 & 8 & 9 & 4 \end{bmatrix}, \ \bar{X} = \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}.$$

Let the matrix of sample averages be $\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \ldots, \bar{X}]$ and the deviation matrix be $\mathbf{X}\_d = \mathbf{X} - \bar{\mathbf{X}}$. Then,

$$\mathbf{X}\_d = \begin{bmatrix} -4 & -3 & -3 & -2 & -1 & 0 & 2 & 4 & 1 & 6 \\ -3 & -4 & -5 & -1 & -2 & -1 & 4 & 6 & 2 & 4 \\ -4 & -4 & -4 & -5 & 2 & -1 & 4 & 5 & 6 & 1 \end{bmatrix},$$

and the sample sum of products matrix is $S = \mathbf{X}\_d \mathbf{X}\_d'$, that is,

$$S = \begin{bmatrix} 96 & 101 & 88 \\ 101 & 128 & 112 \\ 88 & 112 & 156 \end{bmatrix}.$$

The eigenvalues of $S$ are $\lambda\_1 = 330.440$, $\lambda\_2 = 40.522$ and $\lambda\_3 = 9.039$. An eigenvector corresponding to $\lambda\_1 = 330.440$ and one corresponding to $\lambda\_2 = 40.522$, respectively denoted by $U\_1$ and $U\_2$, are the following:

$$U\_1 = \begin{bmatrix} 0.782 \\ 0.943 \\ 1.000 \end{bmatrix} \text{ and } U\_2 = \begin{bmatrix} -0.676 \\ -0.500 \\ 1.000 \end{bmatrix}.$$

Then the first two principal components are $U\_1'Y$ and $U\_2'Y$ with $Y' = [y\_1, y\_2, y\_3]$. We substitute our sample points $A\_1, \ldots, A\_{10}$ to obtain 10 pairs of numbers. For example,

$$U\_1'A\_1 = [0.782, 0.943, 1] \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = -0.057, \ U\_2'A\_1 = [-0.676, -0.5, 1] \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = -1.5$$

and hence the first pair of numbers, or the first point, is $P\_1: (-0.057, -1.500)$. Similar calculations yield the remaining 9 points: $P\_2: (-0.218, -1.676)$, $P\_3: (-1.161, -1.176)$, $P\_4: (2.393, -4.852)$, $P\_5: (9.232, 1.972)$, $P\_6: (7.957, -2.204)$, $P\_7: (19.236, -1.056)$, $P\_8: (23.686, -2.408)$, $P\_9: (18.568, 2.620)$, $P\_{10}: (19.364, -6.760)$. It is seen that these points, which are plotted in Fig. 15.2.2, form the same clusters as the original points shown in Fig. 15.2.1, that is, Cluster 1: $\{A\_1, A\_2, A\_3, A\_4\}$; Cluster 2: $\{A\_5, A\_6\}$; Cluster 3: $\{A\_7, A\_8, A\_9\}$; Cluster 4: $\{A\_{10}\}$.
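These principal-component scores can be reproduced numerically. Note that the book scales its eigenvectors so that the last entry equals 1 rather than normalizing to unit length, so the sketch below rescales the eigenvectors of $S$ accordingly:

```python
import numpy as np

# A1, ..., A10 as rows (the columns of the data matrix X)
A = np.array([[0, 1, -1], [1, 0, -1], [1, -1, -1], [2, 3, -2], [3, 2, 5],
              [4, 3, 2], [6, 8, 7], [8, 10, 8], [5, 6, 9], [10, 8, 4]], dtype=float)

Xd = A - A.mean(axis=0)      # deviations from the sample mean (4, 4, 3)
S = Xd.T @ Xd                # sample sum of products matrix

eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
U1 = eigvecs[:, -1] / eigvecs[-1, -1]     # rescale so the last entry is 1
U2 = eigvecs[:, -2] / eigvecs[-1, -2]

# one (first PC, second PC) pair per observation, i.e. the points P1, ..., P10
scores = A @ np.column_stack([U1, U2])
```

Plotting the rows of `scores` yields Fig. 15.2.2; the first row reproduces $P\_1: (-0.057, -1.500)$ up to rounding.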

Other non-hierarchical methods are currently in use. We will mention these procedures later, after discussing the main hierarchical technique known as *single linkage* or nearest neighbor method.

#### **15.3. Hierarchical Methods of Clustering**

Hierarchical procedures are of two categories. In one of them, we begin with all the *n* data points as *n* different clusters of one element each. Then, by applying certain rules, we start combining these single-member clusters into larger clusters, the process being halted when a desired number of clusters are obtained. If the process is continued, we ultimately end up with a single cluster containing all of the *n* points. In the second category, we initially consider one cluster that comprises the *n* elements. We then start splitting this cluster into two clusters by making use of some criteria. Next, one or both of these subclusters are divided again by applying the same criteria. If the process is continued, we

**Figure 15.2.1** The original 10 data points

**Figure 15.2.2** Second versus first principal component evaluated at the *Ai*'s

finally end up with *n* clusters of one element each. The process is halted when a desired number of clusters are obtained. In all these procedures, one cannot objectively determine when to stop the process or how many distinct clusters are present. We have to specify some stopping rules as a means of selecting a suitable number of clusters.

#### **15.3.1. Single linkage or nearest neighbor method**

In this single linkage procedure, we begin by assuming that there are *n* clusters consisting of one item each. We then combine these clusters by applying a minimum distance rule. At the initial stage, we have only one element in each 'cluster', but at the following steps, each cluster will potentially contain several items and hence, the rule is stated for general clusters. Consider two clusters $A$ and $B$ whose elements are denoted by $X\_i$ and $Y\_j$, that is, $X\_i \in A$ and $Y\_j \in B$, the $X\_i$'s and $Y\_j$'s being $p$-vectors belonging to the data set at hand. In the minimum distance rule, we define the distance between two clusters, denoted by $d(A, B)$, as follows:

$$d(A, B) = \min\{d(X\_i, Y\_j), \text{ for all } X\_i \in A, Y\_j \in B\}.\tag{15.3.1}$$

This distance is measured in the units of the definition of the distance being utilized. We will illustrate the single linkage hierarchical procedure by making use of the data set provided in Example 15.2.1 and its associated dissimilarity matrix *D*. We will utilize the dissimilarity matrix *D* to represent various "distances". Since the matrix *D* will be repeatedly referred to at every stage, it is duplicated next for ready reference:

$$D = \begin{bmatrix} \downarrow \rightarrow (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) & (10) \\ (1) & 0 & 2 & 5 & 9 & 46 & 29 & 149 & 226 & 150 & 174 \\ (2) & 2 & 0 & 1 & 11 & 44 & 27 & 153 & 230 & 152 & 170 \\ (3) & 5 & 1 & 0 & 18 & 49 & 34 & 170 & 251 & 165 & 187 \\ (4) & 9 & 11 & 18 & 0 & 51 & 20 & 122 & 185 & 139 & 125 \\ (5) & 46 & 44 & 49 & 51 & 0 & 11 & 49 & 98 & 36 & 86 \\ (6) & 29 & 27 & 34 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\ (7) & 149 & 153 & 170 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\ (8) & 226 & 230 & 251 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\ (9) & 150 & 152 & 165 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\ (10) & 174 & 170 & 187 & 125 & 86 & 65 & 25 & 24 & 54 & 0 \end{bmatrix}$$

To start with, we have 10 clusters $\{A\_j\}, j = 1, \ldots, 10$, each containing one element at this initial stage. Then $d(A, B)$ as defined in (15.3.1) is the smallest distance (dissimilarity) appearing in $D$, that is, 1, which occurs between the elements corresponding to $A\_2$ and $A\_3$. These two clusters of one vector each are combined and replaced by $B\_1$, obtained by taking the smaller entry in each column of the combined representation of the dissimilarity measures corresponding to $A\_2$ and $A\_3$. For illustration, we now list as rows the dissimilarity measures corresponding to the original $A\_2$ and $A\_3$ and the new $B\_1$:

$$\begin{array}{cccccccccc} A\_2 &: [0 \;\; 1] & (2) & (11) & (44) & (27) & (153) & (230) & (152) & (170) \\ A\_3 &: [1 \;\; 0] & 5 & 18 & 49 & 34 & 170 & 251 & 165 & 187 \\ B\_1 &: [0] & 2 & 11 & 44 & 27 & 153 & 230 & 152 & 170 \end{array}$$
The rows representing *A*<sup>2</sup> and *A*<sup>3</sup> are combined and replaced by *B*<sup>1</sup> as shown above. The second and third columns in *D* are combined into one column, namely, the *B*<sup>1</sup> column. The elements in *B*<sup>1</sup> are the smaller elements in each column of *A*<sup>2</sup> and *A*3. The bracketed elements in *A*<sup>2</sup> and *A*3, namely [0*,* 1] and [1*,* 0], are combined into one element [0] in *B*1, the updated dissimilarity matrix having one fewer row and one fewer column. These are the intersections of the two rows and columns. Other smaller elements in the two original columns, which make up *B*1, are displayed in parentheses. This process will be repeated at each stage. At the first stage of the procedure, we end up with 9 clusters: *C*<sup>1</sup> = {*A*2*, A*3}*,*{*Aj* }*, j* = 1*,* 4*,...,* 10, the resulting configuration of the dissimilarity matrix, denoted by *D*1, being

$$D\_{1} = \begin{bmatrix} \downarrow \to & A\_{1} & B\_{1} & A\_{4} & A\_{5} & A\_{6} & A\_{7} & A\_{8} & A\_{9} & A\_{10} \\ A\_{1} & 0 & 2 & 9 & 46 & 29 & 149 & 226 & 150 & 174 \\ B\_{1} & 2 & 0 & 11 & 44 & 27 & 153 & 230 & 152 & 170 \\ A\_{4} & 9 & 11 & 0 & 51 & 20 & 122 & 185 & 139 & 125 \\ A\_{5} & 46 & 44 & 51 & 0 & 11 & 49 & 98 & 36 & 86 \\ A\_{6} & 29 & 27 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\ A\_{7} & 149 & 153 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\ A\_{8} & 226 & 230 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\ A\_{9} & 150 & 152 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\ A\_{10} & 174 & 170 & 125 & 86 & 65 & 25 & 24 & 54 & 0 \\ \end{bmatrix}.$$

Now, the next smallest dissimilarity is 2 which occurs at *(A*1*, B*1*)*. Thus, the rows (columns) corresponding to *A*<sup>1</sup> and *B*<sup>1</sup> are combined into one row (column) *B*2. The original rows corresponding to *A*<sup>1</sup> and *B*<sup>1</sup> and the new row corresponding to *B*<sup>2</sup> are the following:

$$\begin{array}{ccccccccc} A\_1 & \text{: [0 & 2]} & \text{(9)} & 46 & 29 & \text{(149)} & \text{(226)} & \text{(150)} & \text{174} \\ B\_1 & \text{: [2 & 0]} & \text{11} & \text{(44)} & \text{(27)} & \text{153} & \text{230} & \text{152} & \text{(170)} \\ B\_2 & \text{: [0]} & \text{9} & \text{44} & \text{27} & \text{149} & \text{226} & \text{150} & \text{170} \\ \end{array}$$

The new configuration, denoted by *D*2, is the following:

$$D\_{2} = \begin{bmatrix} \downarrow \rightarrow & B\_{2} & A\_{4} & A\_{5} & A\_{6} & A\_{7} & A\_{8} & A\_{9} & A\_{10} \\ B\_{2} & 0 & 9 & 44 & 27 & 149 & 226 & 150 & 170 \\ A\_{4} & 9 & 0 & 51 & 20 & 122 & 185 & 139 & 125 \\ A\_{5} & 44 & 51 & 0 & 11 & 49 & 98 & 36 & 86 \\ A\_{6} & 27 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\ A\_{7} & 149 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\ A\_{8} & 226 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\ A\_{9} & 150 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\ A\_{10} & 170 & 125 & 86 & 65 & 25 & 24 & 54 & 0 \end{bmatrix}$$

the resulting clusters being $C\_2 = \{A\_1, A\_2, A\_3\}$ and $\{A\_j\}, j = 4, \ldots, 10$. The next smallest dissimilarity is 9, which occurs at $(B\_2, A\_4)$. Hence these are combined, that is, the first two columns (rows) are merged as explained. The combined row, denoted by $B\_3$, is the following, its transpose becoming the first column:

$$B\_3 = [0, 44, 20, 122, 185, 139, 125],$$

and the new configuration is the following:

$$D\_{3} = \begin{bmatrix} \downarrow \to & B\_{3} & A\_{5} & A\_{6} & A\_{7} & A\_{8} & A\_{9} & A\_{10} \\ B\_{3} & 0 & 44 & 20 & 122 & 185 & 139 & 125 \\ A\_{5} & 44 & 0 & 11 & 49 & 98 & 36 & 86 \\ A\_{6} & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\ A\_{7} & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\ A\_{8} & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\ A\_{9} & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\ A\_{10} & 125 & 86 & 65 & 25 & 24 & 54 & 0 \end{bmatrix} .$$

At this stage, the clusters are $C\_2 = \{A\_1, A\_2, A\_3, A\_4\}$ and $\{A\_j\}, j = 5, \ldots, 10$. The next smallest number is 9, which occurs at $(A\_7, A\_8)$ and $(A\_7, A\_9)$. Accordingly, we combine $A\_7, A\_8$ and $A\_9$, and the resulting configuration is the following, where the row (column) resulting from the replacement is denoted by $B\_4$:

$$D\_4 = \begin{bmatrix} \downarrow \rightarrow \ B\_3 & A\_5 & A\_6 & B\_4 & A\_{10} \\ B\_3 & 0 & 44 & 20 & 122 & 125 \\ A\_5 & 44 & 0 & 11 & 36 & 86 \\ A\_6 & 20 & 11 & 0 & 54 & 65 \\ B\_4 & 122 & 36 & 54 & 0 & 24 \\ A\_{10} & 125 & 86 & 65 & 24 & 0 \end{bmatrix}$$

the clusters being $C\_2 = \{A\_1, A\_2, A\_3, A\_4\}$, $C\_3 = \{A\_7, A\_8, A\_9\}$ and $\{A\_i\}, i = 5, 6, 10$. The next smallest dissimilarity measure is 11, at $(A\_5, A\_6)$. Combining these, the replacement row is $B\_5 = [20, 0, 36, 65]$, and the new configuration, denoted by $D\_5$, is as follows:

$$D\_5 = \begin{bmatrix} \downarrow \rightarrow & B\_3 & B\_5 & B\_4 & A\_{10} \\ B\_3 & 0 & 20 & 122 & 125 \\ B\_5 & 20 & 0 & 36 & 65 \\ B\_4 & 122 & 36 & 0 & 24 \\ A\_{10} & 125 & 65 & 24 & 0 \end{bmatrix},$$

the resulting clusters being $C\_2 = \{A\_1, A\_2, A\_3, A\_4\}$, $C\_3 = \{A\_7, A\_8, A\_9\}$, $C\_4 = \{A\_5, A\_6\}$ and $C\_5 = \{A\_{10}\}$.

We may stop at this stage since the clusters obtained from the other methods coincide with $C\_2, C\_3, C\_4, C\_5$. At the following step of the procedure, $C\_4$ would combine with $C\_2$, their dissimilarity in $D\_5$ being 20, and continuing the merging process would ultimately result in a single cluster encompassing all 10 points.
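The successive merges above can be reproduced with a short single-linkage routine that repeatedly fuses the two nearest clusters under rule (15.3.1). This is our own sketch, not the authors' code; clusters are indexed from 0, so index 0 corresponds to $A\_1$:

```python
import numpy as np

def single_linkage_merges(D, stop_at):
    """Repeatedly merge the two clusters with the smallest single-linkage
    dissimilarity; return the merge history and the final clusters."""
    D = np.asarray(D, dtype=float)
    clusters = [{i} for i in range(D.shape[0])]
    history = []
    while len(clusters) > stop_at:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # rule (15.3.1): minimum pairwise dissimilarity between clusters
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((d, sorted(clusters[a] | clusters[b])))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
    return history, clusters

# the dissimilarity matrix D of Example 15.2.1
D = [[0, 2, 5, 9, 46, 29, 149, 226, 150, 174],
     [2, 0, 1, 11, 44, 27, 153, 230, 152, 170],
     [5, 1, 0, 18, 49, 34, 170, 251, 165, 187],
     [9, 11, 18, 0, 51, 20, 122, 185, 139, 125],
     [46, 44, 49, 51, 0, 11, 49, 98, 36, 86],
     [29, 27, 34, 20, 11, 0, 54, 101, 59, 65],
     [149, 153, 170, 122, 49, 54, 0, 9, 9, 25],
     [226, 230, 251, 185, 98, 101, 9, 0, 26, 24],
     [150, 152, 165, 139, 36, 59, 9, 26, 0, 54],
     [174, 170, 187, 125, 86, 65, 25, 24, 54, 0]]

history, clusters = single_linkage_merges(D, stop_at=4)
```

The merge dissimilarities come out as 1, 2, 9, 9, 9, 11, matching the walk-through, and the four final clusters are $\{A\_1, \ldots, A\_4\}$, $\{A\_5, A\_6\}$, $\{A\_7, A\_8, A\_9\}$ and $\{A\_{10}\}$.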

#### **15.3.2. Average linking as a modified distance measure**

An alternative distance measure involving all the items in pairs of clusters is considered in this subsection. As one proceeds from any stage to the next one in a hierarchical procedure, a decision is based on the next smallest distance between two clusters. At the initial stage, this does not pose any problem since the dissimilarity matrix *D* is available and each cluster contains only a single element. However, further on in the process, as there are several elements in the clusters, a more suitable definition of "distance" is required in order to proceed to the next stage. Several types of methods have been proposed in the literature. One such procedure is the *average linkage method* under which the distance between two clusters *A* and *B*, denoted again by *d(A, B)*, is defined as follows:

$$d(A, B) = \frac{1}{n\_1 n\_2} \sum\_{j=1}^{n\_2} \sum\_{i=1}^{n\_1} d(X\_i, Y\_j) \text{ for all } X\_i \in A, Y\_j \in B \tag{15.3.2}$$

where the *Xi*'s and *Yj* 's are all *p*-vectors from the given set of data points. In this case, the rule being applied is that two clusters having the smallest distance, as measured in terms of (15.3.2), are combined before initiating the next stage.
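A direct transcription of (15.3.2) is straightforward; in this sketch we take $d(X\_i, Y\_j)$ to be the squared Euclidean distance, matching the units of the dissimilarity matrix $D$ (the function name is our own):

```python
import numpy as np

def average_linkage(A, B):
    """Average of all n1*n2 pairwise squared distances between
    the members of clusters A and B, as in (15.3.2)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    # broadcast to an (n1, n2) table of pairwise squared distances
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return float(d2.mean())
```

At each stage, the two clusters minimizing this average are combined.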

#### **15.3.3. The centroid method**

In a hierarchical single linkage procedure, another way of determining the distance between two clusters before proceeding to the next stage is referred to as the centroid method under which the Euclidean distance between the centroids of clusters *A* and *B* is defined as follows:

$$d(A,B) = d(\bar{X}, \bar{Y}) \text{ with } \bar{X} = \frac{1}{n\_1} \sum\_{j=1}^{n\_1} X\_j \text{ and } \bar{Y} = \frac{1}{n\_2} \sum\_{j=1}^{n\_2} Y\_j,\tag{15.3.3}$$

where *X*¯ is the centroid of the cluster *A* and *Y*¯ is the centroid of the cluster *B*, *Xi* ∈ *A, i* = 1*,...,n*1*, Yj* ∈ *B, j* = 1*,...,n*2. In this case, the process involves combining two clusters with the smallest *d(A, B)* as specified in (15.3.3) into a single cluster. After combining them, or equivalently, after taking the union of *A* and *B*, the centroid of the combined cluster, denoted by *Z*¯, is

$$\bar{Z} = \frac{n\_1 \bar{X} + n\_2 \bar{Y}}{n\_1 + n\_2} = \frac{1}{n\_1 + n\_2} \sum\_{j=1}^{n\_1 + n\_2} Z\_j, \; Z\_j \in A \cup B,$$

where the *Zj* 's are the original vectors that were included in *A* or *B*.
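The identity for the merged centroid, a weighted combination of $\bar{X}$ and $\bar{Y}$ equaling the plain centroid of $A \cup B$, can be confirmed on any small data set; the values below are arbitrary illustrative numbers:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])   # cluster A, n1 = 3
B = np.array([[7.0, 8.0], [9.0, 2.0]])               # cluster B, n2 = 2

Xbar, Ybar = A.mean(axis=0), B.mean(axis=0)
n1, n2 = len(A), len(B)

# weighted combination of the two centroids ...
Zbar = (n1 * Xbar + n2 * Ybar) / (n1 + n2)
# ... equals the plain centroid of the union A ∪ B
Zbar_direct = np.vstack([A, B]).mean(axis=0)
```

Here $\bar{X} = (3, 2)$ and $\bar{Y} = (8, 5)$, so both routes give $\bar{Z} = (5, 3.2)$.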

#### **15.3.4. The median method**

A main shortcoming of the centroid method of joining two clusters is that if $n\_1$ is very large compared to $n\_2$, then $\bar{Z}$ is likely to be close to $\bar{X}$, and vice versa. In order to avoid this type of imbalance, a method based on the median is suggested, under which the median of the combined clusters $A$ and $B$ is defined as

$$\text{Median}\_{A \cup B} = \frac{1}{2}(\bar{X} + \bar{Y}) \text{ with } X\_i \in A \text{ and } Y\_j \in B,\tag{15.3.4}$$

for all $i$ and $j$. In this process, the two clusters $A$ and $B$ whose medians are the closest are combined to form the next cluster, whose elements are the $Z\_r$'s, $Z\_r \in A \cup B$.

#### **15.3.5. The residual sum of products method**

From the one-way MANOVA layout, the residual or within-group (within-cluster) sums of products for the clusters $A$, $B$ and $A \cup B$, denoted by $R\_A$, $R\_B$ and $R\_{A \cup B}$, are the following:

$$\begin{aligned} R\_A &= \sum\_{i=1}^{n\_1} (X\_i - \bar{X})'(X\_i - \bar{X}), \quad R\_B = \sum\_{j=1}^{n\_2} (Y\_j - \bar{Y})'(Y\_j - \bar{Y}), \\ R\_{A \cup B} &= \sum\_{r=1}^{n\_1 + n\_2} (Z\_r - \bar{Z})'(Z\_r - \bar{Z}), \ Z\_r \in A \cup B, \ \bar{Z} = \frac{n\_1 \bar{X} + n\_2 \bar{Y}}{n\_1 + n\_2}. \end{aligned}$$

Once those sums of squares have been evaluated, we compute the quantity

$$T\_{A \cup B} = R\_{A \cup B} - (R\_A + R\_B),\tag{15.3.5}$$

which can be interpreted as the increase in residual sum of products due to the process of merging the clusters *A* and *B*. Then, the procedure consists of combining those clusters *A* and *B* for which *TA*∪*<sup>B</sup>* as defined in (15.3.5) is the minimum. This method is also called *Ward's method*.
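Ward's increase in the residual sum of products (15.3.5) can be computed directly; the sketch below (function names ours) evaluates the merging cost for a pair of candidate clusters:

```python
import numpy as np

def residual_ss(C):
    """Within-cluster sum of squared deviations from the centroid."""
    C = np.asarray(C, dtype=float)
    return float(((C - C.mean(axis=0)) ** 2).sum())

def ward_increase(A, B):
    """T_{A∪B} = R_{A∪B} - (R_A + R_B): the increase in residual
    sum of products caused by merging clusters A and B."""
    return residual_ss(np.vstack([A, B])) - residual_ss(A) - residual_ss(B)
```

For example, with $A = \{(0,0), (0,2)\}$ and $B = \{(4,0), (4,2)\}$, $R\_A = R\_B = 2$ and $R\_{A \cup B} = 20$, so the merging cost is $T\_{A \cup B} = 16$; at each stage Ward's method merges the pair with the smallest such cost.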

There exist other methods for combining clusters such as the *flexible beta method*, and several comparative studies point out the merits and drawbacks of the various methods.

In the hierarchical procedures considered in Sect. 15.3, we begin with the *n* data points as *n* distinct clusters of one element each. Then, by applying certain "minimum distance" methods, "distance" being defined in different ways, we combined the clusters one by one. We may also consider a hierarchical procedure wherein the *n* data points are treated as one cluster of *n* elements. At this stage, by making use of some rules, we break up this cluster into two clusters. Then, one of these or both are split again into two clusters by applying the same rule. We continue the process and stop it when it is determined that there is a

sufficient number of clusters. If the process is not halted at a certain stage, we will end up with a single cluster containing all of the *n* elements or points. We will not elaborate further on such procedures.

#### **15.3.6. Other criteria for partitioning or optimization**

In Sect. 15.2, we considered a non-hierarchical procedure known as the *k-means method*, which is the most popular in this area. We then described the most widely utilized hierarchical procedure in Sect. 15.3. We will now examine other non-hierarchical procedures in common use. Some of these are connected with MANOVA, the multivariate analysis of variation of a one-way classification. In a multivariate one-way layout, let $X\_{ij}$ be the $j$-th vector in the $i$-th group or $i$-th cluster, all vectors being $p$-vectors or $p \times 1$ real vectors. Let there be $k$ groups ($k$ clusters) of sizes $n\_1, \ldots, n\_k$ with $n\_1 + n\_2 + \cdots + n\_k = n\_\cdot = n$, that is, the cluster sizes are $n\_1, \ldots, n\_k$, respectively. Let the residual sum of products or sum of squares and cross products matrix be denoted by $U$, which is $p \times p$. This matrix $U$ is also called the *within group or within cluster* variation matrix. Let the *between groups or between clusters* variation matrix be $V$. In this setup, $U$ and $V$ are the following:

$$U = \sum\_{i=1}^{k} \sum\_{j=1}^{n\_i} (X\_{ij} - \bar{X}\_i)(X\_{ij} - \bar{X}\_i)', \ \bar{X}\_i = \frac{1}{n\_i} \sum\_{j=1}^{n\_i} X\_{ij},\tag{15.3.6}$$

$$V = \sum\_{i,j} (\bar{X}\_i - \bar{X})(\bar{X}\_i - \bar{X})' = \sum\_{i=1}^k n\_i (\bar{X}\_i - \bar{X})(\bar{X}\_i - \bar{X})', \ \bar{X} = \frac{1}{n\_\cdot} \sum\_{i,j} X\_{ij}. \tag{15.3.7}$$

Then, under the hypothesis that the group effects or cluster effects are the same, and under the normality assumption on the $X\_{ij}$'s, $U$ and $V$ are independently distributed Wishart matrices with $n\_\cdot - k$ and $k - 1$ degrees of freedom, respectively, where $\Sigma > O$ is the parameter matrix in the Wishart densities as well as the common covariance matrix of the $X\_{ij}$'s, referring to Chap. 5. Thus, $W\_1 = (U + V)^{-\frac{1}{2}} U (U + V)^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable with the parameters $(\frac{n\_\cdot - k}{2}, \frac{k-1}{2})$, $W\_2 = U^{-\frac{1}{2}} V U^{-\frac{1}{2}}$ is a real matrix-variate type-2 beta random variable with the parameters $(\frac{k-1}{2}, \frac{n\_\cdot - k}{2})$, and $W\_3 = U + V$ follows a real Wishart distribution having $n\_\cdot - 1$ degrees of freedom and parameter matrix $\Sigma > O$, again referring to Chap. 5. Observe that both $U$ and $V$ are real positive definite matrices, so that all of their eigenvalues are positive. The likelihood ratio criterion $\lambda$ for testing the hypothesis that the group effects are the same is the following:


$$\lambda^{\frac{2}{n\_\cdot}} = \frac{|U|}{|U+V|} = |W\_1| = \frac{1}{|I + U^{-\frac{1}{2}}VU^{-\frac{1}{2}}|} = \frac{1}{|I + W\_2|}.\tag{15.3.8}$$

We are aiming to have the within-cluster variation small and the between-cluster variation large, which means, in some sense, that $U$ will be small and $V$ will be large, in which case $\lambda$ as given in (15.3.8) will be small. This also means that the trace of $U$ must be small and the trace of $W\_2$ must be large. Accordingly, a few criteria for merging clusters are based on $\text{tr}(U)$, $|U|$ and $\text{tr}(W\_2)$. The following are some commonly utilized criteria for combining clusters:

(1) minimize $\text{tr}(U)$; (2) minimize $|U|$; (3) maximize $\text{tr}(W\_2) = \text{tr}(U^{-1}V)$.
These criteria are applied as follows: one of the $n$ observation vectors is moved to a selected cluster if $\text{tr}(U)$ is a minimum (for the other criteria, $|U|$ is a minimum or $\text{tr}(W\_2)$ is a maximum). The observation vectors are moved one by one to the selected cluster and, each time, $\text{tr}(U)$ is evaluated and noted; the vector for which $\text{tr}(U)$ attains a minimum value belongs to the selected cluster, that is, it is combined with the selected cluster. Observe that

$$\begin{split} \text{tr}(U) &= \text{tr}\Big(\sum\_{i,j} (X\_{ij} - \bar{X}\_{i})(X\_{ij} - \bar{X}\_{i})'\Big) \\ &= \text{tr}\Big(\sum\_{j=1}^{n\_{1}} (X\_{1j} - \bar{X}\_{1})(X\_{1j} - \bar{X}\_{1})'\Big) + \dots + \text{tr}\Big(\sum\_{j=1}^{n\_{k}} (X\_{kj} - \bar{X}\_{k})(X\_{kj} - \bar{X}\_{k})'\Big) \\ &= \sum\_{j=1}^{n\_{1}} (X\_{1j} - \bar{X}\_{1})'(X\_{1j} - \bar{X}\_{1}) + \dots + \sum\_{j=1}^{n\_{k}} (X\_{kj} - \bar{X}\_{k})'(X\_{kj} - \bar{X}\_{k}), \end{split} \tag{15.3.9}$$

owing to the property that, for two matrices $P$ and $Q$, $\text{tr}(PQ) = \text{tr}(QP)$ whenever $PQ$ and $QP$ are defined. As well, observe that since $(X\_{ij} - \bar{X}\_i)'(X\_{ij} - \bar{X}\_i)$ is a scalar quantity for every $i$ and $j$, it is equal to its trace. How does this criterion work in practice? Consider moving a member from the $s$-th cluster to the selected cluster, namely the $r$-th cluster. The original centroids are $\bar{X}\_r$ and $\bar{X}\_s$, and when one element is moved from the $s$-th cluster to the $r$-th cluster, the centroids will respectively change to, say, $\bar{X}\_{r+1}$ and $\bar{X}\_{s-1}$. Compute the updated sums of squares in the new $r$-th and $s$-th clusters. Then, add up the sums of squares over all the clusters to obtain a new $\text{tr}(U)$. Carry out this process for every member of every other cluster, computing $\text{tr}(U)$ each time. Take the smallest of the values of $\text{tr}(U)$ thus calculated, including the original value obtained before considering the transfer of any point. The vector for which $\text{tr}(U)$ is minimal really belongs to the $r$-th cluster and is therefore included in it. Repeat the process until no more improvement can be made, at which point no further transfer of points is necessary.

#### **Simplification of the computation of** $\text{tr}(U)$

As will be explained, computing $\text{tr}(U)$ can be simplified. Consider the new sum of squares in the $r$-th cluster. Let the new and old sums of squares be denoted by $(\text{New})_r$, $(\text{New})_s$ and $(\text{Old})_r$, $(\text{Old})_s$, respectively, and let the vector transferred from the $s$-th cluster to the $r$-th cluster be denoted by $Y$. Then,

$$\begin{split} (\text{New})\_r &= \sum\_{j=1}^{r} (X\_{rj} - \bar{X}\_{r+1})'(X\_{rj} - \bar{X}\_{r+1}) + (Y - \bar{X}\_{r+1})'(Y - \bar{X}\_{r+1}) \\ &= \sum\_{j=1}^{r} (X\_{rj} - \bar{X}\_r + (\bar{X}\_r - \bar{X}\_{r+1}))'(X\_{rj} - \bar{X}\_r + (\bar{X}\_r - \bar{X}\_{r+1})) \\ &\quad + (Y - \bar{X}\_{r+1})'(Y - \bar{X}\_{r+1}) \\ &= \sum\_{j=1}^{r} (X\_{rj} - \bar{X}\_r)'(X\_{rj} - \bar{X}\_r) + r(\bar{X}\_r - \bar{X}\_{r+1})'(\bar{X}\_r - \bar{X}\_{r+1}) \\ &\quad + (Y - \bar{X}\_{r+1})'(Y - \bar{X}\_{r+1}) \\ &= (\text{Old})\_r + r(\bar{X}\_r - \bar{X}\_{r+1})'(\bar{X}\_r - \bar{X}\_{r+1}) + (Y - \bar{X}\_{r+1})'(Y - \bar{X}\_{r+1}). \end{split}$$

The difference between the new sum of squares and the old one is

$$\delta\_1 = r(\bar{X}\_r - \bar{X}\_{r+1})'(\bar{X}\_r - \bar{X}\_{r+1}) + (Y - \bar{X}\_{r+1})'(Y - \bar{X}\_{r+1}).$$

Noting that

$$
\bar{X}\_r - \bar{X}\_{r+1} = \bar{X}\_r - \frac{r\bar{X}\_r + Y}{r+1} = \frac{1}{r+1} [\bar{X}\_r - Y] \text{ and } Y - \bar{X}\_{r+1} = \frac{r}{r+1} [Y - \bar{X}\_r],
$$

*δ*<sup>1</sup> simplifies to

$$\delta\_{1} = \frac{r}{r+1} (Y - \bar{X}\_r)'(Y - \bar{X}\_r).$$

A similar procedure can be used for the *s*-th cluster. In that case, the new sum of squares can be written as

$$\begin{aligned} (\text{New})\_s &= \sum\_{j=1}^{s-1} (X\_{sj} - \bar{X}\_{s-1})'(X\_{sj} - \bar{X}\_{s-1}) \\ &= \sum\_{j=1}^{s} (X\_{sj} - \bar{X}\_{s-1})'(X\_{sj} - \bar{X}\_{s-1}) - (Y - \bar{X}\_{s-1})'(Y - \bar{X}\_{s-1}). \end{aligned}$$

Then, proceeding as in the case of the *r*-th cluster and denoting the difference between the new and the old sums of squares as *δ*2, we have

$$\delta\_2 = -\frac{s}{s-1}(Y - \bar{X}\_s)'(Y - \bar{X}\_s), \ s > 1,$$

so that the sum of the differences between the new and old sums of squares, denoted by *δ*, is the following:

$$\delta = \delta\_1 + \delta\_2 = \frac{r}{r+1}(Y - \bar{X}\_r)'(Y - \bar{X}\_r) - \frac{s}{s-1}(Y - \bar{X}\_s)'(Y - \bar{X}\_s) \tag{15.3.10}$$

for $s > 1$, where $\bar{X}_r$ and $\bar{X}_s$ are the original centroids of the $r$-th and $s$-th clusters, respectively. As such, computing $\delta$ is very simple. Evaluate the quantity specified in (15.3.10) for all the points outside the $r$-th cluster and look for the minimum of $\delta$, including the original value $\delta = 0$. If the minimum occurs at a point $Y_1$ outside of the $r$-th cluster, then transfer that point to the $r$-th cluster. Continue the process for every vector in the $s$-th cluster and then for $r = 1, \ldots, k$, assuming there are $k$ clusters, until $\delta = 0$. In the end, all the clusters are stabilized, and $k$ may take on another value.
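The update rule in (15.3.10) can be sketched in a few lines of code; this is a minimal illustration, with `transfer_delta` a hypothetical helper name, and clusters assumed to be stored as NumPy arrays whose rows are the observation vectors.

```python
import numpy as np

def transfer_delta(Y, cluster_r, cluster_s):
    """Change delta in tr(U) if the point Y, currently a row of cluster_s,
    is transferred to cluster_r; see Eq. (15.3.10). Requires s > 1 members."""
    r, s = len(cluster_r), len(cluster_s)
    d_r = Y - cluster_r.mean(axis=0)   # Y - Xbar_r (original centroid of cluster r)
    d_s = Y - cluster_s.mean(axis=0)   # Y - Xbar_s (original centroid of cluster s)
    return (r / (r + 1)) * d_r @ d_r - (s / (s - 1)) * d_s @ d_s
```

A transfer is carried out only when the minimum of $\delta$ over the candidate points is negative, that is, when it improves on the value $\delta = 0$ obtained by leaving the clusters unchanged.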

Among the three statistics $\text{tr}(U)$, $|U|$ and $\text{tr}(W^2)$, $\text{tr}(U)$ is the easiest to compute, as was just explained. However, under a non-singular transformation other than an orthonormal one, $|U|$ and $\text{tr}(W^2)$ are invariant whereas $\text{tr}(U)$ is not.

We have discussed one hierarchical methodology, the single linkage or nearest neighbor method, and one non-hierarchical procedure, the $k$-means method; these appear to be the most widely utilized. Other hierarchical and non-hierarchical methods were mentioned without going into details. None of these procedures is a well-defined mathematical procedure: none can uniquely determine the clusters when clusters are present in the multivariate data at hand, and none can uniquely determine the number of clusters. The advantages and shortcomings of the various methods will not be discussed so as not to confound the reader.

#### **Exercises 15**

**15.1.** For the $p \times 1$ vectors $X_1, \ldots, X_n$, let the dissimilarity measures be (1) $d_{ij}^{(1)} = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$; (2) $d_{ij}^{(2)} = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2$, where $X_i' = [x_{1i}, x_{2i}, \ldots, x_{pi}]$. Compute the matrices (1) $(d_{ij}^{(1)})$; (2) $(d_{ij}^{(2)})$, for the following vectors:

$$X\_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \ X\_2 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \ X\_3 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \ X\_4 = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}.$$
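The two dissimilarity matrices can be computed in a vectorized way; a minimal sketch follows, where stacking the four vectors as the rows of `X` is an illustrative layout choice.

```python
import numpy as np

# Vectors X_1, ..., X_4 of Exercise 15.1 stacked as rows of X.
X = np.array([[1, -1, 2],
              [-1, 1, 2],
              [1, 2, -1],
              [2, 1, -1]], dtype=float)

diff = X[:, None, :] - X[None, :, :]   # pairwise differences X_i - X_j
D1 = np.abs(diff).sum(axis=2)          # d_ij^(1): sum of absolute differences
D2 = (diff ** 2).sum(axis=2)           # d_ij^(2): sum of squared differences
```

Both matrices are symmetric with zero diagonals, as any dissimilarity matrix should be.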

**15.2.** Nine test runs T-1, ..., T-9 are done to test the breaking strengths of three alloys. The following data are the deviations from the respective expected strengths:


Carry out a cluster analysis by applying the following methods: (1) The single linkage or nearest neighbor method; (2) The average linkage method; (3) The centroid method; (4) The residual sum of products method.

**15.3.** Using the data provided in Exercise 15.2, carry out a cluster analysis by utilizing the following methods: (1) Partitioning or optimization; (2) Minimization of $\text{tr}(U)$; (3) Minimization of $|U|$; (4) Maximization of $\text{tr}(W^2)$, where $U$ and $W$ are given in Sect. 15.3.6.

**15.4.** Compare the results from the different methods in (1) Exercise 15.2; (2) Exercise 15.3, and make your observations.

**15.5.** Compare the results from the different methods in Exercises 15.2 and 15.3, and comment on the similarities and differences.

#### **15.4. Correspondence Analysis**

If the data at hand are classified according to two attributes, these characteristics may be of the same type, that is, both quantitative or both qualitative, or of different types, and whatever the types may be, we may construct a two-way contingency table. In a contingency table, the entries in the cells are frequencies or the number of times various combinations of the attributes appear. Correspondence Analysis is a process of identifying, quantifying, separating and plotting associations among the characteristics and relationships among the various levels. In a two-way contingency table, we identify, separate and plot associations between the two characteristics and attempt to identify relationships between row and column labels.

#### **15.4.1. Two-way contingency table**

Consider the following example. A random sample of 100 persons from a certain township are classified according to their educational level and their liberal disposition. In the frequency Table 15.4.1, the $A_i$'s represent their dispositions and the $B_j$'s, their educational levels, with $A_1 \equiv$ tolerant, $A_2 \equiv$ indifferent, $A_3 \equiv$ intolerant, $B_1 \equiv$ primary school education level, $B_2 \equiv$ high school education level, $B_3 \equiv$ bachelor's degree education level, $B_4 \equiv$ master's and higher degree education level.


**Table 15.4.1:** A two-way contingency table

|       | $B_1$ | $B_2$ | $B_3$ | $B_4$ | Total |
|-------|-------|-------|-------|-------|-------|
| $A_1$ | 6     | 14    | 16    | 4     | 40    |
| $A_2$ | 17    | 5     | 8     | 10    | 40    |
| $A_3$ | 7     | 6     | 6     | 1     | 20    |
| Total | 30    | 25    | 30    | 15    | 100   |

There are 6 persons having a tolerant disposition and primary school level of education. There is one person with an intolerant disposition and a master's degree or a higher level of education, and so on. The marginal sums are also provided in the table. For example, the total number of persons having a primary school level of education is 30, the total number of persons having an intolerant disposition is 20, and so on. The corresponding relative frequencies (a given frequency divided by 100, the total frequency) are as follows (Table 15.4.2):


**Table 15.4.2:** Relative frequencies $f_{ij}$ in the two-way contingency table

|          | $B_1$ | $B_2$ | $B_3$ | $B_4$ | $f_{i.}$ |
|----------|-------|-------|-------|-------|----------|
| $A_1$    | 0.06  | 0.14  | 0.16  | 0.04  | 0.40     |
| $A_2$    | 0.17  | 0.05  | 0.08  | 0.10  | 0.40     |
| $A_3$    | 0.07  | 0.06  | 0.06  | 0.01  | 0.20     |
| $f_{.j}$ | 0.30  | 0.25  | 0.30  | 0.15  | 1.00     |

The relative frequencies are denoted by $f_{ij}$, where summation with respect to a subscript is designated by a dot, that is, $f_{i.} = \sum_j f_{ij}$, $f_{.j} = \sum_i f_{ij}$ and $f_{..} = \sum_i \sum_j f_{ij}$. Note that $f_{..} = 1$. In a general notation, a two-way contingency table and the corresponding relative frequencies are displayed as follows (Table 15.4.3):


**Table 15.4.3:** A two-way contingency table and a table of relative frequencies

|          | $B_1$               | $\cdots$ | $B_s$               | Total             |
|----------|---------------------|----------|---------------------|-------------------|
| $A_1$    | $n_{11}\,(f_{11})$  | $\cdots$ | $n_{1s}\,(f_{1s})$  | $n_{1.}\,(f_{1.})$ |
| $\vdots$ | $\vdots$            | $\ddots$ | $\vdots$            | $\vdots$          |
| $A_r$    | $n_{r1}\,(f_{r1})$  | $\cdots$ | $n_{rs}\,(f_{rs})$  | $n_{r.}\,(f_{r.})$ |
| Total    | $n_{.1}\,(f_{.1})$  | $\cdots$ | $n_{.s}\,(f_{.s})$  | $n\,(1)$          |

Letting the true probability of the occurrence of an observation in the $(i, j)$-th cell be $p_{ij}$, the following is the table of true probabilities:


**Table 15.4.4:** True probabilities $p_{ij}$ in a two-way contingency table

|          | $B_1$    | $\cdots$ | $B_s$    | Total    |
|----------|----------|----------|----------|----------|
| $A_1$    | $p_{11}$ | $\cdots$ | $p_{1s}$ | $p_{1.}$ |
| $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$ |
| $A_r$    | $p_{r1}$ | $\cdots$ | $p_{rs}$ | $p_{r.}$ |
| Total    | $p_{.1}$ | $\cdots$ | $p_{.s}$ | $1$      |

These are multinomial probabilities and, in this case, the $n_{ij}$'s become multinomial variables. An estimate of $p_{ij}$, denoted by $\hat{p}_{ij}$, is $\hat{p}_{ij} = f_{ij}$, the corresponding relative frequency. The marginal sums in Table 15.4.4 can be interpreted as follows: $p_{1.}$ is the probability of finding an item in the first row, that is, the probability that an event will have the attribute $A_1$; $p_{.j}$ is the probability that an event will have the characteristic $B_j$, and so on. Thus,

$$\hat{p}\_{ij} = f\_{ij} = \frac{n\_{ij}}{n}, \ \hat{p}\_{i.} = \frac{n\_{i.}}{n}, \ \hat{p}\_{.j} = \frac{n\_{.j}}{n}, \ \ i = 1, \ldots, r, \ j = 1, \ldots, s.$$

Let $A_i$ denote the event that an observation belongs to the $i$-th row, that is, the occurrence of the characteristic $A_i$, and let $B_j$ denote the event that an observation belongs to the $j$-th column, that is, the occurrence of the attribute $B_j$. Letting $p_{i.} = P(A_i)$ and $p_{.j} = P(B_j)$, we have $p_{ij} = P(A_i \cap B_j)$, where $P(A_i)$ is the probability of the event $A_i$, $P(B_j)$ is the probability of the event $B_j$, and $A_i \cap B_j$ is the intersection or joint occurrence of the events $A_i$ and $B_j$. If $A_i$ and $B_j$ are independent events, then $P(A_i \cap B_j) = P(A_i)P(B_j)$, that is, $p_{ij} = p_{i.}p_{.j}$, the product of the marginal probabilities or marginal totals in the table of probabilities. That is,


$$P(A\_i \cap B\_j) = P(A\_i)P(B\_j) \implies p\_{ij} = p\_{i.}p\_{.j}, \; \hat{p}\_{ij} = \Big(\frac{n\_{i.}}{n}\Big)\Big(\frac{n\_{.j}}{n}\Big) = \frac{n\_{i.}n\_{.j}}{n^2} \tag{15.4.1}$$

for all $i$ and $j$. In a multinomial distribution, the expected frequency in the $(i, j)$-th cell is $np_{ij}$ where $n$ is the total frequency. Then, the expected frequency, denoted by $E[\cdot]$, its maximum likelihood estimate (MLE), denoted by $\hat{E}[\cdot]$, and the MLE of the expected frequency under the hypothesis $H_o$ of independence of the events $A_i$ and $B_j$, are the following:

$$E[n\_{ij}] = np\_{ij},\ \hat{E}[n\_{ij}] = n\hat{p}\_{ij} = n\Big(\frac{n\_{ij}}{n}\Big),\ \ n\hat{p}\_{ij}|H\_o = n\hat{p}\_{i.}\hat{p}\_{.j} = n\Big(\frac{n\_{i.}}{n}\Big)\Big(\frac{n\_{.j}}{n}\Big) = \frac{n\_{i.}n\_{.j}}{n}. \tag{15.4.2}$$

Now, referring to our numerical example and the first row of Table 15.4.1, the estimated expected frequencies under $H_o$ are: $E[n_{11}|H_o] = \frac{n_{1.}n_{.1}}{n} = \frac{40 \times 30}{100} = 12$, $E[n_{12}|H_o] = \frac{n_{1.}n_{.2}}{n} = \frac{40 \times 25}{100} = 10$, $E[n_{13}|H_o] = \frac{40 \times 30}{100} = 12$, $E[n_{14}|H_o] = \frac{40 \times 15}{100} = 6$. All the estimated expected frequencies are shown in parentheses next to the observed frequencies in Table 15.4.5:


**Table 15.4.5:** A two-way contingency table

|       | $B_1$   | $B_2$   | $B_3$   | $B_4$  | Total |
|-------|---------|---------|---------|--------|-------|
| $A_1$ | 6 (12)  | 14 (10) | 16 (12) | 4 (6)  | 40    |
| $A_2$ | 17 (12) | 5 (10)  | 8 (12)  | 10 (6) | 40    |
| $A_3$ | 7 (6)   | 6 (5)   | 6 (6)   | 1 (3)  | 20    |
| Total | 30      | 25      | 30      | 15     | 100   |
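The expected frequencies under $H_o$ in (15.4.2) can be reproduced numerically; a minimal sketch follows, with illustrative array names.

```python
import numpy as np

# Observed frequencies of Table 15.4.1 (rows A1-A3, columns B1-B4).
N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
n = N.sum()                                      # total frequency, here 100

# Estimated expected frequencies under independence: n_i. n_.j / n, Eq. (15.4.2).
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
```

The outer product of the marginal totals divided by $n$ produces the whole table of Table 15.4.5's parenthesized entries at once.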

#### **15.4.2. Some general computations**

Let $J_r$ and $J_s$ respectively be $r \times 1$ and $s \times 1$ vectors of unities and let $P$ be the true probability matrix, that is,

$$J\_r = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \ J\_s = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \ P = \begin{bmatrix} p\_{11} & p\_{12} & \cdots & p\_{1s} \\ p\_{21} & p\_{22} & \cdots & p\_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p\_{r1} & p\_{r2} & \cdots & p\_{rs} \end{bmatrix}. \tag{15.4.3}$$

Letting the marginal totals be denoted by $R$ and $C'$, we have

$$R = PJ\_s = \begin{bmatrix} p\_{1.} \\ p\_{2.} \\ \vdots \\ p\_{r.} \end{bmatrix}, \quad J\_r'P = [p\_{.1}, p\_{.2}, \dots, p\_{.s}] = C' \tag{15.4.4}$$

Referring to the initial numerical example, we have the following:

$$
\hat{R} = \begin{bmatrix}
\hat{p}\_{1.} \\
\hat{p}\_{2.} \\
\vdots \\
\hat{p}\_{r.}
\end{bmatrix} = \begin{bmatrix}
n\_{1.}/n \\
n\_{2.}/n \\
\vdots \\
n\_{r.}/n
\end{bmatrix} = \begin{bmatrix}
40/100 \\
40/100 \\
20/100 \end{bmatrix} = \begin{bmatrix}
0.4 \\
0.4 \\
0.2
\end{bmatrix}
$$

$$
\hat{C}' = [\hat{p}\_{.1}, \hat{p}\_{.2}, \dots, \hat{p}\_{.s}] = \left[\frac{n\_{.1}}{n}, \dots, \frac{n\_{.s}}{n}\right]
$$

$$
= [\frac{30}{100}, \frac{25}{100}, \frac{30}{100}, \frac{15}{100}] = [0.30, 0.25, 0.30, 0.15].
$$

Writing the bordered matrix $P$ as

$$
\begin{bmatrix} p\_{11} & p\_{12} & \cdots & p\_{1s} & p\_{1.} \\ p\_{21} & p\_{22} & \cdots & p\_{2s} & p\_{2.} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ p\_{r1} & p\_{r2} & \cdots & p\_{rs} & p\_{r.} \\ p\_{.1} & p\_{.2} & \cdots & p\_{.s} & 1 \end{bmatrix} = \begin{bmatrix} P & R \\ C' & 1 \end{bmatrix},\tag{15.4.5}
$$

in the numerical example, these quantities are

$$
\begin{bmatrix}
\hat{P} & \hat{R} \\
\hat{C}' & 1
\end{bmatrix} = \begin{bmatrix}
0.06 & 0.14 & 0.16 & 0.04 & 0.40 \\
0.17 & 0.05 & 0.08 & 0.10 & 0.40 \\
0.07 & 0.06 & 0.06 & 0.01 & 0.20 \\
0.30 & 0.25 & 0.30 & 0.15 & 1.00
\end{bmatrix}.
$$

Let $D_r$ and $D_c$ be the following diagonal matrices corresponding respectively to the row and column marginal probabilities:

$$D\_r = \begin{bmatrix} p\_{1.} & 0 & \cdots & 0 \\ 0 & p\_{2.} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p\_{r.} \end{bmatrix}, \ D\_c = \begin{bmatrix} p\_{.1} & 0 & \cdots & 0 \\ 0 & p\_{.2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p\_{.s} \end{bmatrix} \text{ or }$$

$$D\_r = \text{diag}(p\_{1.}, p\_{2.}, \dots, p\_{r.}), \ D\_c = \text{diag}(p\_{.1}, p\_{.2}, \dots, p\_{.s}). \tag{15.4.6}$$

In the numerical example, these quantities are

$$\hat{D}\_r = \text{diag}(0.4, 0.4, 0.2) \ \text{ and } \ \hat{D}\_c = \text{diag}(0.30, 0.25, 0.30, 0.15).$$

Now, consider $D_r^{-1}P$ and $PD_c^{-1}$:

$$D\_r^{-1}P = \begin{bmatrix} \frac{p\_{11}}{p\_{1.}} & \frac{p\_{12}}{p\_{1.}} & \dots & \frac{p\_{1s}}{p\_{1.}} \\ \frac{p\_{21}}{p\_{2.}} & \frac{p\_{22}}{p\_{2.}} & \dots & \frac{p\_{2s}}{p\_{2.}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p\_{r1}}{p\_{r.}} & \frac{p\_{r2}}{p\_{r.}} & \dots & \frac{p\_{rs}}{p\_{r.}} \end{bmatrix} \equiv \begin{bmatrix} R\_1' \\ R\_2' \\ \vdots \\ R\_r' \end{bmatrix}, \ R\_j = \begin{bmatrix} \frac{p\_{j1}}{p\_{j.}} \\ \frac{p\_{j2}}{p\_{j.}} \\ \vdots \\ \frac{p\_{js}}{p\_{j.}} \end{bmatrix}, \tag{15.4.7}$$
 
$$PD\_c^{-1} = \begin{bmatrix} \frac{p\_{11}}{p\_{.1}} & \frac{p\_{12}}{p\_{.2}} & \dots & \frac{p\_{1s}}{p\_{.s}} \\ \frac{p\_{21}}{p\_{.1}} & \frac{p\_{22}}{p\_{.2}} & \dots & \frac{p\_{2s}}{p\_{.s}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p\_{r1}}{p\_{.1}} & \frac{p\_{r2}}{p\_{.2}} & \dots & \frac{p\_{rs}}{p\_{.s}} \end{bmatrix} \equiv [C\_1, \dots, C\_s], \ C\_j = \begin{bmatrix} \frac{p\_{1j}}{p\_{.j}} \\ \frac{p\_{2j}}{p\_{.j}} \\ \vdots \\ \frac{p\_{rj}}{p\_{.j}} \end{bmatrix}. \tag{15.4.8}$$

Referring to the numerical example, we have

$$\hat{D}\_r^{-1}\hat{P} = \begin{bmatrix} n\_{11}/n\_{1.} & n\_{12}/n\_{1.} & \cdots & n\_{1s}/n\_{1.} \\ n\_{21}/n\_{2.} & n\_{22}/n\_{2.} & \cdots & n\_{2s}/n\_{2.} \\ \vdots & \vdots & \ddots & \vdots \\ n\_{r1}/n\_{r.} & n\_{r2}/n\_{r.} & \cdots & n\_{rs}/n\_{r.} \end{bmatrix} = \begin{bmatrix} 6/40 & 14/40 & 16/40 & 4/40 \\ 17/40 & 5/40 & 8/40 & 10/40 \\ 7/20 & 6/20 & 6/20 & 1/20 \end{bmatrix},$$

with $\hat{D}_r^{-1}\hat{P}\,J_s = [1, 1, 1]'$, and

$$\hat{P}\hat{D}\_c^{-1} = \begin{bmatrix} n\_{11}/n\_{.1} & n\_{12}/n\_{.2} & \cdots & n\_{1s}/n\_{.s} \\ n\_{21}/n\_{.1} & n\_{22}/n\_{.2} & \cdots & n\_{2s}/n\_{.s} \\ \vdots & \vdots & \ddots & \vdots \\ n\_{r1}/n\_{.1} & n\_{r2}/n\_{.2} & \cdots & n\_{rs}/n\_{.s} \end{bmatrix} = \begin{bmatrix} 6/30 & 14/25 & 16/30 & 4/15 \\ 17/30 & 5/25 & 8/30 & 10/15 \\ 7/30 & 6/25 & 6/30 & 1/15 \end{bmatrix},$$

with $J_r'\hat{P}\hat{D}_c^{-1} = [1, 1, 1, 1]$.

For computing the test statistics in vector/matrix notation, we need (15.4.7) and (15.4.8).
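The row and column profile matrices $\hat{D}_r^{-1}\hat{P}$ and $\hat{P}\hat{D}_c^{-1}$ of (15.4.7) and (15.4.8) can be sketched as follows; the variable names are illustrative.

```python
import numpy as np

N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
P_hat = N / N.sum()                                      # relative frequencies

row_profiles = P_hat / P_hat.sum(axis=1, keepdims=True)  # D_r^{-1} P: each row sums to 1
col_profiles = P_hat / P_hat.sum(axis=0, keepdims=True)  # P D_c^{-1}: each column sums to 1
```

Dividing each row (column) by its marginal is exactly the pre- (post-) multiplication by the inverse diagonal matrix, which is why the unit-sum checks $\hat{D}_r^{-1}\hat{P}J_s = J_r$ and $J_r'\hat{P}\hat{D}_c^{-1} = J_s'$ hold.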

### **15.5. Various Representations of Pearson's $\chi^2$ Statistic**

Now, let us consider Pearson's $\chi^2$ statistic for testing the hypothesis that there is no association between the two characteristics of classification, that is, $H_o: p_{ij} = p_{i.}p_{.j}$. The $\chi^2$ statistic is the following:


$$\chi^2 = \sum\_{ij} \frac{(\text{observed frequency} - \text{expected frequency})^2}{(\text{expected frequency})} = \sum\_{ij} \frac{(n\_{ij} - \frac{n\_{i\cdot} n\_{\cdot j}}{n})^2}{\frac{n\_{i\cdot} n\_{\cdot j}}{n}} \quad (15.5.1)$$

$$= \sum\_{ij} n \frac{(\frac{n\_{ij}}{n} - \frac{n\_{i.}}{n}\frac{n\_{.j}}{n})^2}{\frac{n\_{i.}}{n}\frac{n\_{.j}}{n}} = n \sum\_{ij} \frac{(\hat{p}\_{ij} - \hat{p}\_{i.}\hat{p}\_{.j})^2}{\hat{p}\_{i.}\hat{p}\_{.j}} \tag{15.5.2}$$

$$=\sum\_{i=1}^{r} n\,\hat{p}\_{i.}\sum\_{j=1}^{s}\left[\left(\frac{\hat{p}\_{ij}}{\hat{p}\_{i.}}-\hat{p}\_{.j}\right)^{2}/\hat{p}\_{.j}\right] \tag{15.5.3}$$

$$=\sum\_{j=1}^{s} n\,\hat{p}\_{.j}\sum\_{i=1}^{r} \left[\left(\frac{\hat{p}\_{ij}}{\hat{p}\_{.j}} - \hat{p}\_{i.}\right)^{2} / \hat{p}\_{i.}\right].\tag{15.5.4}$$

In order to simplify the notation, we shall omit placing a hat on the estimates of $R_i$, $C_j$, $R$, $C$, $D_c$ and $D_r$. We may then express the $\chi^2$ statistic as the following quadratic forms:

$$\chi^2 = \sum\_{i=1}^r n p\_{i.} (R\_i - C)^{\prime} D\_c^{-1} (R\_i - C) \tag{15.5.5}$$

$$=\sum\_{j=1}^{s} np\_{.j}(C\_j - R)'D\_r^{-1}(C\_j - R). \tag{15.5.6}$$

The forms given in (15.5.5) and (15.5.6) are very convenient for extending the theory to multi-way classifications.

It is well known that, under $H_o$, Pearson's $\chi^2$ statistic is asymptotically distributed as a chi-square random variable having $(r-1)(s-1)$ degrees of freedom as $n \to \infty$. One can also express the statistic in (15.5.1) as a generalized distance between the observed and the expected frequencies, namely as a quadratic form involving the inverse of the true covariance matrix of the multinomial distribution of the $n_{ij}$'s. Then, on applying the multivariate version of the central limit theorem, it can be established that, as $n \to \infty$, Pearson's $\chi^2$ statistic has a $\chi^2$ distribution with $(r-1)(s-1)$ degrees of freedom. For the representation of Pearson's $\chi^2$ goodness-of-fit statistic as a generalized distance and as a quadratic form, and for the proof of its asymptotic distribution, the reader may refer to Mathai and Haubold (2017). There exist other derivations of this result in the literature.

The quadratic forms specified in (15.5.5) and (15.5.6) can also be interpreted as measuring the generalized distances between the vectors $R_i$ and $C$ and between the vectors $C_j$ and $R$, respectively; these interpretations are likewise equivalent to testing the hypothesis $H_o: p_{ij} = p_{i.}p_{.j}$. An interpretation can also be provided in terms of profile analysis: using (15.5.5) corresponds to testing the hypothesis that the weighted row profiles are similar and, analogously, using (15.5.6) corresponds to testing the hypothesis that the column profiles in a two-way contingency table are similar. Now, examine the following item:

$$\begin{aligned} P - RC' &= \begin{bmatrix} p\_{11} & p\_{12} & \cdots & p\_{1s} \\ p\_{21} & p\_{22} & \cdots & p\_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p\_{r1} & p\_{r2} & \cdots & p\_{rs} \end{bmatrix} - \begin{bmatrix} p\_{1.} \\ p\_{2.} \\ \vdots \\ p\_{r.} \end{bmatrix} [p\_{.1}, p\_{.2}, \dots, p\_{.s}] \\ &= \begin{bmatrix} p\_{11} - p\_{1.}p\_{.1} & p\_{12} - p\_{1.}p\_{.2} & \cdots & p\_{1s} - p\_{1.}p\_{.s} \\ p\_{21} - p\_{2.}p\_{.1} & p\_{22} - p\_{2.}p\_{.2} & \cdots & p\_{2s} - p\_{2.}p\_{.s} \\ \vdots & \vdots & \ddots & \vdots \\ p\_{r1} - p\_{r.}p\_{.1} & p\_{r2} - p\_{r.}p\_{.2} & \cdots & p\_{rs} - p\_{r.}p\_{.s} \end{bmatrix}. \end{aligned}$$

Referring to our numerical example, these quantities are the following:

$$\begin{split} \hat{P} - \hat{R}\hat{C}' &= \begin{bmatrix} \frac{n\_{11}}{n} - \frac{n\_{1.}n\_{.1}}{n^{2}} & \cdots & \frac{n\_{1s}}{n} - \frac{n\_{1.}n\_{.s}}{n^{2}} \\ \vdots & \ddots & \vdots \\ \frac{n\_{r1}}{n} - \frac{n\_{r.}n\_{.1}}{n^{2}} & \cdots & \frac{n\_{rs}}{n} - \frac{n\_{r.}n\_{.s}}{n^{2}} \end{bmatrix} = \frac{1}{100} \times \\ &\quad \begin{bmatrix} 6 - (40)(30)/100 & 14 - (40)(25)/100 & 16 - (40)(30)/100 & 4 - (40)(15)/100 \\ 17 - (40)(30)/100 & 5 - (40)(25)/100 & 8 - (40)(30)/100 & 10 - (40)(15)/100 \\ 7 - (20)(30)/100 & 6 - (20)(25)/100 & 6 - (20)(30)/100 & 1 - (20)(15)/100 \end{bmatrix} \\ &= \frac{1}{100}\begin{bmatrix} -6 & 4 & 4 & -2 \\ 5 & -5 & -4 & 4 \\ 1 & 1 & 0 & -2 \end{bmatrix}. \end{split}$$

#### **15.5.1. Testing the hypothesis of no association in a two-way contingency table**

The observed value of Pearson's *χ*<sup>2</sup> statistic is

$$\begin{split} \chi^2 &= \left[ \frac{(-6)^2}{12} + \frac{(5)^2}{12} + \frac{(1)^2}{6} \right] + \left[ \frac{(4)^2}{10} + \frac{(-5)^2}{10} + \frac{(1)^2}{5} \right] \\ &+ \left[ \frac{(4)^2}{12} + \frac{(-4)^2}{12} + \frac{(0)^2}{6} \right] + \left[ \frac{(-2)^2}{6} + \frac{(4)^2}{6} + \frac{(-2)^2}{3} \right] \\ &= 16.88. \end{split}$$

Given our data, $(r-1)(s-1) = (2)(3) = 6$, and the tabulated critical value at the 5% significance level is $\chi^2_{6,0.05} = 12.59$. Since $16.88 > 12.59$, the hypothesis of no association between the classification attributes is rejected on the evidence provided by the data. This $\chi^2$ approximation may however be questionable since one of the expected cell frequencies is less than 5; for a proper application of the approximation, the expected cell frequencies ought to be at least 5.
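The test just carried out can be reproduced as follows; this is a sketch, and the critical value 12.59 is taken from the text rather than computed from a chi-square quantile function.

```python
import numpy as np

N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # expected frequencies under H_o

chi_sq = ((N - E) ** 2 / E).sum()                # Pearson's statistic, Eq. (15.5.1)
df = (N.shape[0] - 1) * (N.shape[1] - 1)         # (r-1)(s-1) = 6
critical = 12.59                                 # tabulated chi^2_{6, 0.05} from the text
reject = chi_sq > critical
```

The computed value 16.88 exceeds the critical value, matching the rejection reached above.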

#### **15.6. Plot of Row and Column Profiles**

Now, $(P - RC')D_c^{-1}$ means that the columns of $(P - RC')$ are multiplied by $\frac{1}{p_{.1}}, \ldots, \frac{1}{p_{.s}}$, respectively. Then, $(P - RC')D_c^{-1}(P - RC')'$ is a matrix of all square and cross-product terms involving $p_{ij} - p_{i.}p_{.j}$ for all $i$ and $j$, where the $s$ columns are weighted by $\frac{1}{p_{.j}}$; if this matrix is pre-multiplied by $D_r^{-1}$, the rows are weighted by $\frac{1}{p_{1.}}, \ldots, \frac{1}{p_{r.}}$, respectively. Looking at the diagonal elements, we note that Pearson's $\chi^2$ statistic is nothing but

$$\chi^2 = n \operatorname{tr} [D\_r^{-1} (P - RC') D\_c^{-1} (P - RC')'] \tag{15.6.1}$$

$$= n \sum\_{ij} \frac{(p\_{ij} - p\_{i\cdot} p\_{\cdot j})^2}{p\_{i\cdot} p\_{\cdot j}} \tag{15.6.2}$$

$$=n(\lambda\_1^2 + \dots + \lambda\_k^2) \tag{15.6.3}$$

where $\lambda_1^2, \ldots, \lambda_k^2$ are the nonzero eigenvalues of the matrix $D_r^{-1}(P - RC')D_c^{-1}(P - RC')'$ or of the matrix $D_r^{-\frac{1}{2}}(P - RC')D_c^{-1}(P - RC')'D_r^{-\frac{1}{2}}$, with $k$ being the rank of $P - RC'$. For the numerical example, the observed value of the matrix $Y = (y_{ij})$ with $y_{ij} = \frac{p_{ij} - p_{i.}p_{.j}}{\sqrt{p_{i.}p_{.j}}}$ is obtained as follows, observing that

$$
\sqrt{n}\frac{\hat{p}\_{ij} - \hat{p}\_{i\cdot}\hat{p}\_{\cdot j}}{\sqrt{\hat{p}\_{i\cdot}\hat{p}\_{\cdot j}}} = \left[n\_{ij} - \frac{n\_{i\cdot}n\_{\cdot j}}{n}\right] \Big/ \sqrt{n\_{i\cdot}n\_{\cdot j}/n}.
$$

From the representation of $\hat{P} - \hat{R}\hat{C}'$, we already have the matrix $(n_{ij} - n_{i.}n_{.j}/n)$, that is,

$$\begin{split} & \left( \frac{n\_{ij} - n\_{i.}n\_{.j}/n}{\sqrt{n\_{i.}n\_{.j}/n}} \right) \\ &= \begin{bmatrix} (6-12)/\sqrt{12} & (14-10)/\sqrt{10} & (16-12)/\sqrt{12} & (4-6)/\sqrt{6} \\ (17-12)/\sqrt{12} & (5-10)/\sqrt{10} & (8-12)/\sqrt{12} & (10-6)/\sqrt{6} \\ (7-6)/\sqrt{6} & (6-5)/\sqrt{5} & (6-6)/\sqrt{6} & (1-3)/\sqrt{3} \end{bmatrix} \\ &= \begin{bmatrix} -6/\sqrt{12} & 4/\sqrt{10} & 4/\sqrt{12} & -2/\sqrt{6} \\ 5/\sqrt{12} & -5/\sqrt{10} & -4/\sqrt{12} & 4/\sqrt{6} \\ 1/\sqrt{6} & 1/\sqrt{5} & 0/\sqrt{6} & -2/\sqrt{3} \end{bmatrix}. \end{split}$$

Then,

$$nYY' = \begin{bmatrix} \frac{33}{5} & -\frac{43}{6} & \frac{17\sqrt{2}}{30} \\ -\frac{43}{6} & \frac{103}{12} & -\frac{17\sqrt{2}}{12} \\ \frac{17\sqrt{2}}{30} & -\frac{17\sqrt{2}}{12} & \frac{17}{10} \end{bmatrix}.\tag{15.6.4}$$

The representation in (15.6.1) has the advantage that

$$\text{tr}[D\_r^{-\frac{1}{2}}(P - RC')D\_c^{-1}(P - RC')'D\_r^{-\frac{1}{2}}] = \text{tr}[YY'], \quad Y = D\_r^{-\frac{1}{2}}(P - RC')D\_c^{-\frac{1}{2}} = (y\_{ij}),$$

$$y\_{ij} = \frac{p\_{ij} - p\_{i.}p\_{.j}}{\sqrt{p\_{i.}p\_{.j}}}, \quad n\sum\_{ij} \hat{y}\_{ij}^2 = n\,\text{tr}(\hat{Y}\hat{Y}') = \chi^2. \tag{15.6.5}$$

Note that $Y$ is $r \times s$ and the rank of $Y$ is equal to the rank of $P - RC'$, which is $k$, referring to (15.6.3). Thus, there are $k$ nonzero eigenvalues, $\lambda_1^2, \ldots, \lambda_k^2$, associated with the $r \times r$ matrix $YY'$ as well as with the $s \times s$ matrix $Y'Y$. Since $\text{tr}(YY') = \lambda_1^2 + \cdots + \lambda_k^2$, we can represent Pearson's $\chi^2$ statistic as follows, substituting the estimates of $p_{ij}$, $p_{i.}$, $p_{.j}$, etc.:

$$\begin{split} \frac{\chi^2}{n} &= \text{tr}(YY') = \lambda\_1^2 + \dots + \lambda\_k^2 \\ &= \sum\_{i=1}^{r} \hat{p}\_{i.} (\hat{R}\_i - \hat{C})' \hat{D}\_c^{-1} (\hat{R}\_i - \hat{C}) \end{split} \tag{15.6.6}$$

$$=\sum\_{j=1}^{s} \hat{p}\_{\cdot j} (\hat{C}\_j - \hat{R})^{\prime} \hat{D}\_r^{-1} (\hat{C}\_j - \hat{R}).\tag{15.6.7}$$

The expression given in (15.6.6) and (15.6.7), that is, the sum of the $\lambda_j^2$'s, is called the *total inertia* of a two-way contingency table. We can also define the squared distance between two rows as

$$d\_{ij(r)}^2 = (R\_i - R\_j)' D\_c^{-1} (R\_i - R\_j) \tag{15.6.8}$$

and the squared distance between two columns as

$$d\_{ij(c)}^2 = (C\_i - C\_j)' D\_r^{-1} (C\_i - C\_j). \tag{15.6.9}$$

When the distance specified in (15.6.8) is very small, we may combine the $i$-th and $j$-th rows, if necessary. Sometimes certain cell frequencies are small, and we may wish to combine them with other cell frequencies so that the $\chi^2$ approximation to Pearson's statistic is more accurate. One can then rely on (15.6.8) and (15.6.9) to determine whether combining rows or columns is indicated.
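For the numerical example, the squared row distance (15.6.8) can be sketched as follows; the function name `row_dist2` is hypothetical.

```python
import numpy as np

N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
P = N / N.sum()
row_profiles = P / P.sum(axis=1, keepdims=True)   # rows are the profiles R_i'
c = P.sum(axis=0)                                 # column marginals p_.j

def row_dist2(i, j):
    """Squared distance (R_i - R_j)' D_c^{-1} (R_i - R_j) of Eq. (15.6.8)."""
    d = row_profiles[i] - row_profiles[j]
    return (d ** 2 / c).sum()
```

Rows whose squared distance is small have similar profiles and are candidates for merging.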

For convenience, let $r \le s$. Let $U_1, \ldots, U_r$ be the $r \times 1$ normalized eigenvectors of $YY'$ and let the $r \times k$ matrix $U = [U_1, U_2, \ldots, U_k]$, $k \le r$. Let $V_1, \ldots, V_s$ be the normalized eigenvectors of $Y'Y$ and let the $s \times k$ matrix $V = [V_1, \ldots, V_k]$, $k \le s$. Now, consider the singular value decomposition

$$Y = D\_r^{-\frac{1}{2}}(P - RC')D\_c^{-\frac{1}{2}} = U\Lambda V' \tag{15.6.10}$$

where $U'U = I_k = V'V$ and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_k)$. Then, we can write

$$P - RC' = D\_r^{\frac{1}{2}} U \Lambda V' D\_c^{\frac{1}{2}} = W \Lambda Z' \tag{15.6.11}$$

where $W = D_r^{\frac{1}{2}}U$ and $Z = D_c^{\frac{1}{2}}V$. Let $W_j$, $j = 1, \ldots, k$, denote the columns of $W = [W_1, W_2, \ldots, W_k]$ and let $Z_j$, $j = 1, \ldots, k$, denote the columns of $Z = [Z_1, Z_2, \ldots, Z_k]$. Then, we can write

$$P - RC' = \sum\_{j=1}^{k} \lambda\_j W\_j Z'\_j \tag{15.6.12}$$

where $W'D_r^{-1}W = U'U = I_k = V'V = Z'D_c^{-1}Z$. Note that $P - RC'$ is the deviation matrix under the hypothesis $H_o: p_{ij} = p_{i.}p_{.j}$, that is,

$$P - RC' = (p\_{ij} - p\_{i.}p\_{.j}) \ \text{ and } \ Y = (y\_{ij}) = D\_r^{-\frac{1}{2}}(P - RC')D\_c^{-\frac{1}{2}} = \Big(\frac{p\_{ij} - p\_{i.}p\_{.j}}{\sqrt{p\_{i.}p\_{.j}}}\Big).$$

Thus, the procedure is as follows: if $r \le s$, compute the $r$ eigenvalues of the $r \times r$ matrix $YY'$. If $Y$ is of rank $r$, then $YY' > O$ (positive definite); otherwise $YY'$ is positive semi-definite. Let the nonzero eigenvalues of $YY'$ be $\lambda_1^2, \ldots, \lambda_k^2$, where $k$ is the number of nonzero eigenvalues of $YY'$; these are also the nonzero eigenvalues of $Y'Y$. Compute the normalized eigenvectors of $YY'$ and denote those corresponding to the nonzero eigenvalues by $U = [U_1, \ldots, U_k]$, $U_j$ being the $j$-th column of $U$. Letting the normalized eigenvectors obtained from $Y'Y$ that correspond to the same nonzero eigenvalues be denoted by $V = [V_1, \ldots, V_k]$, we have

$$Y = U\Lambda V', \ \Lambda = \text{diag}(\lambda\_1, \dots, \lambda\_k), \ YY' = U\Lambda^2 U' \ \text{and} \ \ Y'Y = V\Lambda^2 V'. \tag{15.6.13}$$
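Using NumPy's SVD as a stand-in for the eigenvalue computations just described, the decomposition $Y = U\Lambda V'$ and the identity $\chi^2 = n(\lambda_1^2 + \cdots + \lambda_k^2)$ can be checked numerically on the example data; the variable names are illustrative.

```python
import numpy as np

N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
n = N.sum()
P = N / n
R = P.sum(axis=1)                      # row marginals p_i.
C = P.sum(axis=0)                      # column marginals p_.j

# Y = D_r^{-1/2}(P - RC')D_c^{-1/2}: elementwise (p_ij - p_i. p_.j)/sqrt(p_i. p_.j)
Y = (P - np.outer(R, C)) / np.sqrt(np.outer(R, C))

U, lam, Vt = np.linalg.svd(Y, full_matrices=False)
chi_sq = n * (lam ** 2).sum()          # chi^2 = n (lambda_1^2 + ... + lambda_k^2)
```

The largest value $n\lambda_1^2$ reproduces the leading eigenvalue of $n\hat{Y}\hat{Y}'$ reported below, and the third singular value vanishes, confirming $k = 2$.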

**Example 15.6.1.** Construct a singular value decomposition of the following matrix $Q$:

$$\mathcal{Q} = \begin{bmatrix} -1 & 1 & -1 & 0 \\ 1 & 1 & 0 & 2 \end{bmatrix}.$$

**Solution 15.6.1.** Let us compute $QQ'$ as well as $Q'Q$ and the eigenvalues of $QQ'$. Since

$$
\mathcal{Q}\mathcal{Q}' = \begin{bmatrix} -1 & 1 & -1 & 0 \\ 1 & 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 1 \\ -1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 6 \end{bmatrix},
$$

the eigenvalues of $QQ'$ are $\lambda_1 = 3$ and $\lambda_2 = 6$. Let us determine the normalized eigenvectors of $QQ'$. Consider the equation $[QQ' - \lambda I]X = O$ for $\lambda = 3$ and $6$, with $X' = [x_1, x_2]$ and $O' = [0, 0]$. For $\lambda = 3$, we see that $x_2 = 0$, and for $\lambda = 6$, we note that $x_1 = 0$. Thus, the normalized solutions are

$$U\_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ and } U\_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Rightarrow \ U = [U\_1, U\_2] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Note that $-U_1$ or $-U_2$ or $-U_1, -U_2$ will also satisfy all the conditions, and we could take any of these forms for convenience. Now, consider the equation $(Q'Q - \lambda I)X = O$, where $X' = [x_1, x_2, x_3, x_4]$ and $O' = [0, 0, 0, 0]$, for $\lambda = 3, 6$. For $\lambda = 3$, the coefficient matrix is

$$\mathcal{Q}'\mathcal{Q} - 3I = \begin{bmatrix} -1 & 0 & 1 & 2 \\ 0 & -1 & -1 & 2 \\ 1 & -1 & -2 & 0 \\ 2 & 2 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} -1 & 0 & 1 & 2 \\ 0 & -1 & -1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 \end{bmatrix}$$

by elementary transformations. Observe that $x_4 = 0$, so that $-x_1 + x_3 = 0$ and $-x_2 - x_3 = 0$. Thus, an eigenvector corresponding to $\lambda = 3$ and its normalized form are

$$
\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix} \Rightarrow V\_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix}.
$$

Now, take $\lambda = 6$ and consider the equation $(Q'Q - 6I)X = O$; the coefficient matrix and its reduced form obtained through elementary transformations are the following:

$$
\begin{bmatrix} -4 & 0 & 1 & 2 \\ 0 & -4 & -1 & 2 \\ 1 & -1 & -5 & 0 \\ 2 & 2 & 0 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & -5 & 0 \\ 0 & -4 & -1 & 2 \\ 0 & 0 & -21 & 0 \\ 0 & 0 & 9 & 0 \end{bmatrix},
$$

which shows that $x_3 = 0$, so that $x_1 - x_2 = 0$ and $-4x_2 + 2x_4 = 0$. Hence, an eigenvector and its normalized form are

$$
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix} \Rightarrow \ V\_2 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix} .
$$

Thus, *V* = [*V*<sub>1</sub>*, V*<sub>2</sub>]*.* As mentioned earlier, we could have −*V*<sub>1</sub> or −*V*<sub>2</sub> or −*V*<sub>1</sub>*,* −*V*<sub>2</sub> as the normalized eigenvectors. As per our notation,

$$\Lambda = \text{diag}(\sqrt{3}, \sqrt{6}) \text{ and } \mathcal{Q} = U\Lambda V'.$$

Let us verify this last equality. Since

$$\begin{aligned} U\Lambda V' &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{6} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & 0 & \frac{2}{\sqrt{6}} \end{bmatrix} \\ &= \begin{bmatrix} 1 & -1 & 1 & 0 \\ 1 & 1 & 0 & 2 \end{bmatrix}, \end{aligned}$$

we should take −*V*<sub>1</sub> to obtain *Q*. Then,

$$[U\_1, U\_2] \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{6} \end{bmatrix} \begin{bmatrix} -V\_1' \\ V\_2' \end{bmatrix} = \mathcal{Q},$$

which verifies the result and completes the computations.
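These computations can be checked numerically. The following sketch (a check, not part of the text's derivation) enters *Q* as reconstructed above and verifies that its squared singular values are 6 and 3, i.e., the eigenvalues of *QQ*′, and that *Q* = *UΛV*′:

```python
import numpy as np

# Q as reconstructed above: its rows are sqrt(3)(-V1)' and sqrt(6) V2'
Q = np.array([[-1., 1., -1., 0.],
              [ 1., 1.,  0., 2.]])

# numpy returns U, the singular values in decreasing order, and V' (as Vt)
U, s, Vt = np.linalg.svd(Q, full_matrices=False)

print(s**2)                                  # squared singular values: 6 and 3
print(np.allclose(U @ np.diag(s) @ Vt, Q))   # reconstruction check: True
```

The signs of corresponding columns of *U* and *V* may differ from those chosen in the text; any consistent choice of signs reproduces *Q*.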

Now, we shall continue with our row and column profile plots. From (15.6.4), we have

$$nYY' = \begin{bmatrix} \frac{33}{5} & -\frac{43}{6} & \frac{17\sqrt{2}}{30} \\ -\frac{43}{6} & \frac{103}{12} & -\frac{17\sqrt{2}}{12} \\ \frac{17\sqrt{2}}{30} & -\frac{17\sqrt{2}}{12} & \frac{17}{10} \end{bmatrix} \text{ and } nY'Y = \begin{bmatrix} \frac{63}{12} & -\frac{47\sqrt{30}}{60} & -\frac{11}{3} & \frac{7\sqrt{2}}{3} \\ -\frac{47\sqrt{30}}{60} & \frac{43}{10} & \frac{3\sqrt{30}}{6} & -\frac{16\sqrt{15}}{15} \\ -\frac{11}{3} & \frac{3\sqrt{30}}{5} & \frac{8}{3} & -2\sqrt{2} \\ \frac{7\sqrt{2}}{3} & -\frac{16\sqrt{15}}{15} & -2\sqrt{2} & \frac{14}{3} \end{bmatrix}.$$

The eigenvalues of *nY Y*′ are *λ*<sub>1</sub> = 15*.*1369*, λ*<sub>2</sub> = 1*.*7471 and *λ*<sub>3</sub> = 0, and the normalized eigenvectors from *nY Y*′ corresponding to *λ*<sub>1</sub>*, λ*<sub>2</sub> and *λ*<sub>3</sub> are *U*<sub>1</sub>*, U*<sub>2</sub>*, U*<sub>3</sub>*,* so that *U* = [*U*<sub>1</sub>*, U*<sub>2</sub>*, U*<sub>3</sub>] where

$$U = \begin{bmatrix} 4.28 & -0.49 & 1.41 \\ -4.99 & -0.22 & 1.41 \\ 1 & 1 & 1 \end{bmatrix} \text{ and } \ \Lambda = \text{diag}(\sqrt{15.1369}, \sqrt{1.7471}, 0).$$

For the same eigenvalues *λ*<sub>1</sub>*, λ*<sub>2</sub> and *λ*<sub>3</sub>, the normalized eigenvectors determined from *nY*′*Y* , which correspond to the nonzero eigenvalues, are *V*<sub>1</sub> and *V*<sub>2</sub>, with

$$V = [V\_1, V\_2] = \begin{bmatrix} 1.10 & -0.84 \\ -1.06 & -0.16 \\ -0.83 & 0.28 \\ 1 & 1 \end{bmatrix}.$$

Since *λ*<sub>3</sub> = 0, *k* = 2, and we can take the *r* × *k*, that is, 3 × 2 matrix *G* = *(g<sub>ij</sub>)* = *D*<sub>*r*</sub><sup>−1/2</sup> *UΛ* to represent the row deviation profiles and the *s* × *k* = 4 × 2 matrix *H* = *(h<sub>ij</sub>)* = *D*<sub>*c*</sub><sup>−1/2</sup> *V Λ* to represent the column deviation profiles. For our numerical example, it follows from (15.4.6) that

$$\begin{aligned} D\_r &= \text{diag}(0.4, 0.4, 0.2) \Rightarrow D\_r^{-\frac{1}{2}} = \text{diag}\left(\frac{1}{0.63}, \frac{1}{0.63}, \frac{1}{0.45}\right) \\ D\_c &= \text{diag}(0.30, 0.25, 0.30, 0.15) \Rightarrow D\_c^{-\frac{1}{2}} = \text{diag}\left(\frac{1}{0.55}, \frac{1}{0.50}, \frac{1}{0.55}, \frac{1}{0.39}\right) \\ \Lambda &= \text{diag}(\sqrt{15.1369}, \sqrt{1.7471}, 0) = \text{diag}(3.89, 1.32, 0). \end{aligned}$$

We only take the first two columns of *U* and *V* since *λ*<sub>3</sub> = 0; besides, only the first two vectors are required for plotting. Let *U*<sub>(1)</sub> and *V*<sub>(1)</sub> represent the first two columns of *U* and *V* , respectively. Then, forming *D*<sub>*r*</sub><sup>−1/2</sup> *U*<sub>(1)</sub>*Λ* is equivalent to multiplying the first and second columns of *U*<sub>(1)</sub> by 3*.*89 and 1*.*32, respectively, and multiplying its first and second rows by 1/0*.*63 and its third row by 1/0*.*45. Then, we have

$$U\_{(1)} = \begin{bmatrix} 4.28 & -0.49 \\ -4.99 & -0.22 \\ 1 & 1 \end{bmatrix}, \; D\_r^{-\frac{1}{2}} U\_{(1)} \Lambda = \begin{bmatrix} 26.42 & -1.03 \\ -30.81 & -0.46 \\ 6.17 & 2.09 \end{bmatrix} \equiv G\_2$$

where *G*<sub>2</sub> is the matrix consisting of the first two columns of *G*. Hence, the points required for plotting the row profile are *(*26*.*42*,* −1*.*03*)*, *(*−30*.*81*,* −0*.*46*)* and *(*6*.*17*,* 2*.*09*)*. These points being far apart, no two rows should be combined. Now, consider the column profiles: the effect of *D*<sub>*c*</sub><sup>−1/2</sup> *V*<sub>(1)</sub>*Λ* is to multiply the columns of *V*<sub>(1)</sub> by 3*.*89 and 1*.*32, respectively, and to multiply its rows by 1/0*.*55, 1/0*.*50, 1/0*.*55 and 1/0*.*39, respectively. Thus,

$$V\_{(1)} = \begin{bmatrix} 1.10 & -0.84 \\ -1.06 & -0.16 \\ -0.83 & 0.28 \\ 1 & 1 \end{bmatrix}, \ D\_c^{-\frac{1}{2}} V\_{(1)} \Lambda = \begin{bmatrix} 7.78 & -2.02 \\ -8.25 & -0.42 \\ -5.87 & 0.67 \\ 9.97 & 3.38 \end{bmatrix} \equiv H\_2$$

where *H*<sub>2</sub> is the matrix consisting of the first two columns of *H*. The row profile and the column profile points are plotted in Fig. 15.6.1, where an *r* next to a point indicates a row point and a *c*, a column point; that is, *i*r denotes the *i*-th row point and *j*c, the *j*-th column point. It can be seen from this plot that the row points are far apart while the second and third column points are somewhat close; accordingly, if necessary, the second and third columns could be combined.
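As a numerical check, the profile coordinates can be recomputed directly from *U*<sub>(1)</sub>, *V*<sub>(1)</sub>, the masses and the eigenvalues. The sketch below uses exact square roots rather than the rounded reciprocals 1/0.63, 1/0.55, etc., so the entries differ slightly from the figures printed above; in particular, the third row of *G*<sub>2</sub> comes out as (8.70, 2.96) with the factor 1/√0.2, whereas the printed (6.17, 2.09) corresponds to the factor 1/0.63:

```python
import numpy as np

# First two eigenvector columns and the nonzero eigenvalues from the text
U1 = np.array([[4.28, -0.49], [-4.99, -0.22], [1.0, 1.0]])
V1 = np.array([[1.10, -0.84], [-1.06, -0.16], [-0.83, 0.28], [1.0, 1.0]])
Lam = np.diag(np.sqrt([15.1369, 1.7471]))

Dr = np.array([0.4, 0.4, 0.2])           # row masses
Dc = np.array([0.30, 0.25, 0.30, 0.15])  # column masses

# Row and column deviation-profile coordinates
G2 = np.diag(1 / np.sqrt(Dr)) @ U1 @ Lam
H2 = np.diag(1 / np.sqrt(Dc)) @ V1 @ Lam

print(np.round(G2, 2))
print(np.round(H2, 2))
```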

**Figure 15.6.1** Row profile and column profile points

#### **15.7. Correspondence Analysis in a Multi-way Contingency Table**

When data are classified according to several variables, each variable having a number of categories, the resulting frequency table is referred to as a multi-way classification. Correspondence analysis for a multi-way classification involves converting the data into a two-way classification framework and then employing the techniques developed in Sects. 15.5 and 15.6. The first step in this regard consists of creating an indicator matrix *C*. In order to illustrate the steps, we first present an example. Suppose that 10 persons selected at random from a community are classified according to three variables. Variable 1 is gender, under which we consider the categories male and female. Variable 2 is weight, with three categories: underweight, normal and overweight. The third variable is education, which is assumed to have four levels: level 1, level 2, level 3 and level 4. Thus, there are three variables and 9 categories. The actual data are provided in Table 15.7.1.


**Table 15.7.1:** Ten persons classified under three variables

**Table 15.7.2:** Entries of the indicator matrix of the data included in Table 15.7.1



Next, we construct the indicator matrix *C*—distinct from *C* as defined in (15.4.4)—of the data displayed in Table 15.7.1. If an item is present, we write 1 in the corresponding location in Table 15.7.2, and if it is absent, we write 0, thus populating this table where M ≡ Male, F ≡ Female, U ≡ underweight, N ≡ Normal, O ≡ overweight, L1 ≡ Level 1, L2 ≡ Level 2, L3 ≡ Level 3 and L4 ≡ Level 4. The resulting indicator matrix *C* is

$$C = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \end{bmatrix}$$

Note that since a person will belong to a single category of every variable, the row sum of every row will always be equal to the number of variables, which is 3 in the example. The sum of all the column entries under each variable is the number of items classified (10 in the example). We now convert the data into a two-way classification, which is achieved by converting *C* into a *Burt matrix B*, where *B* = *C*′*C*. In our example,

$$B = C'C = \begin{bmatrix} 4 & 0 & 1 & 2 & 1 & 2 & 1 & 1 & 0 \\ 0 & 6 & 1 & 2 & 3 & 1 & 1 & 2 & 2 \\ 1 & 1 & 2 & 0 & 0 & 1 & 0 & 0 & 1 \\ 2 & 2 & 0 & 4 & 0 & 0 & 1 & 2 & 1 \\ 1 & 3 & 0 & 0 & 4 & 2 & 1 & 1 & 0 \\ 2 & 1 & 1 & 0 & 2 & 3 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 1 & 0 & 2 & 0 & 0 \\ 1 & 2 & 0 & 2 & 1 & 0 & 0 & 3 & 0 \\ 0 & 2 & 1 & 1 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}$$
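The conversion from the indicator matrix to the Burt matrix is a single matrix product; a minimal sketch with `numpy`, checking the diagonal totals and row sums noted above:

```python
import numpy as np

# Indicator matrix C from Table 15.7.2
# (rows: the 10 persons; columns: M, F, U, N, O, L1, L2, L3, L4)
C = np.array([
    [0,1,0,0,1,0,1,0,0],
    [0,1,0,1,0,0,0,0,1],
    [1,0,1,0,0,1,0,0,0],
    [0,1,0,1,0,0,0,1,0],
    [1,0,0,0,1,1,0,0,0],
    [1,0,0,1,0,0,1,0,0],
    [0,1,0,0,1,0,0,1,0],
    [0,1,1,0,0,0,0,0,1],
    [1,0,0,1,0,0,0,1,0],
    [0,1,0,0,1,1,0,0,0]])

B = C.T @ C                    # Burt matrix
print(np.diag(B))              # category totals: [4 6 2 4 4 3 2 3 2]
print(C.sum(axis=1))           # each row sum equals the number of variables, 3
```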

Observe that the diagonal blocks in *C*′*C* correspond to the variables gender, weight and educational level, or gender versus gender, weight versus weight and educational level versus educational level. These blocks are the following:

$$
\begin{bmatrix} 4 & 0 \\ 0 & 6 \end{bmatrix}, \quad \begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.
$$

Various two-way contingency tables, namely gender versus weight, gender versus educational level and weight versus educational level, are combined into one two-way table displaying category versus category. The observed Pearson's *χ*<sup>2</sup> statistic from *C*′*C* is seen to be 79*.*85. In this case, the number of degrees of freedom is 8 × 8 = 64 and, at the 5% level, the tabulated critical value is *χ*<sup>2</sup><sub>64, 0.05</sub> ≈ 84 *>* 79*.*85; hence, the hypothesis of no association in *C*′*C* is not rejected. Note that this *χ*<sup>2</sup> approximation is unreliable since the expected frequencies are small. The most relevant parts of the Burt matrix *C*′*C* are the non-diagonal blocks of frequencies. The two non-diagonal blocks in the first two rows represent the two-way contingency tables for gender versus weight and gender versus educational level. Similarly, the non-diagonal block in the third to fifth rows represents the two-way contingency table for weight versus educational level. These are the following, denoted by *A*<sub>1</sub>*, A*<sub>2</sub>*, A*<sub>3</sub>, respectively, where *A*<sub>1</sub> is the two-way contingency table of gender versus weight, *A*<sub>2</sub> is that of gender versus educational level and *A*<sub>3</sub> is that of weight versus educational level:

$$A\_1 = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 2 & 3 \end{bmatrix}, \ A\_2 = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 1 & 1 & 2 & 2 \end{bmatrix}, \ A\_3 = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 2 & 1 \\ 2 & 1 & 1 & 0 \end{bmatrix}.$$

The corresponding matrices of expected frequencies, under the hypothesis of no association between the characteristics of classification, denoted by *E(Ai), i* = 1*,* 2*,* 3 are

$$\begin{aligned} E(A\_1) &= \begin{bmatrix} 0.8 & 1.6 & 1.6 \\ 1.2 & 2.4 & 2.4 \end{bmatrix}, & E(A\_2) &= \begin{bmatrix} 1.2 & 0.8 & 1.2 & 0.8 \\ 1.8 & 1.2 & 1.8 & 1.2 \end{bmatrix}, \\ E(A\_3) &= \begin{bmatrix} 0.6 & 0.4 & 0.6 & 0.4 \\ 1.2 & 0.8 & 1.2 & 0.8 \\ 1.2 & 0.8 & 1.2 & 0.8 \end{bmatrix}. \end{aligned}$$

The observed values of Pearson's *χ*<sup>2</sup> statistic under the hypothesis of no association in each contingency table, and the corresponding tabulated *χ*<sup>2</sup> critical values at the 5% significance level, are the following: *A*<sub>1</sub>: *χ*<sup>2</sup> = 0*.*63, *χ*<sup>2</sup><sub>2, 0.05</sub> = 5*.*99 *>* 0*.*63; *A*<sub>2</sub>: *χ*<sup>2</sup> = 2*.*36, *χ*<sup>2</sup><sub>3, 0.05</sub> = 7*.*81 *>* 2*.*36; *A*<sub>3</sub>: *χ*<sup>2</sup> = 5*.*42, *χ*<sup>2</sup><sub>6, 0.05</sub> = 12*.*59 *>* 5*.*42. Hence, the hypothesis would not be rejected for any of the contingency tables if Pearson's statistic were applicable. Actually, the *χ*<sup>2</sup> approximation is not appropriate in any of these cases since the expected frequencies are quite small. Hence, decisions cannot be made on the basis of Pearson's statistic in these instances.
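The χ<sup>2</sup> values quoted above can be reproduced directly; the expected frequencies are the products of the row and column totals divided by the grand total, as in *E(A<sub>i</sub>)* above. A small sketch:

```python
import numpy as np

def pearson_chi2(table):
    """Pearson's chi-square statistic for a two-way contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # expected frequencies under no association: (row total)(column total)/n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return ((table - expected) ** 2 / expected).sum()

A1 = [[1, 2, 1], [1, 2, 3]]                 # gender versus weight
print(round(pearson_chi2(A1), 4))           # 0.625, reported as 0.63 in the text
```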

Observe that the first column of the matrix *C* corresponds to the count on "Male", the second to the count on "Female", the third to "Underweight", the fourth to "Normal", the fifth to "Overweight", the sixth to "Level 1", the seventh to "Level 2", the eighth to "Level 3" and the ninth to "Level 4". Thus, the columns represent the various variables and their categories. So, if we were to plot each column as one point in the two-dimensional space, then by looking at the points we could determine which points are close to each other. For example, if the "Overweight" column point is close to the "Male" column point, then there is a possibility of association between "Overweight" and "Male". Thus, our aim will be to plot each column of *C* or each column of *C*′*C* as a point in two dimensions. For this purpose, we may make use of the plotting technique described in Sects. 15.5 and 15.6. Consider a singular value decomposition of *C* = *UΛV*′, *U*′*U* = *I<sub>k</sub>*, *V*′*V* = *I<sub>k</sub>*. If *C* is *r* × *s*, *s < r,* then *U* is *r* × *k* and *V* is *s* × *k*, where *k* is the number of nonzero eigenvalues of *CC*′ as well as of *C*′*C*, and *Λ* = diag*(λ*<sub>1</sub>*,...,λ<sub>k</sub>)* where *λ*<sup>2</sup><sub>*j*</sub>, *j* = 1*,...,k*, are the nonzero eigenvalues of *CC*′ and *C*′*C*. In the numerical example, *r* = 10 and *s* = 9. Consider the eigenvalues of *C*′*C* since, in this case, its order is smaller than that of *CC*′. Let the nonzero eigenvalues of *C*′*C* be *λ*<sup>2</sup><sub>1</sub> ≥ ··· ≥ *λ*<sup>2</sup><sub>*k*</sub>. From *C*′*C,* compute the normalized eigenvectors corresponding to these nonzero eigenvalues. This *s* × *k* matrix of normalized eigenvectors is *V* in the singular value decomposition. By using the same nonzero eigenvalues, compute the normalized eigenvectors from *CC*′. This *r* × *k* matrix is *U* in the singular value decomposition.
Since the columns of *C* and *C*′*C* represent the various variables and their subdivisions, only the columns are useful for our geometrical representation, that is, only *V* will be relevant for plotting the points. Consider *H* = *V Λ* and let *λ*<sup>2</sup><sub>1</sub> ≥ *λ*<sup>2</sup><sub>2</sub> ≥ ··· ≥ *λ*<sup>2</sup><sub>*k*</sub>. Observe that *C* = *UΛV*′ ⇒ *C*′ = *V ΛU*′ = *H U*′. The rows of *C*′ represent the various variables and their categories. Let *h*<sub>1</sub>*,...,h<sub>s</sub>* be the rows of *H*. Then, we have

$$\begin{aligned} h\_1 U' &= \text{Men-row} \\ h\_2 U' &= \text{Women-row} \\ &\vdots \\ h\_s U' &= \text{Level 4-row}. \end{aligned}$$

This shows that the rows *h*<sub>1</sub>*,...,h<sub>s</sub>* represent the various variables and their categories. Since the first two eigenvalues are the largest ones and *V*<sub>1</sub>*, V*<sub>2</sub> are the corresponding eigenvectors, we can take it for granted that most of the information about the various variables and their categories is contained in the first two elements of *h*<sub>1</sub>*,...,h<sub>s</sub>*, that is, in the first two columns of *H* weighted by *λ*<sub>1</sub> and *λ*<sub>2</sub>. Accordingly, take the first two columns of *H* and denote this submatrix by *H*<sub>(2)</sub>, where

$$H\_{(2)} = \begin{bmatrix} h\_{11} & h\_{12} \\ h\_{21} & h\_{22} \\ \vdots & \vdots \\ h\_{s1} & h\_{s2} \end{bmatrix}.$$

Plot the points *(h*<sub>11</sub>*, h*<sub>12</sub>*), (h*<sub>21</sub>*, h*<sub>22</sub>*), . . . , (h<sub>s1</sub>, h<sub>s2</sub>)*. These *s* points correspond to the *s* columns of the *r* × *s* matrix *C* or the *s* rows of *C*′.

Referring to our numerical example, the eigenvalues are

$$\begin{aligned} \lambda\_1^2 &= 11.66, \; \lambda\_2^2 = 5.57, \; \lambda\_3^2 = 5.28, \; \lambda\_4^2 = 3.47, \; \lambda\_5^2 = 2.34, \\ \lambda\_6^2 &= 1.14, \; \lambda\_7^2 = 0.54, \; \lambda\_8^2 = \lambda\_9^2 = 0, \end{aligned}$$

so that *k* = 7, and the positive square roots *λ<sub>j</sub>* of the nonzero eigenvalues *λ*<sup>2</sup><sub>*j*</sub>, *j* = 1*,...,* 7, are

$$\lambda\_1 = 3.41, \,\lambda\_2 = 2.36, \,\lambda\_3 = 2.30, \,\lambda\_4 = 1.86, \,\lambda\_5 = 1.53, \,\lambda\_6 = 1.07, \,\lambda\_7 = 0.73.$$

Thus, the matrix *Λ* is

$$\Lambda = \text{diag}(3.41, 2.36, 2.30, 1.86, 1.53, 1.07, 0.73),$$

and the

$$\text{total inertia} = 11.66 + 5.57 + 5.28 + 3.47 + 2.34 + 1.14 + 0.54 = 30 = \text{tr}(C'C).$$

Noting that 11*.*66/30 = 0*.*39 and *(*11*.*66 + 5*.*57 + 5*.*28*)/*30 = 0*.*75, we can assert that 75% of the inertia is accounted for by the first three eigenvalues of *C*′*C*.

The normalized eigenvectors of *C*′*C* corresponding to the nonzero eigenvalues, denoted by *V* = [*V*<sub>1</sub>*,...,V*<sub>7</sub>], are the following:

$$V\_1 = \begin{bmatrix} 0.293826 \\ 0.615045 \\ 0.138401 \\ 0.36352 \\ 0.406951 \\ 0.248836 \\ 0.173839 \\ 0.306906 \\ 0.179291 \end{bmatrix},\; V\_2 = \begin{bmatrix} -0.711194 \\ 0.512432 \\ -0.0995834 \\ -0.106977 \\ 0.00779839 \\ -0.386116 \\ -0.0833586 \\ 0.041766 \\ 0.228947 \end{bmatrix},\; V\_3 = \begin{bmatrix} 0.123362 \\ -0.0941084 \\ -0.0879546 \\ 0.638327 \\ -0.521119 \\ -0.428293 \\ 0.0446188 \\ 0.302598 \\ 0.110329 \end{bmatrix},\; V\_4 = \begin{bmatrix} 0.0732966 \\ 0.107206 \\ 0.601988 \\ -0.03252 \\ -0.388966 \\ 0.167134 \\ -0.164402 \\ -0.357001 \\ 0.534772 \end{bmatrix},$$

$$V\_5 = \begin{bmatrix} 0.0406003 \\ 0.0238456 \\ -0.124608 \\ 0.110633 \\ 0.0784214 \\ -0.206691 \\ 0.754879 \\ -0.584142 \\ 0.1004 \end{bmatrix},\; V\_6 = \begin{bmatrix} -0.115587 \\ -0.0128351 \\ -0.572877 \\ 0.408319 \\ 0.0361355 \\ 0.400243 \\ -0.367304 \\ -0.382451 \\ 0.22109 \end{bmatrix},\; V\_7 = \begin{bmatrix} 0.320998 \\ -0.262335 \\ -0.162227 \\ -0.195339 \\ 0.416228 \\ -0.427087 \\ -0.191702 \\ 0.0724589 \\ 0.604993 \end{bmatrix}.$$

Then, the first two eigenvectors weighted by *λ*<sub>1</sub> = 3*.*4147 and *λ*<sub>2</sub> = 2*.*3601 and the points to be plotted are

$$\lambda\_1 V\_1 = \begin{bmatrix} 1.0033 \\ 2.1002 \\ 0.4726 \\ 1.2413 \\ 1.3896 \\ 0.8497 \\ 0.5936 \\ 1.0480 \\ 0.6122 \end{bmatrix}, \quad \lambda\_2 V\_2 = \begin{bmatrix} -1.6785 \\ 1.2094 \\ -0.2350 \\ -0.2525 \\ 0.0184 \\ -0.9113 \\ -0.1967 \\ 0.0986 \\ 0.5403 \end{bmatrix},$$

$$\text{Points to be plotted}: \begin{matrix} (1.0033, -1.6785) & \text{Male} \\ (2.1002, 1.2094) & \text{Female} \\ (0.4726, -0.2350) & \text{Underweight} \\ (1.2413, -0.2525) & \text{Normal} \\ (1.3896, 0.0184) & \text{Overweight} \\ (0.8497, -0.9113) & \text{Level 1} \\ (0.5936, -0.1967) & \text{Level 2} \\ (1.0480, 0.0986) & \text{Level 3} \\ (0.6122, 0.5403) & \text{Level 4} \end{matrix}$$

The plot of these points is displayed in Fig. 15.7.1.

It is seen from the points plotted in Fig. 15.7.1 that the categories underweight and educational level 2 are somewhat close to each other, which is indicative of a possible association, whereas the categories underweight and female are the farthest apart.
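The whole computation, from the indicator matrix to the plotted category points, can be sketched with `numpy`'s SVD. Eigenvector signs are arbitrary in a singular value decomposition, so individual coordinates may come out with reversed signs relative to those listed above:

```python
import numpy as np

labels = ["Male", "Female", "Underweight", "Normal", "Overweight",
          "Level 1", "Level 2", "Level 3", "Level 4"]
C = np.array([
    [0,1,0,0,1,0,1,0,0], [0,1,0,1,0,0,0,0,1], [1,0,1,0,0,1,0,0,0],
    [0,1,0,1,0,0,0,1,0], [1,0,0,0,1,1,0,0,0], [1,0,0,1,0,0,1,0,0],
    [0,1,0,0,1,0,0,1,0], [0,1,1,0,0,0,0,0,1], [1,0,0,1,0,0,0,1,0],
    [0,1,0,0,1,1,0,0,0]])

# SVD C = U diag(s) V'; the s[j]**2 are the eigenvalues of C'C and the
# columns of Vt.T are the eigenvectors V_j
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = int(np.sum(s > 1e-10))       # number of nonzero singular values
inertia = np.sum(s**2)           # total inertia = tr(C'C)

# Two-dimensional category points: first two columns of V weighted by s1, s2
points = Vt[:2].T * s[:2]
for lab, (x, y) in zip(labels, points):
    print(f"{lab:12s} ({x:8.4f}, {y:8.4f})")
```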


**Figure 15.7.1** Multiple contingency plot

#### **Exercises 15 (continued)**

**15.6.** In the following two-way contingency table, where the entries in the cells are frequencies, (1) calculate Pearson's *χ*<sup>2</sup> statistic and give the representations in (15.5.1)– (15.5.6); (2) plot the row profiles; (3) plot the column profiles:

$$
\begin{array}{ccccc}
\downarrow \to B\_1 & B\_2 & B\_3 & B\_4 \\
A\_1 & 10 & 15 & 20 & 15 \\
A\_2 & 15 & 10 & 10 & 5
\end{array}
$$

**15.7.** Repeat Exercise 15.6 for the following two-way contingency table:

$$
\begin{array}{ccccc}
\downarrow \to B\_1 & B\_2 & B\_3 & B\_4 \\
A\_1 & 10 & 5 & 15 & 5 \\
A\_2 & 5 & 10 & 10 & 20 \\
A\_3 & 10 & 5 & 10 & 5 \\
A\_4 & 15 & 10 & 5 & 10
\end{array}
$$

**15.8.** For the data in (1) Exercise 15.6, (2) Exercise 15.7, and by using the notations defined in Sects. 15.5 and 15.6, compute the following items: Estimates of (i) *A* = *D*<sub>*r*</sub><sup>−1/2</sup>*(P* − *RC*′*)D*<sub>*c*</sub><sup>−1</sup>*(P* − *RC*′*)*′*D*<sub>*r*</sub><sup>−1/2</sup>; (ii) the eigenvalues of *A* and tr*(A)*; (iii) the total inertia and the proportions of inertia accounted for by the eigenvalues; (iv) the matrix of row profiles; (v) the matrix of column profiles, and make comments.

**15.9.** Referring to Exercises 15.6 and 15.7, plot the row profiles and column profiles and make comments.

**15.10.** In a used car lot, there are high price, average price and low price cars; the cars come in the following colors: red, white, blue and silver, and the paint finish is either matte or shiny. Fourteen customers bought vehicles from this car lot. Their preferences are given next. (1) Carry out a multiple correspondence analysis, plot the column profiles and make comments; (2) Create individual two-way contingency tables, analyze these tables and make comments. The data follow, the first column indicating the customer's serial number:


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Tables of Percentage Points**

Tables 1, 2, 3, 4, 5, 6 and 7 contain probabilities and percentage points that are useful for testing a variety of statistical hypotheses encountered in multivariate analysis.


**Table 1:** Standard normal probabilities. Table entry: probability $= \int_0^x \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,{\rm d}t$


**Table 1:** (continued): Standard normal probabilities


**Table 2:** Student-*t*, right tail. Table entry: *t<sub>ν,α</sub>* where $\int_{t_{\nu,\alpha}}^{\infty} f(t_\nu)\,{\rm d}t_\nu = \alpha$ and *f (t<sub>ν</sub>)* is the density of a Student-*t* distribution having *ν* degrees of freedom


**Table 3:** Chisquare, right tail. Table entry: *χ*<sup>2</sup><sub>*ν,α*</sub> where $\int_{\chi^2_{\nu,\alpha}}^{\infty} f(\chi^2_\nu)\,{\rm d}\chi^2_\nu = \alpha$ with *f (χ*<sup>2</sup><sub>*ν*</sub>*)* being the density of a chisquare random variable having *ν* degrees of freedom



**Table 3:** (continued): Chisquare, right tail

**Table 4:** *F*-distribution, right tail 5% points. Table entry: *b* = *F*<sub>*ν*<sub>1</sub>*,ν*<sub>2</sub>*,*0*.*05</sub> where $\int_b^{\infty} f(F_{\nu_1,\nu_2})\,{\rm d}F_{\nu_1,\nu_2} = 0.05$ with *f (F*<sub>*ν*<sub>1</sub>*,ν*<sub>2</sub></sub>*)* being the density of an *F*-variable having *ν*<sub>1</sub> and *ν*<sub>2</sub> degrees of freedom




**Table 4:** (continued): *F*-distribution, right tail 5% points

**Table 5:** *F*-distribution, right tail 1% points. Table entry: *b* = *F*<sub>*ν*<sub>1</sub>*,ν*<sub>2</sub>*,*0*.*01</sub> where $\int_b^{\infty} f(F_{\nu_1,\nu_2})\,{\rm d}F_{\nu_1,\nu_2} = 0.01$ with *f (F*<sub>*ν*<sub>1</sub>*,ν*<sub>2</sub></sub>*)* being the density of an *F*-variable having *ν*<sub>1</sub> and *ν*<sub>2</sub> degrees of freedom




**Table 5:** (continued): *F*-distribution, right tail 1% points

**Table 6:** Testing independence. Let *H<sub>o</sub>* : *Σ* is diagonal in a *N<sub>p</sub>(μ, Σ)* population and *u* = *λ*<sup>2*/n*</sup> where *n* is the sample size and *λ* refers to the *λ*-criterion. The table entries are the 5th and 1st upper percentage points of *w* = −[*m* + *(*2*p* + 5*)/*6] ln *u* where *m* = *n* − 1. *H<sub>o</sub>* is rejected for large values of *w*. The last line, wherein *m* = ∞, displays the asymptotic chisquare values as *w* → *χ*<sup>2</sup><sub>*p(p*−1*)/*2</sub>



**Table 6:** (continued): Testing independence

Note: For *p* = 2, *u* = 1 − *r*<sup>2</sup> where *r* is the sample correlation coefficient. Moreover, *mr*<sup>2</sup>*/(*1 − *r*<sup>2</sup>*)* ∼ *F*<sub>1*,m*</sub> where *m* = *n* − 1 and *n* is the sample size; also refer to Sect. 5.6 and Eq. (6.6.1). Thus, the case *p* = 2 is omitted from the tables as one can then readily obtain the requisite critical values from *F* or Student-*t* tables in light of these properties.
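As a sketch of how the statistic tabulated in Table 6 would be computed from data, assuming the standard form of the likelihood-ratio criterion for this test, namely *u* = |*R*|, the determinant of the sample correlation matrix:

```python
import numpy as np

def independence_statistic(X):
    """w = -[m + (2p+5)/6] ln u for H_o: Sigma diagonal, taking u = |R|,
    the determinant of the sample correlation matrix, with m = n - 1."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)     # sample correlation matrix
    u = np.linalg.det(R)                 # 0 < u <= 1 for a nonsingular sample
    m = n - 1
    return -(m + (2 * p + 5) / 6) * np.log(u)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))         # simulated sample, n = 50, p = 4
w = independence_statistic(X)
# compare w with the tabulated percentage point for p = 4 and m = 49,
# or with the asymptotic chisquare_{p(p-1)/2} critical value
```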

**Table 7:** Testing the equality of the diagonal elements, given that *Σ* in a *N<sub>p</sub>(μ, Σ)* population is diagonal. Let *v* = *λ*<sup>2*/n*</sup> where *λ* is the likelihood ratio test statistic and *n* is the sample size. Letting *f (v)* denote the density of *v*, the table entries *w* are the critical values of *v* such that *F (w)* = $\int_0^w f(v)\,{\rm d}v$ = 0*.*01*,* 0*.*02*,* 0*.*025*,* 0*.*05, for various values of *p* and *m* = *n* − 1



**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal


**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal


**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal


**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal


**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal

*p* = 8

| *m* | *F(w)* = 0.01 | *F(w)* = 0.02 | *F(w)* = 0.025 | *F(w)* = 0.05 |
|-----|---------|---------|---------|---------|
| 8 | 0.08947 | 0.11391 | 0.12334 | 0.15897 |
| 9 | 0.11816 | 0.14632 | 0.15699 | 0.19653 |
| 10 | 0.14735 | 0.17850 | 0.19012 | 0.23256 |
| 11 | 0.17632 | 0.20980 | 0.22215 | 0.26665 |
| 12 | 0.20460 | 0.23985 | 0.25274 | 0.29867 |
| 13 | 0.23192 | 0.26849 | 0.28175 | 0.32859 |
| 14 | 0.25810 | 0.29563 | 0.30913 | 0.35650 |
| 15 | 0.28309 | 0.32126 | 0.33492 | 0.38250 |
| 16 | 0.30686 | 0.34544 | 0.35916 | 0.40672 |
| 17 | 0.32942 | 0.36820 | 0.38194 | 0.42931 |
| 18 | 0.35081 | 0.38965 | 0.40335 | 0.45038 |
| 19 | 0.37108 | 0.40985 | 0.42348 | 0.47006 |
| 20 | 0.39028 | 0.42878 | 0.44241 | 0.48848 |

*p* = 9

| *m* | *F(w)* = 0.01 | *F(w)* = 0.02 | *F(w)* = 0.025 | *F(w)* = 0.05 |
|-----|---------|---------|---------|---------|
| 9 | 0.09813 | 0.12249 | 0.13178 | 0.16652 |
| 10 | 0.12474 | 0.15218 | 0.16250 | 0.20044 |
| 11 | 0.15160 | 0.18155 | 0.19267 | 0.23304 |
| 12 | 0.17821 | 0.21015 | 0.22189 | 0.26404 |
| 13 | 0.20421 | 0.23770 | 0.24990 | 0.29332 |
| 14 | 0.22939 | 0.26406 | 0.27660 | 0.32088 |
| 15 | 0.25362 | 0.28917 | 0.30195 | 0.34676 |
| 16 | 0.27685 | 0.31302 | 0.32596 | 0.37104 |
| 17 | 0.29904 | 0.33563 | 0.34865 | 0.39380 |
| 18 | 0.32020 | 0.35704 | 0.37010 | 0.41515 |
| 19 | 0.34036 | 0.37731 | 0.39036 | 0.43619 |
| 20 | 0.35955 | 0.39650 | 0.40950 | 0.45401 |

**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal


**Table 7:** (continued): Testing the Equality of the Diagonal Elements, Given that *Σ* in a *Np(μ, Σ)* Population is Diagonal

Note: For large sample size *n*, we may utilize the approximation −2 ln *λ* ≈ *χ*<sup>2</sup><sub>*p*−1</sub> for the null distribution of *λ*, where *λ* = *v*<sup>*n/*2</sup> is the lambda criterion, *n* is the sample size and *χ*<sup>2</sup><sub>*p*−1</sub> denotes a central chisquare random variable having *p* − 1 degrees of freedom. The null hypothesis will be rejected for large values of −2 ln *λ* since it is rejected for small values of the test statistics *λ* or *v*.

# **Some Additional Reading Materials**

#### **Books**


#### **Some Recent Research Papers**


# **Author Index**

### **A**

Abeida, A., 910 Adachi, K., 907 Agresti, A., 907 Akemann, G., 913 Allman, E.S., 686 Anderson, T.W., 457, 566, 685, 907 Antunes, L., 912 Arashi, D.K., 911 Atkinson, A.C., 907

# **B**

Balakrishnan, N., 510, 908 Barnes, C.A., 495 Bartholomew, D.J., 907 Besson, O., 910 Bishop, V.M., 907 Bodnar, T., 910 Brown, B.L., 907

# **C**

Castro, L., 912 Cerioli, A., 907 Chatfield, C., 907 Chattopadhyay, A.K., 627 Chen, J.T., 910 Chen, Y., 686

Chiani, M., 627, 910, 913 Christensen, W.F., 909 Clayton, D.D., 495 Clemm, D.S., 627 Collins, A.J., 907 Costa-Santos, C., 912 Cox, D.R., 907 Cox, T.F., 907 Crichfield, C., 495

# **D**

Davis, A.W., 435, 627, 910 de Micheaux, P.L., 910 Delmas, J.-P., 910 Denuit, M., 910 Diaz-Garcia, J.A., 910 Diedrichsen, J., 910 Ducharme, G., 910 Dugard, P., 907 Dunn, G., 908

### **E**

Edelman, A., 627, 910 Everitt, B., 908

# **F**

Fang, J., 911 Fidell, L.S., 909

© The Author(s) 2022 A. M. Mathai et al., *Multivariate Statistical Analysis in the Real and Complex Domains*, https://doi.org/10.1007/978-3-030-95864-0

Field, J.B., 435 Fienberg, S.E., 907 Flury, B., 908 Fowler, W.A., 495 Fuhrmann, D.R., 911 Fujikoshi, Y., 908, 912

### **G**

Gamst, G., 909 Green, P.E., 908 Guarino, A.J., 909 Guhr, T., 913 Gupta, A.K., 908, 910, 912 Gupta, S., 911 Gweon, H., 913

# **H**

Habti, A., 913 Haddad, J.N., 912 Hair, J.F., 908 Härdle, W.K., 908 Haubold, H.J., 1, 105, 108, 110, 113, 125, 139, 357, 359, 360, 367, 369, 377, 395, 469, 478, 495, 502, 510, 595, 599, 611, 647, 652, 657, 759, 763, 771, 833, 870, 911 Henriques, T., 912 Hinrichs, N.S., 911 Holland, P.W., 907 Hothorn, T., 908 Hougaard, P., 908 Huang, H., 620, 912 Huberty, C.J., 908

# **I**

Ihara, M., 686 Illić, V., 911 Iranmanesh, A., 911 Izenman, A.J., 908

### **J**

Jáimez, R.G., 910 James, A.T., 627 James, O., 627, 911 Janik, R.A., 911 Johnson, M.E., 908 Johnson, N.L., 908 Johnson, R.A., 908 Johnstone, I.M., 627, 911 Jorge, A., 909

# **K**

Kamrath, M.J., 912 Kano, Y., 686 Katiyar, R.S., 449, 457 Khatri, C.G., 627, 909 Kieburg, M., 913 Kitamura, D., 911 Kollo, T., 908 Kondo, T., 537, 538 Korbel, J., 911 Korin, B.P., 435 Koski, T., 912 Kotz, S., 908 Kres, H., 909 Krishnaiah, P.R., 627 Krzanowski, W.J., 909 Kubokawa, T., 911 Kwak, N., 620, 911

### **L**

Lai, C.D., 510 Lee, H.-N., 911 Lee, W.-N., 627 Li, X., 686 Lu,Y., 910

### **M**

Manly, B.F.J., 909


Marchina, B., 910 Marcoulides, G.A., 909 Marshall, A.W., 510 Mathai, A.M., 1, 44, 77, 81, 105, 108, 110, 113, 125, 139, 357, 359, 360, 367, 369, 377, 395, 469, 478, 495, 502, 510, 595, 599, 611, 647, 652, 657, 759, 763, 771, 833, 870, 911 Matias, C., 686 Meyers, L., 909 Mitsui, Y., 911 Mogami, S., 911

### **N**

Nagao, H., 435 Nagar, D.K., 908, 911, 912 Nagarsenker, B.N., 435 Navarro, A., 909 Nie, F., 912 Nowak, M.A., 911

### **O**

Okhrin, Y., 910 Olejnik, S., 908 Olive, D.J., 909 Olivier, B., 910 Olkin, I., 510 Ono, N., 911 Ostashev, V., 912

### **P**

Pais, A., 495 Pande, V.S., 911 Park, K., 913 Pérez-Abreu, V., 912 Pettit, C.L., 912 Pillai, K.C.S., 435, 627 Potthoff, R.F., 842 Press, S.J., 909

Princy, T., 495 Provost, S.B., 81, 170, 510, 533, 910, 912, 913

### **R**

Rathie, P.N., 201, 436, 443, 449, 457 Ratnarajah, T., 912 Raykov, T., 909 Rencher, A.C., 909 Rhodes, J.A., 709 Riani, M., 907 Ribeiro, M., 912 Roldán-Correa, A., 912 Roy, S.N., 842 Rubin, H., 685 Ryu, J., 913

### **S**

Saruwatari, H., 908, 911, 912
Saxena, R.K., 110, 113, 443, 452, 457, 478, 771, 833
Scarfone, A.M., 911
Schafer, J.L., 909
Schramm, D.N., 495
Scott, D.W., 909
Serdobolskii, V., 909
Sheena, Y., 912
Shi, J., 620, 912
Shimizu, R., 908
Simar, L., 908
Singull, M., 912
Souto, A., 912
Spencer, N.H., 909
Srivastava, M.S., 909, 911, 912
Staines, H., 907
Stelzer, R., 912
Sugiura, N., 435

### **T**

Tabachnick, B.G., 909
Tabatabaey, S.M.M., 911
Takamune, N., 911
Teixeira, A., 912
Timm, N.H., 909
Todman, J., 907

### **U**

Ullman, J.B., 909
Ulyanov, V.V., 908

### **V**

Vaillancourt, R., 912
von Rosen, D., 908

### **W**

Wadsack, P., 909
Waikar, V.B., 627
Wegge, L.L., 686
Wegner, R., 913
Wei, W.W.S., 909
Wermuth, N., 907
Wichern, D.W., 908
Wilson, D.K., 912, 913
Win, M.Z., 913
Wirtz, T., 913

### **Y**

Yang, W., 912
Yi, G.Y., 911
Yu, S., 913

### **Z**

Zanella, A., 913
Zaninetti, L., 913
Zareamoghaddam, H., 910
Zhang, S., 686
Zhang, X., 913
Zheng, X., 912

# **Subject Index**

### **A**

Analysis of variance (ANOVA), 763
Asymptotic distribution, 436
Asymptotic normality, 203

### **B**

Bayes' rule, 714
Bessel function, 334
Beta variable, 72

### **C**

Canonical correlation, 641
Canonical correlation matrix, 669
Cauchy density, 76
Central limit theorem, 119
Characteristic function, 212–214
Chebyshev's inequality, 115
Chisquare, 62, 65
Chisquaredness, 78, 172
Classification, 711–757
Cofactor, 21
Confidence interval, 125
Consistency, 198
Correlation coefficient, 355
Cramér-Rao inequality, 201
Critical region, 396

### **D**

Design of experiment model, 681
Determinant, 12
Determinant, absolute value of, 43
Differential, 40
Dirichlet density, 378, 381
Discriminant function, 727
Duplication formula, 527

### **E**

Eigenvalue, eigenvector, 32
Elliptically contoured distribution, 206
Estimation, point, 120

### **F**

Factor analysis, 679
Factor analysis model, 682
Factor loadings, 687
Factor space, 683
Fractional integral, first kind, 371
Fractional integral, second kind, 368
F-variable, 69

### **G**

Gamma function, multiplication formula of, 438
Gaussian, multivariate, 129
Generalized variance, 347


Geometrical probability, 355
G-function, 112, 329
Growth curve, 824

### **H**

Hazard function, 72
Hermitian forms, independence of, 175
H-function, 112
Hotelling's *T*², 415

### **I**

Identification problem, 685
Interval estimation, 124
Invariance, 645
Inverse Wishart density, 349
Iterative procedure, 656–663

### **J**

Jacobian, 43

### **K**

Krätzel function, 334

### **L**

*L1, L2* norms, 618–620
Lambda criterion, 411
Likelihood function, 396
Linear hypothesis, 463–473

### **M**

Matrix, 2
  in the complex domain, 35
  diagonal, identity, 4
  elementary, 24
  Hermitian, skew Hermitian, 35
  idempotent, nilpotent, 34
  inverse of, 23
  multiplication, 5
  null, square, rectangular, 4
  partitioned, 29
  scalar multiplication of, 4
  singular values of, 38
  symmetric, skew symmetric, 34
  transpose of, 8
  triangular, 4
Matrix-variate gamma, complex, 291
Matrix-variate gamma, real, 224
Matrix-variate, rectangular, 493
Maximum likelihood estimators (MLE), 121
Maxwell-Boltzmann distribution, 499
M-convolution, 377
Method of maximum likelihood, 120
Method of moments, 120
Misclassification, 712
Mittag-Leffler function, 334
Moment generating function (mgf), 57
Multiple correlation coefficient, 364
Multivariate analysis of variance (MANOVA), 481, 759

### **N**

Null hypothesis, 396

### **O**

Oblique factors, 686
Orthogonal factors, 707

### **P**

Partial correlation, 364
Pathway models, 370
Patterned matrices, 487
Polar coordinates, 500
Poles, 322
Power of a test, 397
Power transformation, 71
Principal components, 597
Products and ratios of matrices, 366
Profiles, coincident, 814


Profiles, level, 814
Profiles, parallel, 814
Pseudo Dirichlet model, 383
*p*-value, 408

### **Q**

Quadratic form, 78
Quadratic forms, independence of, 78, 169

### **R**

Rayleigh distribution, 494
Rao-Blackwell theorem, 200
Regression, 360
Regression model, 680
Relative efficiency, 199
Reliability function, 72
Repeated measures design, 824
Residues, 322

### **S**

Scalar quantity, 4
Singular gamma, 580
Singular gamma, complex, 588
Stirling's formula, 504
Student-t, 73
Sufficiency, 198

### **T**

Triangularization, 340
Type-1, type-2 error, 396

### **U**

Unbiasedness, 196

### **V**

Vector, row, column, 6
Vectors, orthogonal, orthonormal, 10

### **W**

Weak law of large numbers, 118
Wedge product, 40
Weyl integral, 375
Wishart density, 335

### **Z**

Zonal polynomial, 478