One of my colleagues has implemented a new method for evaluating the cumulative distribution function of a multivariate normal distribution and wanted to compare the speed of his method with that of randomized quasi-Monte Carlo methods. A few days ago, while we were going to lunch, he asked me how to generate random correlation matrices, because the speed of his method strongly depends on the correlation matrix and he wanted to have some sort of average.
But what is a random correlation matrix?
Let's first give a characterization of correlation matrices.
It is well known that for a matrix
there exist (multivariate normal distributed) random variables
with
if and only if
-
for all
,
for all
,
is symmetric (therefore all eigenvalues
of
are real)
- and all eigenvalues of
are greater or equal to zero.
But what is the right notion of randomness for these matrices? For example let's look at the orthogonal matrices. In many numerical applications we need uniformly distributed random orthogonal matrices in terms of the Haar measure.
Unfortunately in our case there is no clear, natural notion of randomness.
Method 1 - Try and Error: We generate a matrix fulfilling no. 1, 2 and 3 of the characterization (these matrices are called pseudo correlation matrices) by generating independent pseudo-random numbers uniformly distributed between -1 and 1 for the entries
,
.
If this random symmetric matrix is positive semidefinite (i.e. all eigenvalues of
are greater or equal to zero), we have the desired result. Otherwise we try again. Here is the corresponding R code:
random.pseudo.correlation.matrix <-
function(n) {
a <- diag(n)
for(i in 1:(n-1)) {
for(j in (i+1):n) {
a[i,j] <- a[j,i] <- runif(1,-1,1)
}
}
return(a)
}
random.correlation.matrix.try.and.error <-
function(n) {
repeat {
a <- random.pseudo.correlation.matrix(n)
if (min(eigen(a)$values)>=0) return(a)
}
}
This approach is only reasonable for very small dimensions (try it with
).
Method 2 - Lift the Diagonal:
We denote by
the identity matrix. If
has the eigenvalues
then
has the eigenvalues
since
is a solution of
if
and only if
is a solution of
.
So we start again with a pseudo correlation matrix
,
but instead of retrying if
has negative eigen values, we lift the diagonal by
and obtain
, which is always positive semidefinite. After dividing by
we have a correlation matrix which is ``some kind of random''.
Unfortunately the diagonal is accentuated and the smallest eigen value is always zero. The second problem we could avoid by adding
where
is some random number, but the first remains.
make.positive.semi.definite <-
function(a, offset=0) {
(a + (diag(dim(a)[1]) * (abs(min(eigen(a)$values))+offset))) /
(1+(abs(min(eigen(a)$values))+offset))
}
random.correlation.matrix.lift.diagonal <-
function(n, offset=0) {
a <- random.pseudo.correlation.matrix(n)
make.positive.semi.definite(offset)
}
Method 3 - Gramian matrix - my favorite: Holmes [Holmes1991] discusses two principal methods for generating random correlation matrices.
One of them is to generate
independent pseudo-random vectors
distributed uniformly on the unit sphere
in
and to use the Gram matrix
, where
has
as
-th column and
is the transpose of
.
To create the
in R, we load the package mvtnorm, generate
and set
:
random.correlation.matrix.sphere <-
function(n) {
require("mvtnorm")
t <- rmvnorm(n,rep(0,n),diag(n))
for (i in 1:n) {
t[i,] <- t[i,]/sqrt(t(t[i,])%*%t[i,])
}
t%*%t(t)
}
Conclusion: There are futher methods (like e.g. to generate a random spectrum and then construct the correlation matrix), which are not as easy to implement as the ones described above. But just as the three given methods, they all are unsatisfactory in some way because we don't really know how random correlation matrices should be distributed.
For my colleague an average of calculation time does only make sense when he knows which kind of correlation matrices occurs in the applications. He decided to describe and compare the different cases individually.
But does it perhaps make sense to use random correlation matrices as test cases or are the special cases more important? For example, random correlation matrices generated with method 1 and 3 are only singular with probability zero.
Any critique, comments, suggestions or questions are welcome!
And for the next time: Given a correlation matrix C. How do we generate tuples of pseudo-random numbers following a given multivariate distribution with correlation matrix C?
Bibliography
- Holmes1991
-
Holmes, R. B. 1991.
On random correlation matrices.
Siam J. Matrix Anal. Appl., Vol. 12 No. 2: 239-272.
Tags: Mathe, Programmieren, R, Statistik
[...] musste ich feststellen, dass die besagten LaTeX2HTML-Beiträge wie “Random Correlation Matrices” wieder durch <br/> zerschossen wurden: <IMG<br />rn WIDTH="183" [...]