Hi everyone. Based on a dependent variable (y), I'm trying to generate some independent variables with a specified correlation. For this there's no problems. However, I would like that have all my "regressors" to be orthogonal (i.e. no correlation among them. For example, y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3 0.8. However, x1, x2 and x3 should not be correlated to each other. Anyone can help me? Regards, Phil -- View this message in context: http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475799.html Sent from the R help mailing list archive at Nabble.com.
Hello,> > However, I would like that have all my "regressors" to be orthogonal (i.e. > no correlation among them. >?poly poly(cbind(x1, x2, x3), degree=1) Hope this helps, Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475821.html Sent from the R help mailing list archive at Nabble.com.
Hi there. This is not really working. Here what I have for the moment. library(ecodist) x <- 1:100 y1 <- corgen(x=x, r=.85, epsilon=.01)$y y2 <- corgen(x=x, r=.5, epsilon=.01)$y y3 <- corgen(x=x, r=.25, epsilon=.01)$y a = poly(cbind(y1, y2, y3), degree=1) cor(a[,1], a[,2]) In that case, the correlation between y1 and y2 is the same as a[,1] and a[,2]. Regards, Phil -- View this message in context: http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475891.html Sent from the R help mailing list archive at Nabble.com.
On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:> Hi everyone. > > Based on a dependent variable (y), I'm trying to generate some independent > variables with a specified correlation. For this there's no problems. > However, I would like that have all my "regressors" to be orthogonal (i.e. > no correlation among them. > > For example, > > y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3 > 0.8. However, x1, x2 and x3 should not be correlated to each other.Hi. If the following computation is correct, then there is no solution for the required correlations, but there is one, if the vector of the required correlations is normalized to have sum of squares 1. Assume, variables x1, x2, x3 have mean zero and denote s1^2 = var(x1), s2^2 = var(x2), s3^2 = var(x3) and assume zero correlations among x1, x2, x3, so also zero covariances. Then var(y) = s1^2 + s2^2 + s3^2 E y x1 = E x1^2 + E x1 x2 + E x1 x3 = E x1^2 = s1^2 and similarly E y x2 = var(x2) = s2^2 E y x3 = var(x3) = s3^2 So, the correlation cor(y, x1) is s1^2/s1/sqrt(s1^2 + s2^2 + s3^2) = s1/sqrt(s1^2 + s2^2 + s3^2) Expressing all the correlations in this way, we get cor(y, x1) = s1/sqrt(s1^2 + s2^2 + s3^2) cor(y, x2) = s2/sqrt(s1^2 + s2^2 + s3^2) cor(y, x3) = s3/sqrt(s1^2 + s2^2 + s3^2) Clearly, we have cor(y, x1)^2 + cor(y, x2)^2 + cor(y, x3)^2 = 1. For your numbers, we get r <- c(0.7, 0.4, 0.8) sum(r^2) # [1] 1.29 So, for these numbers, the conditions are contradictory. However, a solution may be found for the vector of correlations r/sqrt(1.29) [1] 0.6163156 0.3521804 0.7043607 which are the original correlations normalized to have sum of squares 1. In this case, independent normal variables with the standard deviations (s1, s2, s3) == r/sqrt(1.29) will satisfy your conditions. I hope that other members of the list correct me, if i overlooked something. Petr Savicky.
Thank you everyone for your precious advice. I'll take time to look at it and try to make it work. Regards, Phil -- View this message in context: http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4476346.html Sent from the R help mailing list archive at Nabble.com.
On 15-Mar-2012 Filoche wrote:> Hi everyone. > > Based on a dependent variable (y), I'm trying to generate some > independent variables with a specified correlation. For this > there's no problems. > However, I would like that have all my "regressors" to be > orthogonal (i.e. no correlation among them). > > For example, > > y = x1 + x2 + x3 where the correlation between y x1 = 0.7, > x2 = 0.4 and x3 = 0.8. However, x1, x2 and x3 should not be > correlated to each other. > > Anyone can help me? > > Regards, > PhilYour fundamental problem here (with the correlations you specify) is the following. Your desired correlation matrix can be constructed by C <- cbind( c(1.0,0.7,0.4,0.8),c(0.7,1.0,0.0,0.0), c(0.4,0.0,1.0,0.0),c(0.8,0.0,0.0,1.0) ) rownames(C) <- c("y","x1","x2","x3") colnames(C) <- c("y","x1","x2","x3") C # y x1 x2 x3 # y 1.0 0.7 0.4 0.8 # x1 0.7 1.0 0.0 0.0 # x2 0.4 0.0 1.0 0.0 # x3 0.8 0.0 0.0 1.0 And now: det(C) # [1] -0.29 and it is impossible for the determinant of a correlation matrix to have a negative determinant: a correlation matyrix must be positive-semidefinite, and therefore have a non-negative determinant. An alternative check is to look at the eigen-structure of C: eigen(C) # $values # [1] 2.1357817 1.0000000 1.0000000 -0.1357817 # # $vectors # [,1] [,2] [,3] [,4] # [1,] 0.7071068 0.000000e+00 0.0000000 0.7071068 # [2,] 0.4358010 -1.172802e-16 0.7874992 -0.4358010 # [3,] 0.2490291 -8.944272e-01 -0.2756247 -0.2490291 # [4,] 0.4980582 4.472136e-01 -0.5512495 -0.4980582 so one of the eigenvalues (-0.1357817) is negative, again impossible for a correlation matrix. The suggestions made by the other respondents amount to messing with the correlations so as to achieve a possible situation; but then this does not achieve you goal (which is impossible) but a different one, whose relationship with your question may or may not suit what you hope to achieve. Ted. ------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at wlandres.net> Date: 15-Mar-2012 Time: 23:23:24 This message was sent by XFMail
On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:> Hi everyone. > > Based on a dependent variable (y), I'm trying to generate some independent > variables with a specified correlation. For this there's no problems. > However, I would like that have all my "regressors" to be orthogonal (i.e. > no correlation among them. > > For example, > > y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3 > 0.8. However, x1, x2 and x3 should not be correlated to each other.Hi. The previous discussion shows that the required correlations should have sum of squares 1. It is not clear, whether your expected application allows to normalize the correlations to satisfy this condition. If this normalization is possible, then try the following approach. It creates a matrix trans, which satisfies the following. If Z is a matrix whose columns have zero mean and the identity covariance matrix, then Z %*% trans has the required correlation structure. This may be checked by looking at cov2cor(t(trans) %*% trans). Moreover, trans should have the first column c(1, 0, 0) so that the first column of Z is not changed by the transformation. # normalize the correlations (see the discussion above) rho <- c(0.7, 0.4, 0.8) rho <- rho/sqrt(sum(rho^2)) (required <- cbind(c(1, rho), rbind(c(rho), diag(3)))) trans <- cbind( y=rho, x1=c(rho[1], 0, 0), x2=c(0, rho[2], 0), x3=c(0, 0, rho[3])) # check that trans generates the required correlations max(abs(cov2cor(t(trans) %*% trans) - required)) # [1] 1.110223e-16 # get orthonormal columns gram.schmidt <- function(a) { n <- ncol(a) for (i in seq.int(length=n)) { for (j in seq.int(length=i-1)) { a[, i] <- a[, i] - sum(a[, j]*a[, i])*a[, j] } a[, i] <- a[, i]/sqrt(sum(a[, i]^2)) } a } # multiply trans by a unitary matrix, so that the first column becomes (1, 0, 0) U <- gram.schmidt(trans[, 1:3]) trans <- t(U) %*% trans # check that trans still generates the required correlations max(abs(cov2cor(t(trans) %*% trans) - required)) # [1] 1.110223e-16 # choose an example of the real vector y and normalize y <- rnorm(100) ynorm <- y - mean(y) ynorm <- ynorm/sqrt(sum(ynorm^2)) # find two more normalized vectors orthogonal to ynorm u <- gram.schmidt(cbind(1, ynorm, rnorm(length(y)), rnorm(length(y))))[, 3:4] Z <- cbind(ynorm, u) # get the solution as suggested at the beginning S <- Z %*% trans # check that S is a solution for the normalized y (ynorm) max(abs(cor(S) - required)) # [1] 1.110223e-16 max(abs(S[, 1] - ynorm)) # [1] 8.326673e-17 max(abs(S[, 2] + S[, 3] + S[, 4] - ynorm)) # [1] 1.110223e-16 This solution contains "ynorm". In order to get "y" as required, a linear transformation, which is inverse to the normalization, should be done. Since it is linear, it does not change the correlations. Hope this helps. Petr Savicky.