thr3ads.net - R help - [R] Generation of correlated variables [Mar 2012]

If this information is useful, please help other people find it:
Share via:

Filoche

2012-Mar-15 17:48 UTC

[R] Generation of correlated variables

Hi everyone.

Based on a dependent variable (y), I'm trying to generate some independent
variables with a specified correlation. For this there's no problems.
However, I would like that have all my "regressors" to be orthogonal
(i.e.
no correlation among them.

For example, 

y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3 0.8. 
However, x1, x2 and x3 should not be correlated to each other.

Anyone can help me?

Regards,
Phil

--
View this message in context:
http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475799.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Mar-15 17:54 UTC

head link

[R] Generation of correlated variables

Hello,
>
> However, I would like that have all my "regressors" to be
orthogonal (i.e.
> no correlation among them.
>
?poly
poly(cbind(x1, x2, x3), degree=1)

Hope this helps,

Rui Barradas


--
View this message in context:
http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475821.html
Sent from the R help mailing list archive at Nabble.com.

Filoche

2012-Mar-15 18:14 UTC

head link

[R] Generation of correlated variables

Hi there.

This is not really working. Here what I have for the moment.

library(ecodist)

x <- 1:100
y1 <- corgen(x=x, r=.85, epsilon=.01)$y
y2 <- corgen(x=x, r=.5, epsilon=.01)$y
y3 <- corgen(x=x, r=.25, epsilon=.01)$y

a = poly(cbind(y1, y2, y3), degree=1) 
cor(a[,1], a[,2])

In that case, the correlation between y1 and y2 is the same as a[,1] and
a[,2]. 

Regards,
Phil



--
View this message in context:
http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4475891.html
Sent from the R help mailing list archive at Nabble.com.

Petr Savicky

2012-Mar-15 19:01 UTC

head link

[R] Generation of correlated variables

On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:> Hi everyone.
> 
> Based on a dependent variable (y), I'm trying to generate some
independent
> variables with a specified correlation. For this there's no problems.
> However, I would like that have all my "regressors" to be
orthogonal (i.e.
> no correlation among them.
> 
> For example, 
> 
> y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3
> 0.8.  However, x1, x2 and x3 should not be correlated to each other.
Hi.

If the following computation is correct, then there is no solution
for the required correlations, but there is one, if the vector of
the required correlations is normalized to have sum of squares 1.

Assume, variables x1, x2, x3 have mean zero and denote s1^2 = var(x1),
s2^2 = var(x2), s3^2 = var(x3) and assume zero correlations among 
x1, x2, x3, so also zero covariances. Then

  var(y) = s1^2 + s2^2 + s3^2
  E y x1 = E x1^2 + E x1 x2 + E x1 x3 = E x1^2 = s1^2

and similarly

  E y x2 = var(x2) = s2^2
  E y x3 = var(x3) = s3^2

So, the correlation cor(y, x1) is

  s1^2/s1/sqrt(s1^2 + s2^2 + s3^2) = s1/sqrt(s1^2 + s2^2 + s3^2)

Expressing all the correlations in this way, we get

  cor(y, x1) = s1/sqrt(s1^2 + s2^2 + s3^2)
  cor(y, x2) = s2/sqrt(s1^2 + s2^2 + s3^2)
  cor(y, x3) = s3/sqrt(s1^2 + s2^2 + s3^2)

Clearly, we have cor(y, x1)^2 + cor(y, x2)^2 + cor(y, x3)^2 = 1.
For your numbers, we get

  r <- c(0.7, 0.4, 0.8)
  sum(r^2) # [1] 1.29

So, for these numbers, the conditions are contradictory. However,
a solution may be found for the vector of correlations

  r/sqrt(1.29)
  [1] 0.6163156 0.3521804 0.7043607

which are the original correlations normalized to have sum of
squares 1. In this case, independent normal variables with the
standard deviations (s1, s2, s3) == r/sqrt(1.29) will satisfy
your conditions.

I hope that other members of the list correct me, if i overlooked
something.

Petr Savicky.

Filoche

2012-Mar-15 20:38 UTC

head link

[R] Generation of correlated variables

Thank you everyone for your precious advice.

I'll take time to look at it and try to make it work. 


Regards,
Phil



--
View this message in context:
http://r.789695.n4.nabble.com/Generation-of-correlated-variables-tp4475799p4476346.html
Sent from the R help mailing list archive at Nabble.com.

(Ted Harding)

2012-Mar-15 23:23 UTC

head link

[R] Generation of correlated variables

On 15-Mar-2012 Filoche wrote:> Hi everyone.
> 
> Based on a dependent variable (y), I'm trying to generate some
> independent variables with a specified correlation. For this
> there's no problems.
> However, I would like that have all my "regressors" to be
> orthogonal (i.e. no correlation among them).
> 
> For example, 
> 
> y = x1 + x2 + x3 where the correlation between y x1 = 0.7,
> x2 = 0.4 and x3 = 0.8.  However, x1, x2 and x3 should not be
> correlated to each other.
> 
> Anyone can help me?
> 
> Regards,
> Phil
Your fundamental problem here (with the correlations you specify)
is the following.

Your desired correlation matrix can be constructed by

  C <- cbind( c(1.0,0.7,0.4,0.8),c(0.7,1.0,0.0,0.0),
              c(0.4,0.0,1.0,0.0),c(0.8,0.0,0.0,1.0) )
  rownames(C) <-
c("y","x1","x2","x3")
  colnames(C) <-
c("y","x1","x2","x3")

  C
  #      y  x1  x2  x3
  # y  1.0 0.7 0.4 0.8
  # x1 0.7 1.0 0.0 0.0
  # x2 0.4 0.0 1.0 0.0
  # x3 0.8 0.0 0.0 1.0

And now:

  det(C)
  # [1] -0.29

and it is impossible for the determinant of a correlation
matrix to have a negative determinant: a correlation matyrix
must be positive-semidefinite, and therefore have a non-negative
determinant.

An alternative check is to look at the eigen-structure of C:

  eigen(C)
  # $values
  # [1]  2.1357817  1.0000000  1.0000000 -0.1357817
  # 
  # $vectors
  #           [,1]          [,2]       [,3]       [,4]
  # [1,] 0.7071068  0.000000e+00  0.0000000  0.7071068
  # [2,] 0.4358010 -1.172802e-16  0.7874992 -0.4358010
  # [3,] 0.2490291 -8.944272e-01 -0.2756247 -0.2490291
  # [4,] 0.4980582  4.472136e-01 -0.5512495 -0.4980582

so one of the eigenvalues (-0.1357817) is negative, again
impossible for a correlation matrix.

The suggestions made by the other respondents amount to
messing with the correlations so as to achieve a possible
situation; but then this does not achieve you goal (which
is impossible) but a different one, whose relationship
with your question may or may not suit what you hope to
achieve.

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 15-Mar-2012  Time: 23:23:24
This message was sent by XFMail

Petr Savicky

2012-Mar-16 12:33 UTC

head link

[R] Generation of correlated variables

On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:> Hi everyone.
> 
> Based on a dependent variable (y), I'm trying to generate some
independent
> variables with a specified correlation. For this there's no problems.
> However, I would like that have all my "regressors" to be
orthogonal (i.e.
> no correlation among them.
> 
> For example, 
> 
> y = x1 + x2 + x3 where the correlation between y x1 = 0.7, x2 = 0.4 and x3
> 0.8.  However, x1, x2 and x3 should not be correlated to each other.
Hi.

The previous discussion shows that the required correlations
should have sum of squares 1. It is not clear, whether your
expected application allows to normalize the correlations to
satisfy this condition. If this normalization is possible, then
try the following approach. 

It creates a matrix trans, which satisfies the following.
If Z is a matrix whose columns have zero mean and the identity covariance
matrix, then Z %*% trans has the required correlation structure.
This may be checked by looking at cov2cor(t(trans) %*% trans).

Moreover, trans should have the first column c(1, 0, 0) so that
the first column of Z is not changed by the transformation.

  # normalize the correlations (see the discussion above)
  rho <- c(0.7, 0.4, 0.8)
  rho <- rho/sqrt(sum(rho^2))

  (required <- cbind(c(1, rho), rbind(c(rho), diag(3))))

  trans <- cbind(
  y=rho,
  x1=c(rho[1], 0, 0),
  x2=c(0, rho[2], 0),
  x3=c(0, 0, rho[3]))

  # check that trans generates the required correlations
  max(abs(cov2cor(t(trans) %*% trans) - required)) # [1] 1.110223e-16

  # get orthonormal columns
  gram.schmidt <- function(a)
  {
      n <- ncol(a)
      for (i in seq.int(length=n)) {
          for (j in seq.int(length=i-1)) {
              a[, i] <- a[, i] - sum(a[, j]*a[, i])*a[, j]
          }
          a[, i] <- a[, i]/sqrt(sum(a[, i]^2))
      }
      a
  }

  # multiply trans by a unitary matrix, so that the first column becomes (1, 0,
0)
  U <- gram.schmidt(trans[, 1:3])
  trans <- t(U) %*% trans

  # check that trans still generates the required correlations
  max(abs(cov2cor(t(trans) %*% trans) - required)) # [1] 1.110223e-16

  # choose an example of the real vector y and normalize
  y <- rnorm(100)
  ynorm <- y - mean(y)
  ynorm <- ynorm/sqrt(sum(ynorm^2))

  # find two more normalized vectors orthogonal to ynorm
  u <- gram.schmidt(cbind(1, ynorm, rnorm(length(y)), rnorm(length(y))))[,
3:4]
  Z <- cbind(ynorm, u) 

  # get the solution as suggested at the beginning
  S <- Z %*% trans
 
  # check that S is a solution for the normalized y (ynorm)
  max(abs(cor(S) - required)) # [1] 1.110223e-16
  max(abs(S[, 1] - ynorm)) # [1] 8.326673e-17
  max(abs(S[, 2] + S[, 3] + S[, 4] - ynorm)) # [1] 1.110223e-16

This solution contains "ynorm". In order to get "y" as
required, a
linear transformation, which is inverse to the normalization, should
be done. Since it is linear, it does not change the correlations.

Hope this helps.

Petr Savicky.

Maybe Matching Threads

Search for more maybe matching threads

R help - Mar 2012 - Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

[R] Generation of correlated variables

Maybe Matching Threads