Bliese, Paul D LTC USAMH
2006-Aug-24 15:06 UTC
[R] Why are lagged correlations typically negative?
Recently, I was working with some lagged designs in which a vector of observations at one time was used to predict a vector of observations at another time using a lag-1 design. In the work, I noticed a lot of negative correlations, so I ran a simple simulation with 2 matched points. The crude simulation example below shows that the correlation can be -1 or +1, but interestingly, if you run this basic simulation thousands of times, you get negative correlations 66 to 67% of the time. If you simulate three matched observations instead of two, you get negative correlations about 74% of the time, and as you simulate 4 and more observations the proportion of negative correlations asymptotically approaches an even 50/50 split between negative and positive (though even with 100 observations one still has about 54% negative correlations). Creating T1 and T2 so they are related (and not correlated 1 as in the crude simulation) attenuates the effect. A more advanced simulation is provided below for those interested.

Can anyone explain why this occurs in a way a non-mathematician is likely to understand?
Thanks,
Paul

#############
# Crude simulation
#############

> (T1 <- rnorm(3))
[1] -0.1594703 -1.3340677  0.2924988
> (T2 <- c(T1[2:3], NA))
[1] -1.3340677  0.2924988         NA
> cor(T1, T2, use="complete")
[1] -1

> (T1 <- rnorm(3))
[1] -0.84258593 -0.49161602  0.03805543
> (T2 <- c(T1[2:3], NA))
[1] -0.49161602  0.03805543         NA
> cor(T1, T2, use="complete")
[1] 1

###########
# More advanced simulation example
###########

> lags <- function(nobs, nreps, rho=1){
    OUT <- data.frame(NEG=rep(NA, nreps), COR=rep(NA, nreps))
    nran <- nobs + 1  # need to generate 1 more random number than there are observations
    for(i in 1:nreps){
      V1 <- rnorm(nran)
      V2 <- sqrt(1 - rho^2)*rnorm(nran) + rho*V1
      #print(cor(V1,V2))
      V1 <- V1[1:(nran - 1)]
      V2 <- V2[2:nran]
      OUT[i, 1] <- ifelse(cor(V1, V2) <= 0, 1, 0)
      OUT[i, 2] <- cor(V1, V2)
    }
    return(OUT)  # NEG is 1 if the correlation is negative or 0; 0 if positive
  }
> LAGS.2 <- lags(2, 10000)  # Number of observations matched = 2
> mean(LAGS.2)
   NEG    COR
0.6682 -0.3364
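[A minimal sketch, not part of the original post: with 2 matched points the sign of cor(T1, T2) is the sign of (Y - X)*(Z - Y) for the three draws X, Y, Z, and that product is positive only when Y lies strictly between X and Z (X < Y < Z or X > Y > Z), i.e. 2 of the 6 equally likely orderings, which predicts exactly the 66-67% negative rate seen above.]

```r
# Sketch (not from the original post): for the 2-matched-point case,
# cor(T1, T2) is +1 iff the middle draw lies between the other two
# (2 of the 6 equally likely orderings of X, Y, Z),
# so P(negative correlation) = 4/6 = 2/3.
set.seed(1)
x <- matrix(rnorm(3 * 100000), ncol = 3)          # columns are X, Y, Z
neg <- (x[, 2] - x[, 1]) * (x[, 3] - x[, 2]) < 0  # sign of the 2-point correlation
mean(neg)                                         # close to 2/3
```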
On Thu, 24 Aug 2006, Bliese, Paul D LTC USAMH wrote:

> Can anyone explain why this occurs in a way a non-mathematician is
> likely to understand?

Consider the two-points-out-of-three case from the viewpoint of the middle point. The correlation is positive if the previous point is lower and the following point is higher, or vice versa. It is negative if the previous and following points are both higher or both lower. Now, if the middle point is higher than the first point, it is probably higher than average, and so it has a more than 50% chance of also being higher than the third point. Similarly, if it is lower than the first point, it is likely to be lower than the third point. So a negative correlation is more likely than a positive one.

Working out the covariance may be useful even for non-mathematicians. Call the three points X, Y, Z:

  cov(X - Y, Y - Z) = cov(X,Y) - cov(Y,Y) - cov(X,Z) + cov(Y,Z)
                    = 0 - var(Y) - 0 + 0
                    = -var(Y)

     -thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
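[A quick empirical check of the identity above, added as a sketch: for iid standard normal X, Y, Z, the covariance of the two successive differences should come out close to -var(Y) = -1.]

```r
# Sketch: empirical check that cov(X - Y, Y - Z) = -var(Y) for iid draws.
set.seed(2)
n <- 200000
X <- rnorm(n); Y <- rnorm(n); Z <- rnorm(n)
cov(X - Y, Y - Z)   # close to -1, i.e. -var(Y)
```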
Gabor Grothendieck
2006-Aug-24 16:02 UTC
[R] Why are lagged correlations typically negative?
The covariance has the same sign as the correlation, so let's calculate the sample covariance of the vector T1 = (X, Y) with T2 = (Y, Z), where the third component of each vector is dropped because of use="complete". Up to the positive factor 1/(n - 1), the sample covariance is

  cov(T1, T2) = XY + YZ - 2 * ((X + Y)/2) * ((Y + Z)/2)

X, Y and Z are random variables, so we take the expectation to get the overall average over many runs. Expectation is linear, the variables have mean zero, and all distinct pairs are uncorrelated, so:

  E[XY] + E[YZ] - E[(X + Y)(Y + Z)]/2
    = E[XY] + E[YZ] - E[XY]/2 - E[XZ]/2 - E[Y^2]/2 - E[YZ]/2
    = -E[Y^2]/2 < 0

where the last line follows because every term in the line above except -E[Y^2]/2 is zero.

On 8/24/06, Bliese, Paul D LTC USAMH <paul.bliese at us.army.mil> wrote:

> Can anyone explain why this occurs in a way a non-mathematician is
> likely to understand?

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
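[A Monte Carlo check of the expectation argument, added as a sketch rather than part of the original thread: averaging the sample covariance of T1 = (X, Y) with T2 = (Y, Z) over many iid standard normal draws should give a negative number; with R's n - 1 divisor the expected value works out to -E[Y^2]/2 = -0.5.]

```r
# Sketch: Monte Carlo check that E[cov(T1, T2)] < 0 when T2 is T1 lagged by 1.
# For iid standard normal X, Y, Z and 2 matched points, the expected sample
# covariance (n - 1 divisor) is -E[Y^2]/2 = -0.5.
set.seed(3)
covs <- replicate(50000, {
  v <- rnorm(3)
  cov(v[1:2], v[2:3])   # T1 = (X, Y), T2 = (Y, Z)
})
mean(covs)              # close to -0.5
```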