thr3ads.net - R help - [R] A basic statistics question [Aug 2014]


Ron Michael

2014-Aug-12 19:57 UTC

[R] A basic statistics question

Hi,

I would like clarification on a fairly fundamental statistical property; I hope
the expeRts here will not mind my posting it.

I learnt that the variance-covariance matrix of the standardized data is equal to
the correlation matrix of the unstandardized data. So I tried it with the following data:

Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L,  7L,
6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L,  0L, 10L, 10L, 10L,
7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L,  0L, 13L, 13L, 10L, 7L, 7L, 7L,
10L, 7L, 5L, 8L, 7L, 10L, 10L,  10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L, 7L, 7L, 8L,
7L, 8L, 6L,  6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L, 12L, 8L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L,  7L, 5L, 7L, 5L,
7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L,  6L, 4L, 4L, 6L, 6L, 7L, 8L, 7L,
11L, 10L, 8L, 7L, 6L, 6L, 11L,  5L, 4L, 6L, 6L, 6L, 7L, 8L, 7L, 12L, 4L, 4L, 2L,
5L, 6L, 7L,  6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L, 6L, 6L, 5L, 5L, 6L, 6L, 
4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L, 4L, 6L, 6L,  6L, 8L, 8L, 8L,
7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L,  3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L,
4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L,  9L, 10L, 8L, 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L,
9L, 9L, 13L,
 13L,  10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L,  3L, 7L,
6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L,  10L, 8L, 8L, 9L, 9L,
11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L,  2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L,
1L, 1L, 3L, 3L, 4L, 6L, 4L,  5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L, 7L,
3L, 3L, 10L,  13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L, 3L, 
0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L,  2L, 2L, 5L, 5L,
5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L,  4L, 4L, 8L, 7L, 10L, 3L, 1L, 9L,
5L, 11L, 9L), .Dim = c(45L,  8L), .Dimnames = list(NULL, c("V1",
"V7", "V13", "V19", "V25", 
"V31", "V37", "V43")))

Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) 

(t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]



The point is that I am not getting exactly the correlation matrix. Can somebody
point out what I am missing here?

Thanks for your pointer.

(Ted Harding)

2014-Aug-12 21:32 UTC


[R] A basic statistics question

On 12-Aug-2014 19:57:29 Ron Michael wrote:
> Hi,
>
> I would like clarification on a fairly fundamental statistical property;
> I hope the expeRts here will not mind my posting it.
>
> I learnt that the variance-covariance matrix of the standardized data is
> equal to the correlation matrix of the unstandardized data. So I tried it
> with the following data:
>
> <SNIP>
>
> Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
>
> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> The point is that I am not getting exactly the correlation matrix. Can
> somebody point out what I am missing here?
>
> Thanks for your pointer.

Try:
  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
  (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

  cor(Data)

And why? Look at

  ?sd

and note that:

  Details:
     Like 'var' this uses denominator n - 1.
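To see the n - 1 denominator at work, here is a minimal R sketch on a short toy vector. The values in `x` are made up for illustration and are not the full data from the thread. It checks that sd() and var() divide by n - 1, and that a variable standardized with sd() has a sum of squares of exactly n - 1, so dividing by n cannot give 1.

```r
# Hypothetical toy vector, chosen only for illustration
x <- c(7, 5, 9, 7, 8)
n <- length(x)

# var() and sd() both divide the sum of squared deviations by n - 1
ss <- sum((x - mean(x))^2)
print(all.equal(var(x), ss / (n - 1)))
print(all.equal(sd(x), sqrt(ss / (n - 1))))

# Standardize with sd(), as in the original post
z <- (x - mean(x)) / sd(x)

# Sum of squares of z is n - 1, so dividing by n gives (n - 1) / n
sum(z^2) / n        # (n - 1) / n, i.e. 0.8 for this x
sum(z^2) / (n - 1)  # 1
```

The same factor (n - 1)/n is what shrinks every entry of the matrix computed in the original post.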

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Aug-2014  Time: 22:32:26

Rolf Turner

2014-Aug-12 21:41 UTC


[R] A basic statistics question

On 13/08/14 07:57, Ron Michael wrote:
> Hi,
>
> I would like clarification on a fairly fundamental statistical property;
> I hope the expeRts here will not mind my posting it.
>
> I learnt that the variance-covariance matrix of the standardized data is
> equal to the correlation matrix of the unstandardized data. So I tried it
> with the following data:
<SNIP>
> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> The point is that I am not getting exactly the correlation matrix. Can
> somebody point out what I am missing here?

You are using a denominator of "n" in calculating your "covariance"
matrix for your normalized data.  But these data were normalized using
the sd() function which (correctly) uses a denominator of n-1, as var()
does, so that the squared value is an unbiased estimator of the
population variance.
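The arithmetic behind this mismatch can be written out in a few lines. The notation below ($x_{ij}$ for row $i$ and column $j$, $s_j$, $s_{jk}$, $r_{jk}$, $Z$, $R$) is introduced here for illustration and does not appear in the thread; $s_{jk}$ is the sample covariance with denominator $n-1$ and $r_{jk} = s_{jk}/(s_j s_k)$ is the sample correlation.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2

(Z^\top Z)_{jk} = \sum_{i=1}^{n} z_{ij} z_{ik}
  = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{s_j s_k}
  = (n-1) \, \frac{s_{jk}}{s_j s_k}
  = (n-1) \, r_{jk}

\frac{1}{n-1} Z^\top Z = R, \qquad
\frac{1}{n} Z^\top Z = \frac{n-1}{n} \, R
```

With the 45 rows of Data, the factor (n - 1)/n is 44/45, about 0.978, which is why the diagonal of the original result comes out near 0.978 instead of 1 and every off-diagonal entry is shrunk by the same factor.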

If you calculated

    (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

then you would get the same result as you get from cor(Data) (to within 
about 1e-15).
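A minimal matrix-level sketch in R follows, assuming a small made-up matrix `X` in place of the 45 x 8 Data object from the thread. It uses scale(), which centres each column and divides it by its sd(), and crossprod(Z), which computes t(Z) %*% Z. Dividing by n - 1 should reproduce cor(X), while dividing by n gives cor(X) scaled by (n - 1)/n.

```r
# Hypothetical small data set standing in for 'Data' from the thread
X <- cbind(a = c(7, 5, 9, 7, 8, 6),
           b = c(6, 7, 7, 6, 8, 6),
           c = c(10, 7, 6, 8, 5, 5))
n <- nrow(X)

# scale() centres each column and divides by sd(), the n - 1 version
Z <- scale(X)

# Cross-product of standardized columns divided by n - 1 ...
R_hat <- crossprod(Z) / (n - 1)

# ... agrees with cor() up to floating-point rounding
all.equal(R_hat, cor(X), check.attributes = FALSE)

# Dividing by n instead shrinks every entry by the factor (n - 1) / n
R_n <- crossprod(Z) / n
all.equal(R_n, cor(X) * (n - 1) / n, check.attributes = FALSE)

# Equivalently, the covariance matrix of the standardized data is cor(X)
all.equal(cov(Z), cor(X), check.attributes = FALSE)
```

Each all.equal() call should return TRUE; check.attributes = FALSE simply restricts the comparison to the numeric values. In practice, cov(scale(X)) or cor(X) avoids handling the denominator by hand.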

cheers,

Rolf Turner

-- 
Rolf Turner
Technical Editor ANZJS
