Hi,

I need a clarification on a quite fundamental statistics property; I hope the expeRts here will not mind my posting it here.

I learnt that the variance-covariance matrix of the standardized data is equal to the correlation matrix of the unstandardized data. So I used the following data.

Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L, 7L, 6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L, 0L, 10L, 10L, 10L, 7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L, 0L, 13L, 13L, 10L, 7L, 7L, 7L, 10L, 7L, 5L, 8L, 7L, 10L, 10L, 10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L, 7L, 7L, 8L, 7L, 8L, 6L, 6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L, 12L, 8L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L, 7L, 5L, 7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L, 6L, 4L, 4L, 6L, 6L, 7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L, 5L, 4L, 6L, 6L, 6L, 7L, 8L, 7L, 12L, 4L, 4L, 2L, 5L, 6L, 7L, 6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L, 4L, 6L, 6L, 6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L, 3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L, 9L, 10L, 8L, 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L, 9L, 9L, 13L, 13L, 10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L, 3L, 7L, 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L, 10L, 8L, 8L, 9L, 9L, 11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L, 2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 4L, 6L, 4L, 5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L, 7L, 3L, 3L, 10L, 13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L, 3L, 0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L, 2L, 2L, 5L, 5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L, 4L, 4L, 8L, 7L, 10L, 3L, 1L, 9L, 5L, 11L, 9L), .Dim = c(45L, 8L), .Dimnames = list(NULL, c("V1", "V7", "V13", "V19", "V25", "V31", "V37", "V43")))
Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))

(t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]

The point is that I am not getting the exact CORR matrix. Can somebody point out what I am missing here?

Thanks for your pointer.

On 12-Aug-2014 19:57:29 Ron Michael wrote:
> Hi,
>
> I would need to get a clarification on a quite fundamental statistics
> property, hope expeRts here would not mind if I post that here.
>
> I learnt that variance-covariance matrix of the standardized data is equal to
> the correlation matrix for the unstandardized data. So I used following data.

<SNIP>

> Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
>
> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> Point is that I am not getting exact CORR matrix. Can somebody point me
> what I am missing here?
>
> Thanks for your pointer.

Try:

  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
  (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

  cor(Data)

And why? Look at

  ?sd

and note that:

  Details:
  Like 'var' this uses denominator n - 1.

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Aug-2014  Time: 22:32:26

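To make the effect of the denominator concrete, here is a minimal sketch. It assumes a small simulated matrix `X` (hypothetical, standing in for the 45 x 8 `Data` above, so the snippet runs on its own). It suggests that dividing by n instead of n - 1 shrinks every entry of the correlation matrix by the factor (n - 1)/n, which is why the diagonal comes out as 44/45 rather than 1.

```r
# Hypothetical stand-in for the thread's 45 x 8 'Data' matrix.
set.seed(1)
X <- matrix(rpois(45 * 8, lambda = 7), nrow = 45, ncol = 8)
n <- nrow(X)

# Standardize each column with sd(), which divides by n - 1.
Z <- apply(X, 2, function(x) (x - mean(x)) / sd(x))

by_n   <- crossprod(Z) / n        # denominator used in the original post
by_nm1 <- crossprod(Z) / (n - 1)  # denominator suggested in the reply

# Dividing by n rescales the correlation matrix by (n - 1) / n.
print(all.equal(by_n, cor(X) * (n - 1) / n))

# Dividing by n - 1 reproduces cor() up to rounding error.
print(all.equal(by_nm1, cor(X)))

# Diagonal of the n-denominator version: (n - 1) / n in every position.
print(diag(by_n))
```

Here `crossprod(Z)` is just a shorthand for `t(Z) %*% Z`; the rest follows the code posted in the thread.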
On 13/08/14 07:57, Ron Michael wrote:
> Hi,
>
> I would need to get a clarification on a quite fundamental statistics property, hope expeRts here would not mind if I post that here.
>
> I learnt that variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. So I used following data.

<SNIP>

> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> Point is that I am not getting exact CORR matrix. Can somebody point me what I am missing here?

You are using a denominator of "n" in calculating your "covariance" matrix for your normalized data. But these data were normalized using the sd() function, which (correctly) uses a denominator of n-1, so that its square is an unbiased estimator of the population variance. If you calculated

    (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

then you would get the same result as you get from cor(Data) (to within about 1e-15).

cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS
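Put differently, what matters is that the standardization and the cross-product use the same denominator. The sketch below is an illustration only: `X` is a simulated stand-in for `Data`, and `pop_sd` is a hypothetical helper written for this example, not a base R function. It compares the two consistent choices, n - 1 in both places and n in both places, against cor().

```r
# Hypothetical stand-in for the thread's 45 x 8 'Data' matrix.
set.seed(1)
X <- matrix(rpois(45 * 8, lambda = 7), nrow = 45, ncol = 8)
n <- nrow(X)

# Choice 1: n - 1 throughout. scale() centers each column and divides by
# sd(); cov() also divides by n - 1.
Z1 <- scale(X)
r1 <- cov(Z1)

# Choice 2: n throughout. pop_sd() is a hypothetical helper that computes
# the standard deviation with denominator n.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))
Z2 <- apply(X, 2, function(x) (x - mean(x)) / pop_sd(x))
r2 <- crossprod(Z2) / n

# Both differences from cor(X) should be at the level of rounding error.
print(max(abs(r1 - cor(X))))
print(max(abs(r2 - cor(X))))
```

The mismatch in the original post comes from mixing the two: sd() with n - 1 for the scaling, then n for the cross-product.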