Odd behaviour of function cor() in R-2.10.1-64bit-Unix

In a dataset with 1366 patients and 244 clinical variables I calculated Spearman's rho for some fatty acids and BMI and came across something rather odd: R seems to calculate rho differently on 2.10.1-64bit-Unix and 2.9.0-32bit-Windows when I calculate the complete (244x244) correlation matrix and then pick out the values I am interested in! The 2.9.0-32bit-Windows version calculates rho for pairwise complete observations, as I expected, but 2.10.1-64bit-Unix does not.

I compared 4 ways of producing the rho:

A) calculating the rho for each pair of variables in a loop -> forcing pairwise complete obs
B) calculating a matrix of a small selection of variables and then picking one column of the correlation matrix
C) calculating the complete 244x244 correlation matrix and then picking the relevant rho's
D) as C but with 'use = "pairwise.complete.obs"'

I initially used D) and got wrong results. The code and output are included below:

________________________________
R-code
________________________________

## Read data using UNIX-path to USB-disk
data <- read.table("/media/disk/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt" , header = TRUE , dec = "," , sep = ";")

## Read data using Windows-path to USB-disk
# data <- read.table("E:/ONYG/fatty_acids/mito/BECAC_MITO_23okt09.txt" , header = TRUE , dec = "," , sep = ";")

## Usage of "use" in cor()
# use: an optional character string giving a method for computing
#      covariances in the presence of missing values. This must be
#      (an abbreviation of) one of the strings '"everything"',
#      '"all.obs"', '"complete.obs"', '"na.or.complete"', or
#      '"pairwise.complete.obs"'.
## four ways of calculating Spearman's Rho for a selection of variables
## (column 7 of the data relates to BMI and columns 104:108 to some fatty acids)

# __________________ A _____________________
cori <- array(0, 151)
for (i in 104:151) cori[i] <- cor( data[,7] , data[,i] , method = "spearman")
cor.a <- as.numeric(round(cori[104:108] , 3))

# __________________ B _____________________
cor.b <- as.numeric(round(cor( data[, c(7,104:108)] , method = "spearman") [-1,1] , 3))

# __________________ C _____________________
cor.c <- as.numeric(round(cor( data , method = "spearman") [104:108,7] , 3))

# __________________ D _____________________
cor.d <- as.numeric(round(cor( data , method = "spearman" , use = "pairwise.complete.obs") [104:108,7] , 3))

## dump the R- and OS-version and the results into textfile
capture.output( {print(date())
                 print(as.data.frame(unlist(R.Version())))
                 cbind( cor.a , cor.b , cor.c , cor.d)} ,
                file = "Cor_output.txt" , append = TRUE)

_________________________________
Output
_________________________________

[1] "Tue Jan 5 15:40:05 2010"
               unlist(R.Version())
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          10.1
year           2009
month          12
day            14
svn rev        50720
language       R
version.string R version 2.10.1 (2009-12-14)

The rows denote Spearman's Rho for 5 different fatty acids against BMI, the columns the 4 different ways (a,b,c,d) I calculated the Rho.
     cor.a cor.b  cor.c  cor.d
[1,] 0.062 0.062  0.057  0.057
[2,] 0.107 0.107 -0.013 -0.013
[3,] 0.226 0.226  0.215  0.215
[4,] 0.232 0.232  0.157  0.157
[5,] 0.179 0.179  0.178  0.178

[1] "Tue Jan 05 15:49:34 2010"
               unlist(R.Version())
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          9.0
year           2009
month          04
day            17
svn rev        48333
language       R
version.string R version 2.9.0 (2009-04-17)

     cor.a cor.b cor.c cor.d
[1,] 0.062 0.062 0.062 0.062
[2,] 0.107 0.107 0.107 0.107
[3,] 0.226 0.226 0.226 0.226
[4,] 0.232 0.232 0.232 0.232
[5,] 0.179 0.179 0.179 0.179

Best regards,
Reinhard
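
P.S. Since the patient data cannot be shared, here is a minimal sketch of the same comparison on made-up data, in case someone wants to try it on their own installation. Everything in it is an assumption for illustration only: the data frame `toy`, its column names, the amount of missingness and the helper `rho_by_hand` are hypothetical and not part of my analysis. The reference value drops incomplete pairs by hand before ranking, which is what I understand "pairwise complete" to mean for Spearman's rho.

```r
## Made-up stand-in for the real data: one "BMI" column and three others,
## with missing values scattered over different rows in each column.
set.seed(1)
n <- 200
toy <- data.frame(bmi = rnorm(n), fa1 = rnorm(n), fa2 = rnorm(n), other = rnorm(n))
toy$fa1 <- toy$fa1 + 0.3 * toy$bmi
for (j in seq_along(toy)) toy[sample(n, 30), j] <- NA

## Reference: remove incomplete pairs first, then rank what is left and
## take the ordinary (Pearson) correlation of the ranks.
rho_by_hand <- function(x, y) {
  ok <- complete.cases(x, y)
  cor(rank(x[ok]), rank(y[ok]))
}
ref <- sapply(toy[-1], rho_by_hand, x = toy$bmi)

## Pair by pair (like A above, but with 'use' given explicitly)
a <- sapply(toy[-1], function(y)
  cor(toy$bmi, y, method = "spearman", use = "pairwise.complete.obs"))

## Full matrix, then pick the column of interest (like D above)
d <- cor(toy, method = "spearman", use = "pairwise.complete.obs")[-1, "bmi"]

## Side-by-side view and a tolerance-based comparison with the reference
print(round(cbind(ref, a, d), 3))
print(all.equal(a, ref))
print(all.equal(d, ref))
```

If the two `all.equal()` calls do not both print TRUE, the full-matrix route is handling the missing values differently from the pair-by-pair route, which is the pattern I saw above on 2.10.1.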