Hi everybody,

I have a small question about R.
I'm computing correlation matrices between my files. Each of these files contains 4 columns of data, and the data also contain missing values. Sometimes, in one file, one of the 4 columns contains only missing data (NA). As I'm computing correlations between the same column of each file, I then get a correlation matrix with a column containing only NAs (apart from the diagonal), like this:

      file1 file2 file3
file1   1    NA   0.8
file2   NA   1    NA
file3   0.8  NA   1

For file2, I have no correlation coefficient.
My function looks for the highest correlation coefficient for each file, but I get an error message because of this.
My question is: how can I tell the function not to do any calculation if it sees only NAs for the file it's working on? The aim of this function is to automate this calculation for 300 files.
I tried adding na.rm=TRUE, but it still wants to do the calculation for the file containing only NAs (error: 0 (non-NA) cases).
Could you tell me what I should add to my function? Thanks a lot!

get.max.cor <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  which( mat[station, ] == max(mat[station, ], na.rm=TRUE) )
}

--
View this message in context: http://r.789695.n4.nabble.com/Prevent-calculation-when-only-NA-tp4630716.html
Sent from the R help mailing list archive at Nabble.com.
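A minimal sketch reproducing the situation, assuming the correlation matrix is held as a numeric matrix with the file names as dimnames (the values are the ones from the example table). It suggests why na.rm=TRUE does not help: for file2 the only non-NA value left in the row is the -Inf placed on the diagonal, so max() returns -Inf and which() points back at file2 itself. The function itself does not raise the "0 (non-NA) cases" error; that presumably comes from a later step that uses this result.

```r
# Example correlation matrix from the question (symmetric, so filling
# column by column gives the same result as filling row by row)
files <- c("file1", "file2", "file3")
m <- matrix(c(1,   NA, 0.8,
              NA,  1,  NA,
              0.8, NA, 1),
            nrow = 3, dimnames = list(files, files))

# The original function, unchanged
get.max.cor <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  which( mat[station, ] == max(mat[station, ], na.rm=TRUE) )
}

get.max.cor("file1", m)  # returns c(file3 = 3L)
get.max.cor("file2", m)  # returns c(file2 = 2L): the -Inf diagonal is the "maximum"
```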
On 05/21/2012 05:59 PM, jeff6868 wrote:
> [...]

Hi Jeff,
Can you use:

if(any(!is.na(mat))) { ... }

Jim
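Jim's test as written looks at the whole matrix, which always has non-NA values on its diagonal, so one way to adapt it is to apply the check to the station's row only. A sketch of that idea, assuming the same numeric matrix as in the question; the name get.max.cor.safe and the choice of returning NA (rather than the 0 suggested elsewhere in the thread) are illustrative assumptions. Because the diagonal is set to -Inf, is.finite() is used instead of !is.na(): both NA and -Inf fail it, so only real correlations count.

```r
files <- c("file1", "file2", "file3")
m <- matrix(c(1, NA, 0.8, NA, 1, NA, 0.8, NA, 1),
            nrow = 3, dimnames = list(files, files))

# Hypothetical variant of get.max.cor: returns NA when the station's row
# has no usable correlation with any other file.
get.max.cor.safe <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  r <- mat[station, ]
  # After masking the diagonal, only real correlations are finite numbers
  if (!any(is.finite(r))) return(NA_integer_)
  which(r == max(r, na.rm = TRUE))
}

get.max.cor.safe("file1", m)  # c(file3 = 3L)
get.max.cor.safe("file2", m)  # NA

# Looping over every station; results are kept in a list because which()
# can return several indices when correlations tie.
res <- lapply(rownames(m), get.max.cor.safe, mat = m)
names(res) <- rownames(m)
```

With 300 files, stations whose entry in res is NA can then be skipped before the next step of the script.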
Hello,

Maybe the function could return a special value, such as zero. Since a column with that number doesn't exist, the code executed afterward would simply move on to the second-greatest correlation. The function would then become

get.max.cor <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  if(sum(is.na(mat[station, ])) == ncol(mat) - 1)
    0
  else
    which( mat[station, ] == max(mat[station, ], na.rm=TRUE) )
}

df1 <- read.table(text="
      file1 file2 file3
file1   1    NA   0.8
file2   NA   1    NA
file3   0.8  NA   1
", header=TRUE)

get.max.cor("file2", df1)

Hope this helps,

Rui Barradas

jeff6868 wrote
> [...]
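Since the follow-up code looks for the best and then the second-best correlation, one alternative to returning a sentinel such as 0 is to rank the station's correlations and drop the NAs at the same time. A sketch, assuming a numeric matrix (a data frame like df1 can be converted with as.matrix()); the helper name rank.cor, the parameter k, and the 4-file matrix values are hypothetical, chosen only for illustration. order() with na.last = NA drops missing values, so a station with no correlations yields an empty result instead of a fake index.

```r
# Illustrative 4-file correlation matrix; file2 has no correlations
files <- paste0("file", 1:4)
m <- matrix(c(1,   NA, 0.8, 0.5,
              NA,  1,  NA,  NA,
              0.8, NA, 1,   0.6,
              0.5, NA, 0.6, 1),
            nrow = 4, dimnames = list(files, files))

# Hypothetical helper: names of the k most correlated other files
rank.cor <- function(station, mat, k = 2){
  mat <- as.matrix(mat)
  mat[row(mat) == col(mat)] <- NA                   # ignore self-correlation
  r <- mat[station, ]
  idx <- order(r, decreasing = TRUE, na.last = NA)  # NAs are dropped
  head(colnames(mat)[idx], k)
}

rank.cor("file1", m)  # "file3" "file4"
rank.cor("file2", m)  # character(0): nothing to rank
```

A calling loop can then test length(result) == 0 (or length(result) < 2 when the second-best file is needed) and skip that station.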
Hello Rui,

Thanks for your answer too. I tried your suggestion as well, but when the function returns 0 for this file, the rest of my script still tries to do a calculation with it. As the script looks for the best correlation and then the second-best correlation, returning 0 is a problem, at least for the second-best correlation.

Maybe the best way to solve the problem would be to add a line to get.max.cor that deletes all the columns containing only NAs from my correlation matrix?
For example, if my calculated correlation matrix is (imagine that the numeric values are correlation coefficients, for the sake of the example):

x <- data.frame(a = 1:10, b = c(1:5, NA, 7:9, NA), c = 21:30, d = NA)

is it possible in my function to delete only the columns that are 100% NA, in order to get a matrix like this:

x <- data.frame(a = 1:10, b = c(1:5, NA, 7:9, NA), c = 21:30)

while keeping the other columns even if they contain some NAs (the calculation is still possible for those, since they still contain numeric coefficients)? After all, the function cannot look for the best or second-best correlation coefficient in a column that contains only NAs. I think a correlation matrix like this would allow the calculation in the next function and in the rest of my script.
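Dropping the all-NA columns can be done with a logical index built from colSums(!is.na(x)). One caveat, offered as an observation rather than something stated in the thread: in a real correlation matrix the empty file's column is not 100% NA, because its diagonal entry is 1, so the diagonal has to be ignored when testing. A sketch of both cases; drop.empty.stations is a hypothetical helper name, and it removes the empty station as both a row and a column so the matrix stays square.

```r
# Jeff's example data frame: column d is entirely NA
x <- data.frame(a = 1:10, b = c(1:5, NA, 7:9, NA), c = 21:30, d = NA)

# Keep only columns that contain at least one non-NA value
x.clean <- x[, colSums(!is.na(x)) > 0, drop = FALSE]
names(x.clean)  # "a" "b" "c"

# Same idea for a correlation matrix: the diagonal is never NA, so it is
# masked before testing, then the empty station is dropped from rows and columns.
drop.empty.stations <- function(mat){
  off <- mat
  diag(off) <- NA
  keep <- colSums(!is.na(off)) > 0
  mat[keep, keep, drop = FALSE]
}

files <- c("file1", "file2", "file3")
m <- matrix(c(1, NA, 0.8, NA, 1, NA, 0.8, NA, 1),
            nrow = 3, dimnames = list(files, files))
m.clean <- drop.empty.stations(m)  # 2 x 2 matrix for file1 and file3
```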