jeff6868
2012-May-31 08:36 UTC
[R] ignore NA column in a DF (for calculation) without removing them
Dear users,

I currently have a function that looks for the best correlation for each file in my correlation matrix. I'm working on a list of files obtained with list.files. Here's the function:

get.max.cor <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  which( mat[station, ] == max(mat[station, ], na.rm=TRUE) )
}

If I have a correlation matrix like this (no NA values):

cor1 <- read.table(text="
          ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 0.8646358 0.8104837 0.8899451 0.7486417
ST209 0.8646358 1.0000000 0.9335584 0.8392696 0.8676857
ST210 0.8104837 0.9335584 1.0000000 0.8304132 0.9141465
ST211 0.8899451 0.8392696 0.8304132 1.0000000 0.8064669
ST212 0.7486417 0.8676857 0.9141465 0.8064669 1.0000000
", header=TRUE)

it works perfectly.

If I have a correlation matrix with some NAs (but not only NAs) like this:

cor2 <- read.table(text="
          ST208     ST209     ST210     ST211     ST212
ST208 1.0000000        NA 0.9666491 0.9573701 0.9233598
ST209        NA 1.0000000 0.9744054 0.9577192 0.9346706
ST210 0.9666491 0.9744054 1.0000000 0.9460145 0.9582683
ST211 0.9573701 0.9577192 0.9460145 1.0000000        NA
ST212 0.9233598 0.9346706 0.9582683        NA 1.0000000
", header=TRUE)

it still works thanks to na.rm=TRUE. But when one file has no data, so that its row and column contain only NAs, like this:

cor3 <- read.table(text="
          ST208 ST209     ST210     ST211     ST212
ST208 1.0000000    NA 0.8104837 0.8899451 0.7486417
ST209        NA    NA        NA        NA        NA
ST210 0.8104837    NA 1.0000000 0.8304132 0.9141465
ST211 0.8899451    NA 0.8304132 1.0000000 0.8064669
ST212 0.7486417    NA 0.9141465 0.8064669 1.0000000
", header=TRUE)

it doesn't work, of course, because there is no non-NA value and therefore no maximum correlation for this file. That's why I get this error: 0 (non-na) cases.

I tried to remove the NA columns, but as I'm working on a list of files, the number of files in the list and in the matrix would then no longer be the same. I searched the web but only found topics about removing NA columns. In my case, I would like to ignore these NA columns without removing them.
What I would like to tell R is: when looking for the highest correlation for each file in the correlation matrix, if a file has no correlation coefficients (a row or column containing only NAs), do nothing with it, leave it as it is, and move on to the next file (the next column or row). I also tried adding else {NA} or else {NULL} to avoid this problem, but it still doesn't work.

Does anybody have an idea how to solve this problem?

Thank you very much.

Best regards,
Geoffrey
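
To make the behaviour I am after concrete, here is a minimal sketch of it, not a tested solution. The name get.max.cor.safe is hypothetical, and the sketch assumes that returning NA for a station with no usable correlation is acceptable. It converts the data frame to a matrix, masks the diagonal with NA instead of -Inf, and checks for an all-NA row before calling max():

```r
# Example matrix from above: station ST209 has no data at all.
cor3 <- read.table(text = "
          ST208 ST209     ST210     ST211     ST212
ST208 1.0000000    NA 0.8104837 0.8899451 0.7486417
ST209        NA    NA        NA        NA        NA
ST210 0.8104837    NA 1.0000000 0.8304132 0.9141465
ST211 0.8899451    NA 0.8304132 1.0000000 0.8064669
ST212 0.7486417    NA 0.9141465 0.8064669 1.0000000
", header = TRUE)

# Hypothetical variant of get.max.cor(): same idea, but a station
# whose correlations are all NA yields NA instead of an error.
get.max.cor.safe <- function(station, mat) {
  mat <- as.matrix(mat)   # read.table() returns a data.frame
  diag(mat) <- NA         # ignore each station's correlation with itself
  vals <- mat[station, ]
  if (all(is.na(vals))) {
    return(NA_integer_)   # nothing to compare: keep the station, report NA
  }
  # Position(s) of the highest remaining correlation (ties are all returned)
  which(vals == max(vals, na.rm = TRUE))
}

# One result per station; the all-NA station stays in the list.
best <- sapply(rownames(cor3), get.max.cor.safe, mat = cor3,
               simplify = FALSE)
```

With this, best still has one entry per station, so its length matches the number of files; the entry for ST209 is simply NA and the others hold the named index of the best-correlated station.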