thr3ads.net - R help - [R] Problem in Removing Correlated Columns [May 2008]

Nataraj

2008-May-16 11:45 UTC

[R] Problem in Removing Correlated Columns

Dear all,
To remove correlated columns from a data frame, df, I found some R
code on Mr. Rajarshi Guha's page,
http://cheminfo.informatics.indiana.edu/~rguha/code/R/
The code is:
#################
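r2test <- function(df, cutoff=0.8) {
  # cutoff is a threshold on the squared correlation (r^2), in (0, 1]
  if (cutoff > 1 || cutoff <= 0) {
    stop("cutoff must satisfy 0 < cutoff <= 1")
  }
  if (!is.matrix(df) && !is.data.frame(df)) {
    stop("Must supply a data.frame or matrix")
  }
  # compare |r| against sqrt(cutoff), which is equivalent to r^2 > cutoff
  r2cut <- sqrt(cutoff)
  cormat <- cor(df)
  # (row, col) index pairs whose absolute correlation exceeds the threshold
  bad.idx <- which(abs(cormat) > r2cut, arr.ind=TRUE)
  # keep the lower triangle only: drops the diagonal and mirrored duplicates
  bad.idx <- matrix(bad.idx[bad.idx[,1] > bad.idx[,2]], ncol=2)
  # for each remaining pair, a random draw selects which column to drop
  drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1], bad.idx[,2])
  # return the indices of the columns to keep
  if (length(drop.idx) == 0) {
    1:ncol(df)
  } else {
    (1:ncol(df))[-unique(drop.idx)]
  }
}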
############################################
Now the problem is that the code returns different output (i.e.
different column numbers) on different calls. I cannot work out
from the code why this happens. I understand the logic of the
code except for this line:
********************************************
drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1], bad.idx[,2])
****************************************
What does it mean to compare runif(nrow(bad.idx)) with 0.5?
So I am looking for help with why the output differs between
function calls, as well as with the meaning of the line
mentioned above.

Thanks!
B.Nataraj
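
The line asked about can be tried in isolation. The following is only a minimal sketch, not part of the original code: the toy bad.idx matrix and the helper name pick_drop are hypothetical, made up for illustration. It assumes bad.idx holds one correlated pair of column indices per row, as which(..., arr.ind=TRUE) would return them. runif(n) draws n random numbers between 0 and 1, so comparing each draw with 0.5 behaves like a coin flip per pair, which would explain why repeated calls can return different columns.

```r
# Illustration only: a toy 'bad.idx' with two hypothetical correlated pairs,
# columns (3, 1) and (4, 2), one pair per row.
bad.idx <- matrix(c(3, 1,
                    4, 2), ncol = 2, byrow = TRUE)

# Hypothetical helper wrapping the line in question.
# runif(n) draws n uniform random numbers between 0 and 1, one per pair.
# A draw above 0.5 selects the column in bad.idx[, 1] for dropping,
# otherwise the column in bad.idx[, 2].
pick_drop <- function(bad.idx) {
  coin <- runif(nrow(bad.idx))
  ifelse(coin > 0.5, bad.idx[, 1], bad.idx[, 2])
}

# Without a fixed seed, repeated calls can pick different columns.
print(pick_drop(bad.idx))
print(pick_drop(bad.idx))

# Resetting the seed before each call makes the random choice repeatable.
set.seed(42); first  <- pick_drop(bad.idx)
set.seed(42); second <- pick_drop(bad.idx)
print(identical(first, second))  # TRUE
```

If repeatable output is wanted from r2test itself, calling set.seed() with the same value before each call should have the same effect, since the only random step is the runif() draw.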
