Dimitri Liakhovitski
2009-Sep-22 18:07 UTC
[R] any way to make it work faster (deleting rows that contain certain values)
Hello, dear R'ers,

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)

In this case, dim(index) is 7,340,032 (!) and 11.
I realize it's huge.
Then, I am trying to get rid of the undesired combinations of columns.
They should not contain identical values in any 2 columns.
Also if column 1 has a value of 5, there should be no 2 in any other column,
if column 1 has a value of 6, there should be no 3 in any other column, and
if column 1 has a value of 7, there should be no 4 in any other column.
I wrote a generic script to achieve that (below).
However, I was wondering if it's possible to make it any faster - it
looks like with that huge index it's going to take me days...

Thanks a lot for any suggestion!
Dimitri

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=T)
for(i in 1:ncol(index)){ # looping through columns of the "index"
  for(pair in 1:nrow(bad.pairs)){ # looping through rows of "bad.pairs"
    keep<-sapply(1:nrow(index), function(x){
      temp<-(index[[x,i]]==bad.pairs[pair,1]) &
        (any(index[x,-i]==bad.pairs[pair,2]))
      return(temp)
    })
    index<-index[!keep,]
  }
}

--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
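The filtering rule described above can be sketched without the per-row sapply() call. This is only a minimal illustration under stated assumptions, not a tested replacement for the full problem: it uses a reduced grid of 4 columns instead of 11 so that it runs instantly, and the names m, others, drop and result are illustrative and not from the post. It relies on the fact that each row is judged independently, so the rows removed one pass at a time in the loop are the same rows flagged by OR-ing all the conditions together.

```r
# Reduced stand-in for the posted grid: column 1 in 1:7, three more columns in 1:4.
index <- expand.grid(1:7, 1:4, 1:4, 1:4)
bad.pairs <- matrix(c(1,1, 2,2, 3,3, 4,4, 5,2, 6,3, 7,4),
                    ncol = 2, byrow = TRUE)

# Work on a matrix: whole-column comparisons avoid row-by-row data-frame indexing.
m <- as.matrix(index)
drop <- logical(nrow(m))          # TRUE marks a row to be removed

for (i in seq_len(ncol(m))) {
  others <- m[, -i, drop = FALSE] # every column except column i
  for (p in seq_len(nrow(bad.pairs))) {
    first.hit  <- m[, i] == bad.pairs[p, 1]
    second.hit <- rowSums(others == bad.pairs[p, 2]) > 0
    drop <- drop | (first.hit & second.hit)
  }
}

result <- index[!drop, , drop = FALSE]
nrow(result)
```

On this reduced grid 42 rows remain: 6 orderings of the three unused values for each of column 1 = 1..4, plus 6 orderings of the three permitted values for each of column 1 = 5, 6, 7.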
Charles C. Berry
2009-Sep-22 21:36 UTC
[R] any way to make it work faster (deleting rows that contain certain values)
On Tue, 22 Sep 2009, Dimitri Liakhovitski wrote:

> Hello, dear R'ers,
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
>
> In this case, dim(index) is 7,340,032 (!) and 11.
> I realize it's huge.
> Then, I am trying to get rid of the undesired combinations of columns.
> They should not contain identical values in any 2 columns.

Right, but you have only four values in each of columns 2:11. And none of
them can be identical.

There are exactly choose(4,10) rows that satisfy that constraint for
columns 2:11.

The rows of your result are easily enumerated by hand. ;-)

HTH,

Chuck

> Also if column 1 has a value of 5, there should be no 2 in any other column,
> if column 1 has a value of 6, there should be no 3 in any other column, and
> column 1 has a value of 7, there should be no 4 in any other column.
> I wrote a generic script to achieve that (below).
> However, I was wondering if it's possible to make it any faster - it
> looks like with that huge index it's going to take me days...
>
> Thanks a lot for any suggestion!
> Dimitri
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
> bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=T)
> for(i in 1:ncol(index)){ # looping through columns of the "index"
>   for(pair in 1:nrow(bad.pairs)){ # looping through rows of "bad.pairs"
>     keep<-sapply(1:nrow(index), function(x){
>       temp<-(index[[x,i]]==bad.pairs[pair,1]) &
>         (any(index[x,-i]==bad.pairs[pair,2]))
>       return(temp)
>     })
>     index<-index[!keep,]
>   }
> }
>
> --
> Dimitri Liakhovitski
> Ninah.com
> Dimitri.Liakhovitski at ninah.com
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
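The count given in the reply can be checked directly in R. The sketch below is only an illustration: the helper name count.all.distinct is hypothetical and not from the thread, and it counts on small grids (4 and 5 columns of 1:4) rather than on the full 10 columns, where the same argument applies.

```r
# choose(n, k) is 0 whenever k > n: 10 distinct values cannot be drawn from 4.
choose(4, 10)

# Hypothetical helper: build the grid over k columns of 1:4 and count the
# rows in which no value occurs more than once.
count.all.distinct <- function(k) {
  grid <- as.matrix(expand.grid(rep(list(1:4), k)))
  ok <- rep(TRUE, nrow(grid))
  for (v in 1:4) {
    # a row stays eligible only if value v appears at most once in it
    ok <- ok & (rowSums(grid == v) <= 1)
  }
  sum(ok)
}

count.all.distinct(4)   # 4 columns, 4 values: the 4! = 24 orderings survive
count.all.distinct(5)   # 5 columns, 4 values: some value must repeat
```

Strictly, rows are ordered, so the exact count is a number of arrangements and not of combinations, but both are zero as soon as there are more columns than available values. With ten columns restricted to 1:4, no row can have all entries distinct, so the filtered index has no rows, whatever column 1 holds.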