thr3ads.net - R help - [R] any way to make it work faster (deleting rows that contain certain values) [Sep 2009]

Dimitri Liakhovitski

2009-Sep-22 18:07 UTC

[R] any way to make it work faster (deleting rows that contain certain values)

Hello, dear R'ers,

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)

In this case, dim(index) is 7,340,032 (!)  and 11.
I realize it's huge.
Then, I am trying to get rid of the undesired combinations of columns.
They should not contain identical values in any 2 columns.
Also if column 1 has a value of 5, there should be no 2 in any other column,
if column 1 has a value of 6, there should be no 3 in any other column, and
if column 1 has a value of 7, there should be no 4 in any other column.
I wrote a generic script to achieve that (below).
However, I was wondering if it's possible to make it any faster - it
looks like with that huge index it's going to take me days...

Thanks a lot for any suggestion!
Dimitri

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=TRUE)
for(i in 1:ncol(index)){                # looping through columns of "index"
  for(pair in 1:nrow(bad.pairs)){       # looping through rows of "bad.pairs"
    keep<-sapply(1:nrow(index), function(x){
      temp<-(index[[x,i]]==bad.pairs[pair,1]) &
        (any(index[x,-i]==bad.pairs[pair,2]))
      return(temp)
    })
    index<-index[!keep,]
  }
}
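
The loop above calls a function once per row for every column and every pair, which is where the time goes. A minimal sketch of the same filter using whole-column comparisons instead, assuming the grid is held as a matrix rather than a data frame and, for illustration only, using a much smaller grid (column 1 over 1:7, three further columns over 1:4). The names `drop` and `others` are mine, not from the script above.

```r
# Small illustrative grid (an assumption): 7 * 4^3 = 448 rows, 4 columns.
index <- as.matrix(expand.grid(1:7, 1:4, 1:4, 1:4))
bad.pairs <- matrix(c(1,1, 2,2, 3,3, 4,4, 5,2, 6,3, 7,4),
                    ncol = 2, byrow = TRUE)

# Accumulate one logical flag per row instead of deleting inside the loop.
drop <- rep(FALSE, nrow(index))
for (i in seq_len(ncol(index))) {
  others <- index[, -i, drop = FALSE]          # every column except i
  for (p in seq_len(nrow(bad.pairs))) {
    first_hit  <- index[, i] == bad.pairs[p, 1]           # column i holds the first value
    second_hit <- rowSums(others == bad.pairs[p, 2]) > 0  # some other column holds the second
    drop <- drop | (first_hit & second_hit)
  }
}
result <- index[!drop, , drop = FALSE]
nrow(result)
```

Because each test looks at one row in isolation, flagging all rows first and subsetting once removes the same rows as deleting step by step. On this small grid 42 rows survive: 24 with column 1 in 1:4, plus 6 each for column 1 equal to 5, 6 and 7.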

-- 
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com

Charles C. Berry

2009-Sep-22 21:36 UTC


On Tue, 22 Sep 2009, Dimitri Liakhovitski wrote:
> Hello, dear R'ers,
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
>
> In this case, dim(index) is 7,340,032 (!)  and 11.
> I realize it's huge.
> Then, I am trying to get rid of the undesired combinations of columns.
> They should not contain identical values in any 2 columns.

Right, but you have only four values in each of columns 2:11.

And none of them can be identical.

There are exactly

 	choose(4,10)

rows that satisfy that constraint for columns 2:11.

The rows of your result are easily enumerated by hand. ;-)
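
The count can be checked directly in R. A minimal sketch: `choose()` returns 0 when asked for more items than the set holds, and a brute-force check on a small analogue (3 columns drawing on 2 values, chosen here only for illustration) agrees. The names `n_ok`, `small` and `has_dup` are illustrative, not from the thread.

```r
# Ways to pick 10 distinct values from a set of 4: there are none.
n_ok <- choose(4, 10)
n_ok

# Small analogue: 3 columns, each drawing on only 2 values.
small <- expand.grid(1:2, 1:2, 1:2)

# anyDuplicated() gives 0 for a row with no repeats, else the index of the first repeat.
has_dup <- apply(small, 1, function(r) anyDuplicated(r) > 0)

# Rows with all columns distinct.
sum(!has_dup)
```

Both results are 0: with more columns than available values, every row must repeat a value somewhere.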

HTH,

Chuck
> Also if column 1 has a value of 5, there should be no 2 in any other column,
> if column 1 has a value of 6, there should be no 3 in any other column, and
> if column 1 has a value of 7, there should be no 4 in any other column.
> I wrote a generic script to achieve that (below).
> However, I was wondering if it's possible to make it any faster - it
> looks like with that huge index it's going to take me days...
>
> Thanks a lot for any suggestion!
> Dimitri
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
> bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=TRUE)
> for(i in 1:ncol(index)){                # looping through columns of "index"
>  for(pair in 1:nrow(bad.pairs)){       # looping through rows of "bad.pairs"
>    keep<-sapply(1:nrow(index), function(x){
>      temp<-(index[[x,i]]==bad.pairs[pair,1]) &
> (any(index[x,-i]==bad.pairs[pair,2]))
>      return(temp)
>    })
>    index<-index[!keep,]
>  }
> }
>
> -- 
> Dimitri Liakhovitski
> Ninah.com
> Dimitri.Liakhovitski at ninah.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
