thr3ads.net - R help - [R] Comparing entire row sets at once efficiently [Sep 2006]

Dirk Eddelbuettel

2006-Sep-28 15:54 UTC

[R] Comparing entire row sets at once efficiently

Dear useRs,

I am having a hard time coming up with a nice and efficient solution to
a problem on entire matrices or data.frames. In spirit, this is similar to
what setdiff() and setequal() do, but I need it in more dimensions.

Here's a brief description.

  * given a set of factors or sequences, expand.grid() gives me the set
    of combinations in a data.frame;

    in my case all arguments are numeric so I could convert the data frame to
    a matrix

    let's call this one Candidates

  * I have a second matrix (or data frame) to compare to; this second
    set may be a subset of the first, or a superset, but it is guaranteed to
    contain the same columns

    let's call this one Comparison

  * I want to know which rows in Candidates are not yet in Comparison.
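
The setup described above can be sketched as follows. The factor levels and the rows picked for Comparison are invented purely for illustration; the post does not give the real ones.

```r
# Hypothetical sequences; the real ones are not given in the post.
grid <- expand.grid(a = 1:3, b = c(10, 20), c = c(0.5, 1.5))

# All columns are numeric, so the data frame converts cleanly to a matrix.
Candidates <- as.matrix(grid)

# Pretend a few combinations are already known (rows chosen arbitrarily).
Comparison <- Candidates[c(1, 5, 12), , drop = FALSE]

dim(Candidates)   # 3 * 2 * 2 = 12 rows, 3 columns
```

The goal is then to find the rows of Candidates that do not occur in Comparison.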

A toy example:
> Comparison <- matrix(1:30, ncol=5)
> Candidates <- Comparison[c(2,4), ]
> checkRow <- function(r, M) { any( (r[1] == M[,1]) & (r[2] == M[,2])
+     & (r[3] == M[,3]) & (r[4] == M[,4]) ) }
> checkRow( Candidates[1,], Comparison)
[1] TRUE
> falseRow <- Candidates[1,]
> falseRow[2] <- 42
> checkRow( falseRow, Comparison)
[1] FALSE

The checkRow function works but is a) klunky, b) hardcodes the dimension and
c) works only on one row at a time.
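
For illustration, points a) and b) could be addressed along the following lines. This is only a sketch: the names checkRowAll and checkRows are hypothetical, and it assumes numeric matrices without NA values.

```r
# Hypothetical generalisation of checkRow(): compares r against every row
# of M across all columns, however many there are.
checkRowAll <- function(r, M) {
  # t(M) == r recycles r down each column of t(M), i.e. along each row of M;
  # a row of M matches when all ncol(M) comparisons are TRUE.
  any(colSums(t(M) == r) == ncol(M))
}

# Apply the check to each row of Cand; TRUE means "already present in M".
checkRows <- function(Cand, M) {
  apply(Cand, 1, checkRowAll, M = M)
}

Comparison <- matrix(1:30, ncol = 5)
Candidates <- Comparison[c(2, 4), ]
falseRow <- Candidates[1, ]
falseRow[2] <- 42

checkRowAll(Candidates[1, ], Comparison)
checkRowAll(falseRow, Comparison)
checkRows(rbind(Candidates, falseRow), Comparison)
```

This still loops over the candidate rows via apply(), so it does not really solve c); it only removes the hardcoded column indices.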

There must be better ways, at least for a) and b).  What am I missing?  

Feel free to reply off-list and I'd gladly summarize back to the list. If you
don't want your reply (or email) summarized back, please indicate.

Thanks, Dirk



-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison

Gabor Grothendieck

2006-Sep-28 16:05 UTC

[R] Comparing entire row sets at once efficiently

If Comparison and Candidates each have no duplicated rows (which
is the situation in the example) then try this:

tail(!duplicated(rbind(Comparison, Candidates)), nrow(Candidates))
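
As a sketch of how this behaves on the toy data from the question, with one extra row added that is not in Comparison (the names isNew and newRows are made up for this illustration):

```r
Comparison <- matrix(1:30, ncol = 5)
falseRow <- Comparison[2, ]
falseRow[2] <- 42L

# Two rows that already exist in Comparison, plus one that does not.
Candidates <- rbind(Comparison[c(2, 4), ], falseRow)

# duplicated() on a matrix works row-wise. Because Comparison is stacked
# first, a Candidates row is flagged as duplicated exactly when it already
# occurs above it, so negating the tail gives TRUE for the new rows.
isNew <- as.vector(
  tail(!duplicated(rbind(Comparison, Candidates)), nrow(Candidates))
)
isNew

# Keep only the candidate rows not yet in Comparison.
newRows <- Candidates[isNew, , drop = FALSE]
newRows
```

The result is a logical vector with one entry per row of Candidates, TRUE where the row is not yet in Comparison. It relies on the no-duplicates assumption stated above.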

