thr3ads.net - R devel - [Rd] setdiff for data frames [Dec 2007]

If this information is useful, please help other people find it:
Share via:

G. Jay Kerns

2007-Dec-10 15:53 UTC

[Rd] setdiff for data frames

Hello,

I have been interested in setdiff() for data frames that operates
row-wise.  I looked in the documentation, mailing lists, etc., and
didn't find exactly the right thing.  Given data frames A, B with the
same columns, the goal is to extract the rows that are in A, but not
in B.  Of course, one can usually do setdiff(rownames(A), rownames(B))
but that is cheating.  :-)

I played around a little bit and came up with

setdiff.data.frame = function(A, B){
     g <-  function( y, B){
                 any( apply(B, 1, FUN = function(x)
identical(all.equal(x, y), TRUE) ) ) }
     unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
}

I am sure that somebody can do this a better/faster way... any ideas?
Any chance we could get a data.frame method for set.diff in future R
versions? (The notion of "set" is somewhat ambiguous with respect to
rows, columns, and entries in the data frame case.)


Jay


P.S. You can see what I'm looking for with

A <- expand.grid( 1:3, 1:3 )
B <- A[ 2:5, ]
setdiff.data.frame(A,B)





***************************************************
G. Jay Kerns, Ph.D.
Assistant Professor / Statistics Coordinator
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gkerns at ysu.edu
http://www.cc.ysu.edu/~gjkerns/

Charles C. Berry

2007-Dec-11 01:58 UTC

head link

[Rd] setdiff for data frames

On Mon, 10 Dec 2007, G. Jay Kerns wrote:
> Hello,
>
> I have been interested in setdiff() for data frames that operates
> row-wise.  I looked in the documentation, mailing lists, etc., and
> didn't find exactly the right thing.  Given data frames A, B with the
> same columns, the goal is to extract the rows that are in A, but not
> in B.  Of course, one can usually do setdiff(rownames(A), rownames(B))
> but that is cheating.  :-)
>
> I played around a little bit and came up with
>
> setdiff.data.frame = function(A, B){
>     g <-  function( y, B){
>                 any( apply(B, 1, FUN = function(x)
> identical(all.equal(x, y), TRUE) ) ) }
>     unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
> }
>
> I am sure that somebody can do this a better/faster way... any ideas?
setdiff.data.frame <-
    function(A,B) A[ !duplicated( rbind(B,A) )[ -seq_len(nrow(B))] , ]

This ignores rownames(A) which may not be what is wanted in every case.

HTH,

Chuck
> Any chance we could get a data.frame method for set.diff in future R
> versions? (The notion of "set" is somewhat ambiguous with respect
to
> rows, columns, and entries in the data frame case.)
>
>
> Jay
>
>
> P.S. You can see what I'm looking for with
>
> A <- expand.grid( 1:3, 1:3 )
> B <- A[ 2:5, ]
> setdiff.data.frame(A,B)
>
>
>
>
>
> ***************************************************
> G. Jay Kerns, Ph.D.
> Assistant Professor / Statistics Coordinator
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
> -3302 Department
> -3170 FAX
> E-mail: gkerns at ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Apparently Analagous Threads

Search for more seemingly similar threads

R devel - Dec 2007 - setdiff for data frames

[Rd] setdiff for data frames

[Rd] setdiff for data frames

Apparently Analagous Threads