On Mon, 10 Dec 2007, G. Jay Kerns wrote:
> Hello,
>
> I have been interested in setdiff() for data frames that operates
> row-wise. I looked in the documentation, mailing lists, etc., and
> didn't find exactly the right thing. Given data frames A, B with the
> same columns, the goal is to extract the rows that are in A, but not
> in B. Of course, one can usually do setdiff(rownames(A), rownames(B))
> but that is cheating. :-)
>
> I played around a little bit and came up with
>
> setdiff.data.frame = function(A, B){
> g <- function( y, B){
> any( apply(B, 1, FUN = function(x)
> identical(all.equal(x, y), TRUE) ) ) }
> unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
> }
>
> I am sure that somebody can do this a better/faster way... any ideas?
setdiff.data.frame <-
function(A,B) A[ !duplicated( rbind(B,A) )[ -seq_len(nrow(B))] , ]
This ignores rownames(A) which may not be what is wanted in every case.
HTH,
Chuck
> Any chance we could get a data.frame method for set.diff in future R
> versions? (The notion of "set" is somewhat ambiguous with respect
to
> rows, columns, and entries in the data frame case.)
>
>
> Jay
>
>
> P.S. You can see what I'm looking for with
>
> A <- expand.grid( 1:3, 1:3 )
> B <- A[ 2:5, ]
> setdiff.data.frame(A,B)
>
>
>
>
>
> ***************************************************
> G. Jay Kerns, Ph.D.
> Assistant Professor / Statistics Coordinator
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
> -3302 Department
> -3170 FAX
> E-mail: gkerns at ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901