thr3ads.net - R help - [R] Compare two dataframes [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Mark Na

2010-Dec-16 19:02 UTC

[R] Compare two dataframes

Hello,

I have two dataframes DF1 and DF2 that should be identical but are not
(DF1 has some rows that aren't in DF2, and vice versa). I would like
to produce a new dataframe DF3 containing rows in DF1 that aren't in
DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).

I have a solution for this problem (see self contained example below)
but it's awkward and requires making a new "ID" column by pasting
together all of the columns in each DF and them comparing the two DFs
based on this unique ID.

Is there a better way?

Many thanks for your help,

Mark



#compare two dataframes and extract uncommon rows

#MAKE SOME DATA
cars$id<-paste(cars$speed,cars$dist,sep="") $create unique ID field
by
pasting all columns together
cars1<-cars[1:35,]
cars2<-cars[16:50,]

#EXTRACT UNIQUE ROWS
cars1_unique<-cars1[cars1$id %in% setdiff(cars1$id,cars2$id),] #rows
unique to cars1 (i.e., not in cars2)
cars2_unique<-cars2[cars2$id %in% setdiff(cars2$id,cars1$id),] #rows
unique to cars2

Michael Bedward

2010-Dec-16 23:56 UTC

head link

[R] Compare two dataframes

Hello Mark,

This is how I do it but it's longer than your code :)

unique.rows <- function (df1, df2) {
# Returns any rows of df1 that are not in df2
  out <- NULL
  for (i in 1:nrow(df1)) {
  	found <- FALSE
    for (j in 1:nrow(df2)) {
      if (all(df1[i,] == df2[j,])) {
      	found <- TRUE
        break
      }
    }
    if (!found) out <- rbind(out, df1[i,])
  }
  out
}

Michael

On 17 December 2010 06:02, Mark Na <mtb954 at gmail.com>
wrote:> Hello,
>
> I have two dataframes DF1 and DF2 that should be identical but are not
> (DF1 has some rows that aren't in DF2, and vice versa). I would like
> to produce a new dataframe DF3 containing rows in DF1 that aren't in
> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).
>
> I have a solution for this problem (see self contained example below)
> but it's awkward and requires making a new "ID" column by
pasting
> together all of the columns in each DF and them comparing the two DFs
> based on this unique ID.
>
> Is there a better way?
>
> Many thanks for your help,
>
> Mark
>
>
>
> #compare two dataframes and extract uncommon rows
>
> #MAKE SOME DATA
> cars$id<-paste(cars$speed,cars$dist,sep="") $create unique ID
field by
> pasting all columns together
> cars1<-cars[1:35,]
> cars2<-cars[16:50,]
>
> #EXTRACT UNIQUE ROWS
> cars1_unique<-cars1[cars1$id %in% setdiff(cars1$id,cars2$id),] #rows
> unique to cars1 (i.e., not in cars2)
> cars2_unique<-cars2[cars2$id %in% setdiff(cars2$id,cars1$id),] #rows
> unique to cars2
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Petr Savicky

2010-Dec-17 08:27 UTC

head link

[R] Compare two dataframes

On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:> Hello,
> 
> I have two dataframes DF1 and DF2 that should be identical but are not
> (DF1 has some rows that aren't in DF2, and vice versa). I would like
> to produce a new dataframe DF3 containing rows in DF1 that aren't in
> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).
The function unique(DF) removes duplicated rows of DF and keeps the unique
rows in the order of their first occurrence. So, if DF1 does not contain
duplicated rows, then unique(rbind(DF1, DF2)) contains first DF1 and
then the rows, which are unique to DF2, if there are any. The order of
the rows in the result depends on the order of the original data frames
and if DF2 contains several instances of a row, which is not in DF1, we
get only the first instance of this row in the difference.

  #MAKE SOME DATA
  cars$id <- paste(cars$speed, cars$dist, sep="") #create unique ID
field by pasting all columns together
  cars1 <- cars[1:35, ]
  cars2 <- cars[16:50, ]
 
  #EXTRACT UNIQUE ROWS
  cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ] #rows
unique to cars1 (i.e., not in cars2)
  cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ] #rows
unique to cars2
 
  cars1_set <- unique(cars1)
  cars2_set <- unique(cars2)
 
  cars1_plus <- unique(rbind(cars1_set, cars2_set))
  cars2_plus <- unique(rbind(cars2_set, cars1_set))
 
  cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ]
  cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ]

  all(cars1_unique == cars1_diff) # [1] TRUE
  all(cars2_unique == cars2_diff) # [1] TRUE

Petr Savicky.

R help - Dec 2010 - Compare two dataframes

[R] Compare two dataframes

[R] Compare two dataframes

[R] Compare two dataframes

Seemingly Similar Threads