Hello, I have two dataframes DF1 and DF2 that should be identical but are not (DF1 has some rows that aren't in DF2, and vice versa). I would like to produce a new dataframe DF3 containing rows in DF1 that aren't in DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1). I have a solution for this problem (see self contained example below) but it's awkward and requires making a new "ID" column by pasting together all of the columns in each DF and them comparing the two DFs based on this unique ID. Is there a better way? Many thanks for your help, Mark #compare two dataframes and extract uncommon rows #MAKE SOME DATA cars$id<-paste(cars$speed,cars$dist,sep="") $create unique ID field by pasting all columns together cars1<-cars[1:35,] cars2<-cars[16:50,] #EXTRACT UNIQUE ROWS cars1_unique<-cars1[cars1$id %in% setdiff(cars1$id,cars2$id),] #rows unique to cars1 (i.e., not in cars2) cars2_unique<-cars2[cars2$id %in% setdiff(cars2$id,cars1$id),] #rows unique to cars2
Hello Mark, This is how I do it but it's longer than your code :) unique.rows <- function (df1, df2) { # Returns any rows of df1 that are not in df2 out <- NULL for (i in 1:nrow(df1)) { found <- FALSE for (j in 1:nrow(df2)) { if (all(df1[i,] == df2[j,])) { found <- TRUE break } } if (!found) out <- rbind(out, df1[i,]) } out } Michael On 17 December 2010 06:02, Mark Na <mtb954 at gmail.com> wrote:> Hello, > > I have two dataframes DF1 and DF2 that should be identical but are not > (DF1 has some rows that aren't in DF2, and vice versa). I would like > to produce a new dataframe DF3 containing rows in DF1 that aren't in > DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1). > > I have a solution for this problem (see self contained example below) > but it's awkward and requires making a new "ID" column by pasting > together all of the columns in each DF and them comparing the two DFs > based on this unique ID. > > Is there a better way? > > Many thanks for your help, > > Mark > > > > #compare two dataframes and extract uncommon rows > > #MAKE SOME DATA > cars$id<-paste(cars$speed,cars$dist,sep="") $create unique ID field by > pasting all columns together > cars1<-cars[1:35,] > cars2<-cars[16:50,] > > #EXTRACT UNIQUE ROWS > cars1_unique<-cars1[cars1$id %in% setdiff(cars1$id,cars2$id),] #rows > unique to cars1 (i.e., not in cars2) > cars2_unique<-cars2[cars2$id %in% setdiff(cars2$id,cars1$id),] #rows > unique to cars2 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:> Hello, > > I have two dataframes DF1 and DF2 that should be identical but are not > (DF1 has some rows that aren't in DF2, and vice versa). I would like > to produce a new dataframe DF3 containing rows in DF1 that aren't in > DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).The function unique(DF) removes duplicated rows of DF and keeps the unique rows in the order of their first occurrence. So, if DF1 does not contain duplicated rows, then unique(rbind(DF1, DF2)) contains first DF1 and then the rows, which are unique to DF2, if there are any. The order of the rows in the result depends on the order of the original data frames and if DF2 contains several instances of a row, which is not in DF1, we get only the first instance of this row in the difference. #MAKE SOME DATA cars$id <- paste(cars$speed, cars$dist, sep="") #create unique ID field by pasting all columns together cars1 <- cars[1:35, ] cars2 <- cars[16:50, ] #EXTRACT UNIQUE ROWS cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ] #rows unique to cars1 (i.e., not in cars2) cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ] #rows unique to cars2 cars1_set <- unique(cars1) cars2_set <- unique(cars2) cars1_plus <- unique(rbind(cars1_set, cars2_set)) cars2_plus <- unique(rbind(cars2_set, cars1_set)) cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ] cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ] all(cars1_unique == cars1_diff) # [1] TRUE all(cars2_unique == cars2_diff) # [1] TRUE Petr Savicky.