Arnaud Gaboury
2012-Feb-27 17:36 UTC
[R] compare two data frames of different dimensions and only keep unique rows
Dear list, I am still struggling with something that should be easy: I compare two data frames with a lot of common rows and want to keep only rows that are NOT in both data frames, unique. Here are an example of these data frame. reported <- structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"), class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = "data.frame") exportfile <- structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame") I can rbind() them, thus resulting in one data frame with duplicated row, but I have no idea how to delete duplicated rows. I have tried plyaing with unique(), duplicated with no success v<-rbind(exportfile,reported) v <- structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "11", "21", "31", "41", "61", "71", "51", "10", "81", "91"), class = "data.frame") TY for your help Arnaud Gaboury ? A2CT2 Ltd.
jim holtman
2012-Feb-27 17:41 UTC
[R] compare two data frames of different dimensions and only keep unique rows
is this what you want:> v <- rbind(reported, exportfile) > v[!duplicated(v), ]Product Price Nbr.Lots 1 Cocoa 2331.00 -61 2 Cocoa 2356.00 -61 3 Cocoa 2440.00 5 4 Cocoa 2450.00 1 6 Coffee C 204.55 40 7 Coffee C 205.45 40 5 GC 17792.00 -1 10 Sugar No 11 24.81 -1 8 ZS 1273.50 -1 9 ZS 1276.25 1 13 Cocoa 2440.00 6>On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury <arnaud.gaboury at a2ct2.com> wrote:> Dear list, > > I am still struggling with something that should be easy: I compare two data frames with a lot of common rows and want to keep only rows that are NOT in both data frames, unique. > > Here are an example of these data frame. > > reported <- > structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"), class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = "data.frame") > > exportfile <- > structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame") > > I can rbind() them, thus resulting in one data frame with duplicated row, but I have no idea how to delete duplicated rows. I have tried plyaing with unique(), duplicated with no success > > v<-rbind(exportfile,reported) > v <- > structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", > "Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa", > "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", > "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, > 24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45, > 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, > 40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product", > "Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5", > "6", "7", "8", "9", "11", "21", "31", "41", "61", "71", "51", > "10", "81", "91"), class = "data.frame") > > > TY for your help > > Arnaud Gaboury > > A2CT2 Ltd. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Rui Barradas
2012-Feb-27 18:43 UTC
[R] compare two data frames of different dimensions and only keep unique rows
Hello,> > In fact, the only three unique rows are these ones: > > Product Price Nbr.Lots > Cocoa 2440 5 > Cocoa 2450 1 > Cocoa 2440 6 >The code below doesn't give the three in one step. (ixmat <- apply(reported, 1, function(x) apply(exportfile, 1, function(y) any(x != y)))) reported[apply(ixmat, 2, all), ] exportfile[apply(ixmat, 1, all), ] But it does give all three. Problem: it's inefficient. If n1 <- nrow(reported) and n2 <- nrow(exportfile), to form the index matrix alone it will make ncol times n1*n2 comparisons. And the matrix dims are n1, n2 ... Anyway, I hope it gives ideas, Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/compare-two-data-frames-of-different-dimensions-and-only-keep-unique-rows-tp4425379p4425605.html Sent from the R help mailing list archive at Nabble.com.