thr3ads.net - R help - [R] compare two data frames of different dimensions and only keep unique rows [Feb 2012]

If this information is useful, please help other people find it:
Share via:

Arnaud Gaboury

2012-Feb-27 17:36 UTC

[R] compare two data frames of different dimensions and only keep unique rows

Dear list,

I am still struggling with something that should be easy: I compare two data
frames with a lot of common rows and want to keep only rows that are NOT in both
data frames, unique.

Here are an example of these data frame.

reported <-
structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 5L),
.Label = c("Cocoa", "Coffee C", "GC", "Sugar
No 11", "ZS"), class = "factor"), Price = c(2331, 2356,
2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L,
-61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product",
"Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L,
5L, 10L, 8L, 9L), class = "data.frame")

exportfile <-
structure(list(Product = c("Cocoa", "Cocoa",
"Cocoa", "Coffee C", "Coffee C", "GC",
"Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356,
2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6,
40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price",
"Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")

I can rbind() them, thus resulting in one data frame with duplicated row, but I
have no idea how to delete duplicated rows. I have tried plyaing with unique(),
duplicated with no success

v<-rbind(exportfile,reported)
v <-
structure(list(Product = c("Cocoa", "Cocoa",
"Cocoa", "Coffee C",
"Coffee C", "GC", "Sugar No 11", "ZS",
"ZS", "Cocoa", "Cocoa",
"Cocoa", "Cocoa", "Coffee C", "Coffee
C", "GC", "Sugar No 11",
"ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45,
17792,
24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45, 
17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 
40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names =
c("Product",
"Price", "Nbr.Lots"), row.names = c("1",
"2", "3", "4", "5",
"6", "7", "8", "9", "11",
"21", "31", "41", "61", "71",
"51",
"10", "81", "91"), class = "data.frame")


TY for your help

Arnaud Gaboury
?
A2CT2 Ltd.

jim holtman

2012-Feb-27 17:41 UTC

head link

[R] compare two data frames of different dimensions and only keep unique rows

is this what you want:
> v <- rbind(reported, exportfile)
> v[!duplicated(v), ]       Product    Price Nbr.Lots
1        Cocoa  2331.00      -61
2        Cocoa  2356.00      -61
3        Cocoa  2440.00        5
4        Cocoa  2450.00        1
6     Coffee C   204.55       40
7     Coffee C   205.45       40
5           GC 17792.00       -1
10 Sugar No 11    24.81       -1
8           ZS  1273.50       -1
9           ZS  1276.25        1
13       Cocoa  2440.00        6>

On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury
<arnaud.gaboury at a2ct2.com> wrote:> Dear list,
>
> I am still struggling with something that should be easy: I compare two
data frames with a lot of common rows and want to keep only rows that are NOT in
both data frames, unique.
>
> Here are an example of these data frame.
>
> reported <-
> structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L,
5L), .Label = c("Cocoa", "Coffee C", "GC",
"Sugar No 11", "ZS"), class = "factor"), Price =
c(2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25),
Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names =
c("Product", "Price", "Nbr.Lots"), row.names =
c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = "data.frame")
>
> exportfile <-
> structure(list(Product = c("Cocoa", "Cocoa",
"Cocoa", "Coffee C", "Coffee C", "GC",
"Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356,
2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6,
40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price",
"Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")
>
> I can rbind() them, thus resulting in one data frame with duplicated row,
but I have no idea how to delete duplicated rows. I have tried plyaing with
unique(), duplicated with no success
>
> v<-rbind(exportfile,reported)
> v <-
> structure(list(Product = c("Cocoa", "Cocoa",
"Cocoa", "Coffee C",
> "Coffee C", "GC", "Sugar No 11",
"ZS", "ZS", "Cocoa", "Cocoa",
> "Cocoa", "Cocoa", "Coffee C", "Coffee
C", "GC", "Sugar No 11",
> "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55,
205.45, 17792,
> 24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45,
> 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40,
> 40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names =
c("Product",
> "Price", "Nbr.Lots"), row.names = c("1",
"2", "3", "4", "5",
> "6", "7", "8", "9", "11",
"21", "31", "41", "61", "71",
"51",
> "10", "81", "91"), class =
"data.frame")
>
>
> TY for your help
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Rui Barradas

2012-Feb-27 18:43 UTC

head link

[R] compare two data frames of different dimensions and only keep unique rows

Hello,
> 
> In fact, the only three unique rows are these ones:
> 
>  Product Price Nbr.Lots
>    Cocoa  2440        5
>    Cocoa  2450        1
>    Cocoa  2440        6
> 
The code below doesn't give the three in one step.

(ixmat <- apply(reported, 1, function(x) apply(exportfile, 1, function(y)
any(x != y))))
reported[apply(ixmat, 2, all), ]
exportfile[apply(ixmat, 1, all), ]

But it does give all three.

Problem: it's inefficient. If n1 <- nrow(reported) and  n2 <-
nrow(exportfile),
to form the index matrix alone it will make ncol times n1*n2 comparisons.
And the matrix dims are n1, n2 ...

Anyway, I hope it gives ideas,

Rui Barradas




--
View this message in context:
http://r.789695.n4.nabble.com/compare-two-data-frames-of-different-dimensions-and-only-keep-unique-rows-tp4425379p4425605.html
Sent from the R help mailing list archive at Nabble.com.

Apparently Analagous Threads

Search for more maybe matching threads

R help - Feb 2012 - compare two data frames of different dimensions and only keep unique rows

[R] compare two data frames of different dimensions and only keep unique rows

[R] compare two data frames of different dimensions and only keep unique rows

[R] compare two data frames of different dimensions and only keep unique rows

Apparently Analagous Threads