I would like to know which rows are duplicates of each other, not simply that a row is a duplicate of another row. In the following example rows 1 and 3 are duplicates.

> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
  x y z
1 1 2 3
2 3 4 4
3 1 2 3

I can't figure out how to get R to tell me that observations 1 and 3 are the same. It seems like the "duplicated" and "unique" functions should be able to help me out, but I am stumped.

For instance, if I use "duplicated" ...

> duplicated(data)
[1] FALSE FALSE  TRUE

it tells me that row 3 is a duplicate, but not which row it matches. How do I figure out WHICH row it matches?

And if I use "unique"...

> unique(data)
  x y z
1 1 2 3
2 3 4 4

I see that rows 1 and 2 are unique, leaving me to infer that row 3 was a duplicate, but again it doesn't tell me which row it was a duplicate of (as far as I can tell). Am I missing something?

How can I determine that row 3 is a duplicate OF ROW 1?

Thanks,

Aaron
If you sort the data then the duplicated entries will occur in consecutive blocks:

> m
  x y z
1 1 2 3
2 3 4 4
3 1 2 3
> m1 <- m[do.call(order, m), ]
> m1
  x y z
1 1 2 3
3 1 2 3
2 3 4 4
> duplicated(m1)
[1] FALSE  TRUE FALSE
>

When you identify the blocks, the row names will tell you where they occur in the original data frame.

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Aaron M. Swoboda
Sent: Monday, 30 March 2009 2:07 PM
To: r-help at r-project.org
Subject: [R] which rows are duplicates?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
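The reply above stops at "identify the blocks". One possible way to finish that step is sketched below. It is not part of the original message: the names block, groups and dup_groups are made up for illustration, and the code assumes the data frame still has its default row names (1, 2, 3, ...), so that they can be read back as row numbers.

```r
# Rebuild the example data frame from the thread (called m, as in the reply).
m <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# Sort the rows so that identical rows end up next to each other.
m1 <- m[do.call(order, m), ]

# A new block starts at each sorted row that is NOT a duplicate of an
# earlier row, so a running count of those rows labels the blocks.
block <- cumsum(!duplicated(m1))

# Assumption: default row names, so they convert cleanly to the original
# row numbers. Group those row numbers by block.
groups <- split(as.integer(rownames(m1)), block)

# Keep only the blocks with more than one member: these are the sets of
# rows that duplicate each other.
dup_groups <- Filter(function(g) length(g) > 1, groups)
dup_groups
```

For the example data this should leave a single group holding rows 1 and 3. If the data frame has non-default row names, an explicit index column added before sorting would be a safer way to carry the original positions.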
At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>I would like to know which rows are duplicates of each other, not
>simply that a row is duplicate of another row. In the following
>example rows 1 and 3 are duplicates.
>
> > x <- c(1,3,1)
> > y <- c(2,4,2)
> > z <- c(3,4,3)
> > data <- data.frame(x,y,z)
>   x y z
>1 1 2 3
>2 3 4 4
>3 1 2 3

Does this do what you want?

> x <- c(1,3,1)
> y <- c(2,4,2)
> z <- c(3,4,3)
> data <- data.frame(x,y,z)
> data.u <- unique(data)
> data.u
  x y z
1 1 2 3
2 3 4 4
> data.u <- cbind(data.u, set = 1:nrow(data.u))
> merge(data, data.u)
  x y z set
1 1 2 3   1
2 1 2 3   1
3 3 4 4   2

You need to do a bit more work to get them back into the original row order if that is essential.

Michael Dewey
http://www.aghmed.fsnet.co.uk
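The "bit more work" mentioned above could look something like the following sketch. It is an assumption about what was meant, not code from the original reply: the helper column row and the variables merged and first_match are introduced here purely for illustration.

```r
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# Record each row's original position before merging, because merge()
# returns rows sorted by the key columns.
data$row <- seq_len(nrow(data))

# Number the distinct rows, using only the real data columns.
data.u <- unique(data[c("x", "y", "z")])
data.u$set <- seq_len(nrow(data.u))

# merge() joins on the columns the two data frames share (x, y, z).
merged <- merge(data, data.u)

# Restore the original row order.
merged <- merged[order(merged$row), ]

# For every row, the lowest-numbered row in the same set, i.e. the row
# it duplicates (or itself, if it is the first occurrence).
first_match <- ave(merged$row, merged$set, FUN = min)

cbind(merged, first_match)
```

With the example data, first_match for row 3 should come out as 1, which answers the "duplicate OF ROW 1" question directly. Note that merging on numeric columns relies on exact equality of the values, which holds here but may not for computed floating-point data.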