Hi,

I am trying to identify duplicate values in a column in a data frame. The duplicated function identifies the duplicate rows in the data frame, but it only flags the second record, not both records. Is there a way to mark both rows in the data frame as TRUE?

    dfA$dups<-duplicated(dfA$Value)
    dfA
    Site State Value  dups
     929    VA    73 FALSE
     929    VA    73  TRUE
     930    VA    76 FALSE
     930    VA    76  TRUE
     931    VA    74 FALSE
     932    VA    75 FALSE

But I would like this:

    Site State Value  dups
     929    VA    73  TRUE
     929    VA    73  TRUE
     930    VA    76  TRUE
     930    VA    76  TRUE
     931    VA    74 FALSE
     932    VA    75 FALSE

Thank you for your replies!
Still on the learning curve,
Kathi

--
View this message in context: http://r.789695.n4.nabble.com/Identifying-duplicate-rows-tp4642679.html
Sent from the R help mailing list archive at Nabble.com.
Thanks!! That did the trick!!
On Mon, Sep 10, 2012 at 11:23:42AM -0700, kborgmann wrote:
> Hi,
> I am trying to identify duplicate values in a column in a data frame. The
> duplicated function identifies the duplicate rows in the data frame but it
> only does this for the second record, not both records. Is there a way to
> mark both rows in the data frame as TRUE?
> dfA$dups<-duplicated(dfA$Value)
> dfA
> Site State Value  dups
>  929    VA    73 FALSE
>  929    VA    73  TRUE
>  930    VA    76 FALSE
>  930    VA    76  TRUE
>  931    VA    74 FALSE
>  932    VA    75 FALSE
>
> But I would like this
> Site State Value  dups
>  929    VA    73  TRUE
>  929    VA    73  TRUE
>  930    VA    76  TRUE
>  930    VA    76  TRUE
>  931    VA    74 FALSE
>  932    VA    75 FALSE

Hi.

Try the following.

    dfA <- cbind(State="VA", data.frame(Value=c(73, 73, 76, 76, 74, 75)))
    dfA$dups <- duplicated(dfA$Value) | duplicated(dfA$Value, fromLast=TRUE)
    dfA

      State Value  dups
    1    VA    73  TRUE
    2    VA    73  TRUE
    3    VA    76  TRUE
    4    VA    76  TRUE
    5    VA    74 FALSE
    6    VA    75 FALSE

Hope this helps.

Petr Savicky.
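To see why combining the two calls marks every copy, it may help to look at each pass on its own. The sketch below uses a standalone vector named value (an illustrative name, not from the thread) holding the same numbers as the Value column: the default pass flags every occurrence after the first, the fromLast = TRUE pass flags every occurrence before the last, and the logical OR of the two flags all members of each duplicated group.

```r
# Illustrative vector with the same numbers as dfA$Value in the thread
value <- c(73, 73, 76, 76, 74, 75)

# Scanning from the front: the first occurrence of each value is FALSE
forward <- duplicated(value)

# Scanning from the back: the last occurrence of each value is FALSE
backward <- duplicated(value, fromLast = TRUE)

# An element is part of a duplicated group if either scan flags it
either <- forward | backward

print(data.frame(value, forward, backward, either))
```

The same reasoning should carry over to values that occur three or more times: the middle copies are flagged by both scans, and the first and last copies by one scan each.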
try this:

    dfA$dups<-duplicated(dfA$Value) | duplicated(dfA$Value, fromLast = TRUE)

On Mon, Sep 10, 2012 at 2:23 PM, kborgmann <borgmann at email.arizona.edu> wrote:
> Hi,
> I am trying to identify duplicate values in a column in a data frame. The
> duplicated function identifies the duplicate rows in the data frame but it
> only does this for the second record, not both records. Is there a way to
> mark both rows in the data frame as TRUE?
> dfA$dups<-duplicated(dfA$Value)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
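For comparison, a different formulation (not the one proposed in this thread) is to collect the values that occur more than once and then test membership with %in%. This is only a sketch on an illustrative vector; value, dup_values and dups are names made up for the example.

```r
# Illustrative vector with the same numbers as dfA$Value in the thread
value <- c(73, 73, 76, 76, 74, 75)

# Values that appear at least twice (each listed once)
dup_values <- unique(value[duplicated(value)])

# TRUE for every element whose value is in that set
dups <- value %in% dup_values

print(data.frame(value, dups))
```

On this data the result matches the duplicated(...) | duplicated(..., fromLast = TRUE) approach; the intermediate dup_values vector can be handy if the repeated values themselves are also of interest.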
Hello,

Please use dput(dfA) to post your data examples. The following is its output. All one has to do is copy and paste it into an R session to get the data example.

    dfA <- structure(list(Site = c(929L, 929L, 930L, 930L, 931L, 932L),
        State = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "VA", class = "factor"),
        Value = c(73L, 73L, 76L, 76L, 74L, 75L)),
        .Names = c("Site", "State", "Value"),
        class = "data.frame", row.names = c(NA, -6L))

    # Now use the argument 'fromLast'
    dfA$dups <- duplicated(dfA) | duplicated(dfA, fromLast = TRUE)

Hope this helps,

Rui Barradas

On 10-09-2012 19:23, kborgmann wrote:
> Hi,
> I am trying to identify duplicate values in a column in a data frame. The
> duplicated function identifies the duplicate rows in the data frame but it
> only does this for the second record, not both records. Is there a way to
> mark both rows in the data frame as TRUE?
> dfA$dups<-duplicated(dfA$Value)
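One point worth noting: duplicated(dfA) compares whole rows, whereas duplicated(dfA$Value) compares only the Value column. On the data in this thread the two agree, but they can differ when the same value appears under different sites. The sketch below uses a made-up data frame dfB (not from the thread) to show the difference.

```r
# Hypothetical data: the value 73 occurs at two different sites
dfB <- data.frame(Site  = c(929L, 929L, 930L, 931L),
                  State = "VA",
                  Value = c(73L, 73L, 73L, 74L))

# Column-based check: only the Value column is compared
by_value <- duplicated(dfB$Value) | duplicated(dfB$Value, fromLast = TRUE)

# Row-based check: Site, State and Value must all match
by_row <- duplicated(dfB) | duplicated(dfB, fromLast = TRUE)

print(cbind(dfB, by_value, by_row))
```

Here the third row is flagged by the column-based check but not by the row-based one, because its Site differs. Also, if the data frame already has a dups column from an earlier attempt, a whole-row comparison will include that column too, so it may be safer to restrict the check to the columns of interest, for example duplicated(dfA[c("Site", "State", "Value")]).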