Hello,

In a data frame I want to identify ALL duplicate IDs in the example to be able to examine "OS" and "time".

(df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
  OS=c("Win","OSX","Win", "Win64"),
  time=c("12:22","23:22","04:44","12:28")))

     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA   Win 12:22
userA   Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE)

Is there a simple way to do that?

[-- With duplicated(df$ID) the output will be
[1] FALSE FALSE  TRUE FALSE
i.e. not all user A records are found

With unique(df$ID)
[1] userA userB userC
Levels: userA userB userC
i.e. one of each ID is found --]

Erik Svensson

--
View this message in context: http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3865139.html
Sent from the R help mailing list archive at Nabble.com.

On 02.10.2011 16:05, Erik Svensson wrote:
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44

See ?split or ?subset

Uwe Ligges
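
The pointer to ?split and ?subset might be read along the following lines. This is only a sketch of one possible interpretation, not necessarily what was intended; the helper names `counts`, `dup_ids`, `is_dup`, `dups`, `groups` and `dup_groups` are made up for illustration, and the use of table() to count the IDs is an assumption added here.

```r
# Example data from the original post.
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))

# Count how often each ID occurs and keep the IDs seen more than once.
counts <- table(df$ID)
dup_ids <- names(counts)[counts > 1]

# Logical vector marking every row whose ID is repeated.
is_dup <- df$ID %in% dup_ids

# subset() keeps all rows that belong to a repeated ID.
dups <- subset(df, ID %in% dup_ids)

# split() groups the rows by ID; keep only groups with more than one row.
groups <- split(df, df$ID)
dup_groups <- groups[sapply(groups, nrow) > 1]
```

Here `is_dup` is the logical vector asked for, `dups` holds the two userA rows, and `dup_groups` is a list with one data frame per repeated ID, which may be convenient when each user's records are to be examined separately.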

On Sun, Oct 2, 2011 at 10:05 AM, Erik Svensson <erik.b.svensson at gmail.com> wrote:
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44
>
> preferably by returning logical values (TRUE FALSE TRUE FALSE)

Try this:

> ave(rownames(df), df$ID, FUN = length) > 1
[1]  TRUE FALSE  TRUE FALSE

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
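
For comparison, two other ways of getting the same logical vector are sketched below. Neither is taken from the thread. The first combines duplicated() run from both ends of the vector; the second is a variant of the ave() idea that counts over an integer index instead of the row names, so that the per-ID counts are plainly numeric. The names `is_dup`, `n_per_id` and `is_dup2` are illustrative only.

```r
# Example data from the original post.
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))

# duplicated() flags the second and later occurrences of a value;
# with fromLast = TRUE it flags all but the last occurrence.
# Their union marks every row whose ID occurs more than once.
is_dup <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

# All records for the repeated IDs.
df[is_dup, ]

# ave() variant: number of rows per ID, aligned with the rows of df.
n_per_id <- ave(seq_along(df$ID), df$ID, FUN = length)
is_dup2 <- n_per_id > 1
```

The intermediate `n_per_id` can be useful on its own, for example to select IDs that occur at least three times.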