thr3ads.net - R help - [R] Find all duplicate records [Oct 2011]


Erik Svensson

2011-Oct-02 14:05 UTC

[R] Find all duplicate records

Hello,
In a data frame I want to identify ALL duplicate IDs in the example to be
able to examine "OS" and "time".

(df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
  OS=c("Win","OSX","Win", "Win64"),
  time=c("12:22","23:22","04:44","12:28")))

     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28

My desired output is that ALL records with the same IDs are found:

userA   Win 12:22
userA   Win 04:44

preferably by returning logical values (TRUE FALSE TRUE FALSE)

Is there a simple way to do that?

[-- With duplicated(df$ID) the output will be
[1] FALSE FALSE  TRUE FALSE 
i.e. not all user A records are found

With unique(df$ID)
[1] userA userB userC
Levels: userA userB userC 
i.e. one of each ID is found --]

Erik Svensson

--
View this message in context:
http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3865139.html
Sent from the R help mailing list archive at Nabble.com.

Uwe Ligges

2011-Oct-02 14:48 UTC


On 02.10.2011 16:05, Erik Svensson wrote:
> Hello,
> In a data frame I want to identify ALL duplicate IDs in the example to be
> able to examine "OS" and "time".
>
> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>   OS=c("Win","OSX","Win", "Win64"),
>   time=c("12:22","23:22","04:44","12:28")))
>
>       ID    OS  time
> 1 userA   Win 12:22
> 2 userB   OSX 23:22
> 3 userA   Win 04:44
> 4 userC Win64 12:28
>
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44
See ?split or ?subset
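
A minimal sketch of how subset() and split() might be applied here, assuming the df from the original post. The names n_per_id, dup_ids, dup_rows, groups and dup_groups are introduced only for illustration and are not part of the original reply:

```r
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))

# Count rows per ID, then keep the IDs that occur more than once
n_per_id <- table(df$ID)
dup_ids <- names(n_per_id)[n_per_id > 1]

# subset(): all rows whose ID is among the repeated ones
dup_rows <- subset(df, ID %in% dup_ids)

# split(): one data frame per ID; keep only groups with more than one row
groups <- split(df, df$ID)
dup_groups <- groups[sapply(groups, nrow) > 1]
```

Here dup_rows holds the two userA records, and dup_groups is a list with one data frame per repeated ID, which may be convenient for examining "OS" and "time" group by group.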

Uwe Ligges


Gabor Grothendieck

2011-Oct-02 22:47 UTC


On Sun, Oct 2, 2011 at 10:05 AM, Erik Svensson
<erik.b.svensson at gmail.com> wrote:
> Hello,
> In a data frame I want to identify ALL duplicate IDs in the example to be
> able to examine "OS" and "time".
>
> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>   OS=c("Win","OSX","Win", "Win64"),
>   time=c("12:22","23:22","04:44","12:28")))
>
>      ID    OS  time
> 1 userA   Win 12:22
> 2 userB   OSX 23:22
> 3 userA   Win 04:44
> 4 userC Win64 12:28
>
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44
>
> preferably by returning logical values (TRUE FALSE TRUE FALSE)
>
Try this:
> ave(rownames(df), df$ID, FUN = length) > 1
[1]  TRUE FALSE  TRUE FALSE
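
An alternative sketch that stays with duplicated(), assuming the df from the original post: combining a forward pass with a backward pass (the fromLast argument) flags every member of a repeated group rather than only the later occurrences. The name is_dup is only illustrative.

```r
df <- data.frame(ID = c("userA", "userB", "userA", "userC"),
                 OS = c("Win", "OSX", "Win", "Win64"),
                 time = c("12:22", "23:22", "04:44", "12:28"))

# duplicated() marks the 2nd and later occurrences of a value;
# with fromLast = TRUE it marks every occurrence except the last one.
# OR-ing the two marks all rows whose ID appears more than once.
is_dup <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

is_dup          # TRUE FALSE TRUE FALSE
df[is_dup, ]    # the two userA records
```

The logical vector can be used directly for row indexing, as in the last line.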


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Erik Svensson

2011-Oct-03 14:44 UTC


It works, thanks a lot Gabor
Erik

--
View this message in context:
http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3867724.html
Sent from the R help mailing list archive at Nabble.com.
