thr3ads.net - R help - [R] Duplicates and duplicated [May 2009]


christiaan pauw

2009-May-14 06:16 UTC

[R] Duplicates and duplicated

Hi everybody.
I want to identify not only the duplicate of a number but also the original
number that has been duplicated.
Example:
x=c(1,2,3,4,4,5,6,7,8,9)
y=duplicated(x)
rbind(x,y)

gives:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    4    5    6    7    8     9
y    0    0    0    0    1    0    0    0    0     0

i.e. only the second 4, at [,5], is flagged as a duplicate.

What I want is for both the first and the second 4, i.e. [,4] and [,5], to be TRUE:

  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    4    5    6    7    8     9
y    0    0    0    1    1    0    0    0    0     0

I assume it can be done by sorting the vector and then checking whether the next
or the previous entry matches, using identical(). I am just unsure how to write
such a loop, the logic of which (I think) is as follows:

sort x
for every value of x, check if the next value is identical and return TRUE
(or 1) if it is and FALSE (or 0) if it is not
AND
check if the previous value is identical and return TRUE (or 1) if it is and
FALSE (or 0) if it is not

Am I thinking correctly, and can someone help me write such a function?

regards
Christiaan
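
The sort-and-compare logic described above can be sketched without an explicit loop. This is only an illustration, not code from the thread: the helper name dup_all_sorted is hypothetical, it assumes x is an atomic vector with no NA values, and it treats the two checks as "next matches OR previous matches", since a value in a pair has only one matching neighbour.

```r
# Hypothetical helper: flag every element whose value occurs more than once,
# using the sort-then-compare-neighbours idea (assumes no NA in x).
dup_all_sorted <- function(x) {
  n <- length(x)
  if (n < 2L) return(logical(n))
  ord <- order(x)                # permutation that sorts x
  s <- x[ord]                    # sorted copy of x
  same_next <- s[-n] == s[-1L]   # TRUE where s[i] equals s[i + 1]
  # An element is in a run of equal values if its next OR previous neighbour matches.
  in_run <- c(same_next, FALSE) | c(FALSE, same_next)
  out <- logical(n)
  out[ord] <- in_run             # map the flags back to the original positions
  out
}

x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)
dup_all_sorted(x)
#  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
```

Writing the flags back through ord means the input does not have to be sorted beforehand.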


Linlin Yan

2009-May-14 06:23 UTC

How about

rbind(x, duplicated(x) | duplicated(x, fromLast=TRUE))
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    4    5    6    7    8     9
     0    0    0    1    1    0    0    0    0     0
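
To see why the two calls combine this way, it may help to look at each scan separately. The following is a small sketch; the variable names from_first and from_last and the vector z are made up for illustration.

```r
x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)

from_first <- duplicated(x)                   # TRUE for repeats after the first occurrence
from_last  <- duplicated(x, fromLast = TRUE)  # TRUE for repeats before the last occurrence

from_first | from_last
#  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

# With three copies of a value, the middle one is flagged by both scans.
z <- c("a", "b", "a", "c", "a")
duplicated(z) | duplicated(z, fromLast = TRUE)
# [1]  TRUE FALSE  TRUE FALSE  TRUE
```

The forward scan misses the first copy and the backward scan misses the last copy, so the union covers every copy.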

Andrej Blejec

2009-May-14 08:43 UTC


Try this

x%in%x[which(y)]

From your example

> x=c(1,2,3,4,4,5,6,7,8,9)
> y=duplicated(x)
> rbind(x,y)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    4    5    6    7    8     9
y    0    0    0    0    1    0    0    0    0     0
> which(y)
[1] 5
> x[which(y)]
[1] 4
> x%in%x[which(y)]
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Andrej

--
Andrej Blejec
National Institute of Biology
Vecna pot 111 POB 141
SI-1000 Ljubljana
SLOVENIA
e-mail: andrej.blejec at nib.si
URL: http://ablejec.nib.si 
tel: + 386 (0)59 232 789
fax: + 386 1 241 29 80
--------------------------
Organizer of
Applied Statistics 2009 conference
http://conferences.nib.si/AS2009


Linlin Yan

2009-May-14 10:44 UTC


The %in% operator is very good! And the expression can be made simpler, like this:
x %in% x[duplicated(x)]
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
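
As a possible follow-up, the same idea can be split into two steps so that the duplicated values and their positions are available separately. This is a sketch; the names dup_values and is_dup are made up for illustration.

```r
x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)

dup_values <- unique(x[duplicated(x)])  # the values that occur more than once
is_dup <- x %in% dup_values             # TRUE wherever x holds one of those values

dup_values
# [1] 4
which(is_dup)   # positions of all copies
# [1] 4 5
x[!is_dup]      # values that occur exactly once
# [1] 1 2 3 5 6 7 8 9
```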


Gabor Grothendieck

2009-May-14 14:33 UTC


Noting that:

> ave(x, x, FUN = length) > 1
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

try this:

> rbind(x, dup = ave(x, x, FUN = length) > 1)
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x      1    2    3    4    4    5    6    7    8     9
dup    0    0    0    1    1    0    0    0    0     0
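
The ave() call works by replacing each element with the size of the group it belongs to, so the intermediate counts may be worth looking at. The sketch below assumes the same x as in the thread; the names counts, z and n_z are made up for illustration.

```r
x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)

counts <- ave(x, x, FUN = length)  # each element replaced by the size of its group
counts
#  [1] 1 1 1 2 2 1 1 1 1 1
counts > 1
#  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

# Averaging over an index vector instead of x itself keeps the counts
# numeric when the data are character.
z <- c("a", "b", "a", "c", "a")
n_z <- ave(seq_along(z), z, FUN = length)
n_z
# [1] 3 1 3 1 3
```

Unlike the duplicated() approaches, this also tells you how many times each value occurs.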


