thr3ads.net - R help - [R] which element is duplicated? [Nov 2018]

Bert Gunter

2018-Nov-13 05:43 UTC

[R] which element is duplicated?

It is not clear to me what you want for the general case. Perhaps:

> v <- letters[c(2,2,1,2,1,1)]
> wh <- tapply(seq_along(v),factor(v), '[',1)
> w <- wh[match(v,v[wh])]
> w
b b a b a a
1 1 3 1 3 3
> ## and if you want NA's for the first occurrences of unique values
> ## of course:
> w[wh] <- NA
> w
 b  b  a  b  a  a
NA  1 NA  1  3  3

I'd like to see a cleverer solution that vectorizes and avoids the
tapply(), though.
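
One way to avoid the tapply() is to look the vector up in itself, since match() returns the position of the first match. The following is only a minimal sketch and not necessarily the solution posted elsewhere in the thread; the helper name first_index and its na_first argument are made up for illustration.

```r
# Hypothetical helper: for each element of v, the index of its first occurrence.
first_index <- function(v, na_first = FALSE) {
  idx <- match(v, v)            # match() returns the position of the first match
  if (na_first) {
    idx[!duplicated(v)] <- NA   # blank out elements that are themselves first occurrences
  }
  idx
}

v <- letters[c(2, 2, 1, 2, 1, 1)]
first_index(v)                   # 1 1 3 1 3 3
first_index(v, na_first = TRUE)  # NA 1 NA 1 3 3
```

Both calls are fully vectorized, and the second form gives NA NA 2 1 for the original example c("a", "b", "b", "a").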

Cheers,
Bert




On Mon, Nov 12, 2018 at 8:33 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> > match(v, unique(v))
> [1] 1 2 2 1
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Mon, Nov 12, 2018 at 5:08 PM Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
>
>> The duplicated() function gives TRUE if an item in a vector (or row in a
>> matrix, etc.) is a duplicate of an earlier item.  But what I would like
>> to know is which item does it duplicate?
>>
>> For example,
>>
>> v <- c("a", "b", "b", "a")
>> duplicated(v)
>>
>> returns
>>
>> [1] FALSE FALSE  TRUE  TRUE
>>
>> What I want is a fast way to calculate
>>
>>   [1] NA NA 2 1
>>
>> or (equally useful to me)
>>
>>   [1] 1 2 2 1
>>
>> The result should have the property that if result[i] == j, then v[i]
>> == v[j], at least for i != j.
>>
>> Does this already exist somewhere, or is it easy to write?
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Bert Gunter

2018-Nov-13 05:49 UTC


[R] which element is duplicated?

"I'd like to see a cleverer solution that vectorizes..."

and Herve provided it.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



PIKAL Petr

2018-Nov-13 08:42 UTC


[R] which element is duplicated?

Hi,

A similar result (with different numerical values) could be achieved by making v a
factor.

> v <- letters[c(2,2,1,2,1,1)]
> vf <- factor(v)
> as.numeric(vf)
[1] 2 2 1 2 1 1
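
The "different numerical values" arise because factor codes index the sorted levels, whereas match(v, unique(v)) numbers the groups in order of first appearance and match(v, v) returns positions in v itself. A small comparison, offered as an illustration only (the variable names are made up here):

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]   # "b" "b" "a" "b" "a" "a"

codes_factor <- as.integer(factor(v))   # codes into the sorted levels c("a", "b")
codes_unique <- match(v, unique(v))     # group number by order of first appearance
first_pos    <- match(v, v)             # index in v of each element's first occurrence

codes_factor   # 2 2 1 2 1 1
codes_unique   # 1 1 2 1 2 2
first_pos      # 1 1 3 1 3 3

# In this example only first_pos can be used to index back into v itself
all(v[first_pos] == v)      # TRUE
all(v[codes_unique] == v)   # FALSE
```

The group codes are enough if only the grouping matters; the requested property, result[i] == j implying v[i] == v[j], needs positions in v.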

Cheers
Petr

Martin Maechler

2018-Nov-13 09:08 UTC


[R] which element is duplicated?

>>>>> PIKAL Petr 
>>>>>     on Tue, 13 Nov 2018 08:42:22 +0000 writes:
    > Hi
    > similar result (with different numerical values) could
    > be achieved by making v a factor.

    > > v <- letters[c(2,2,1,2,1,1)]
    > > vf<-factor(v)
    > > as.numeric(vf)
    > [1] 2 2 1 2 1 1

    > Cheers
    > Petr

Yes, as was already remarked by Michael Sumner.

But really the power is in match(): it is called *twice* by factor().
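
To illustrate the point about match(): for a character vector and default arguments, the integer codes of factor(v) can be reproduced by matching against the sorted unique values. This is a simplified sketch of the idea, not the actual source of factor(), which also handles labels, exclusions and other arguments; the names lev and codes are chosen here for illustration.

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]

lev   <- sort(unique(v))   # the levels factor() would use by default
codes <- match(v, lev)     # integer codes: position of each element in lev

identical(codes, as.integer(factor(v)))   # TRUE
identical(lev, levels(factor(v)))         # TRUE
```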

Martin