The duplicated() function gives TRUE if an item in a vector (or row in a matrix, etc.) is a duplicate of an earlier item. But what I would like to know is which item it duplicates.

For example,

v <- c("a", "b", "b", "a")
duplicated(v)

returns

[1] FALSE FALSE TRUE TRUE

What I want is a fast way to calculate

[1] NA NA 2 1

or (equally useful to me)

[1] 1 2 2 1

The result should have the property that if result[i] == j, then v[i] == v[j], at least for i != j.

Does this already exist somewhere, or is it easy to write?

Duncan Murdoch
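The property stated above can be written as a small checker, which makes it easy to test candidate answers. This is a minimal sketch: the name `satisfies_dup_property` is hypothetical, NA entries in `result` are treated as "no earlier duplicate", and it assumes `v` itself contains no NA values (since `==` on NA yields NA).

```r
# Hypothetical helper: TRUE if, for every non-NA result[i] == j,
# the elements v[i] and v[j] are equal. Assumes v has no NA values.
satisfies_dup_property <- function(v, result) {
  stopifnot(length(v) == length(result))
  i <- which(!is.na(result))      # positions that point at another element
  all(v[i] == v[result[i]])
}

v <- c("a", "b", "b", "a")
satisfies_dup_property(v, c(NA, NA, 2L, 1L))  # TRUE
satisfies_dup_property(v, c(1L, 2L, 2L, 1L))  # TRUE
satisfies_dup_property(v, c(1L, 1L, 2L, 1L))  # FALSE: v[2] is "b", v[1] is "a"
```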
what about

as.integer(factor(v, levels = unique(v)))

I recall very clearly when I realized the power of this feature of factor(), but I've not seen it discussed much.

Cheers, Mike.

On Tue, 13 Nov 2018 at 12:08 Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Does this already exist somewhere, or is it easy to write?
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia
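A quick way to see what the factor() idiom computes: with levels in order of first appearance, each element gets the index of its value within unique(v), which should be the same thing match(v, unique(v)) returns. The sketch below compares the two on a few arbitrary test vectors; the helper names `codes_factor` and `codes_match` are hypothetical.

```r
# Codes via factor(): levels are ordered by first appearance in v
codes_factor <- function(v) as.integer(factor(v, levels = unique(v)))

# Codes via match(): position of each element within unique(v)
codes_match <- function(v) match(v, unique(v))

tests <- list(
  c("a", "b", "b", "a"),
  c("b", "b", "a", "b", "a", "a"),
  c(3, 1, 3, 2, 1)
)
for (x in tests) {
  stopifnot(identical(codes_factor(x), codes_match(x)))
}
```

One caveat: the two can differ when `v` contains NA, because factor() drops NA from the levels by default (so its code is NA), while match() matches NA to NA.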
> match(v, unique(v))
[1] 1 2 2 1

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Nov 12, 2018 at 5:08 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Does this already exist somewhere, or is it easy to write?
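Note that match(v, unique(v)) returns indices into unique(v), not positions in v. For the vector in the question the two happen to coincide, but not in general. A short sketch with a hypothetical vector `v2` where they differ, contrasting it with match(v2, v2):

```r
v2 <- c("a", "a", "b")

codes     <- match(v2, unique(v2))  # index into unique(v2): 1 1 2
positions <- match(v2, v2)          # first position in v2:  1 1 3

# Only `positions` satisfies: result[i] == j implies v2[i] == v2[j]
all(v2 == v2[codes])      # FALSE: v2[3] is "b" but v2[2] is "a"
all(v2 == v2[positions])  # TRUE
```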
Hi,

On 11/12/18 17:08, Duncan Murdoch wrote:
> Does this already exist somewhere, or is it easy to write?

I generally use match() for that:

  > v <- c("a", "b", "b", "a")
  > match(v, v)
  [1] 1 2 2 1

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
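The NA variant from the question can be built on top of match(v, v): an element is the first occurrence of its value exactly when match() points back at its own position. A minimal sketch; the helper name `dup_of` is hypothetical.

```r
# Hypothetical helper: for each element, the index of the earlier element
# it duplicates, or NA if it is the first occurrence of its value.
dup_of <- function(v) {
  m <- match(v, v)              # index of the first occurrence of each value
  m[m == seq_along(v)] <- NA    # first occurrences point at themselves
  m
}

v <- c("a", "b", "b", "a")
dup_of(v)
# [1] NA NA  2  1
```

Writing `m[!duplicated(v)] <- NA` instead should mark the same positions, since duplicated() is FALSE exactly at first occurrences.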
It is not clear to me what you want for the general case. Perhaps:

> v <- letters[c(2,2,1,2,1,1)]
> wh <- tapply(seq_along(v),factor(v), '[',1)
> w <- wh[match(v,v[wh])]
> w
b b a b a a
1 1 3 1 3 3
> ## and if you want NA's for the first occurrences of unique values
> ## of course:
> w[wh] <- NA
> w
 b  b  a  b  a  a
NA  1 NA  1  3  3

I'd like to see a cleverer solution that vectorizes and avoids the tapply(), though.

Cheers,
Bert

On Mon, Nov 12, 2018 at 8:33 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> > match(v, unique(v))
> [1] 1 2 2 1
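For a vectorized, tapply()-free version, match(v, v) (suggested elsewhere in the thread) appears to reproduce Bert's result on his example vector. The sketch below compares the two; `as.vector()` is used only to drop the names (and dimnames) that Bert's result inherits from tapply().

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]

# Bert's tapply()-based version, with NA at first occurrences
wh <- tapply(seq_along(v), factor(v), "[", 1)
w  <- wh[match(v, v[wh])]
w[wh] <- NA

# Vectorized alternative: m[i] is the first position holding v[i];
# first occurrences point at themselves, so blank those out
m <- match(v, v)
m[m == seq_along(v)] <- NA

m
# [1] NA  1 NA  1  3  3
stopifnot(identical(as.vector(w), m))
```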
On 13/11/2018 12:35 AM, Pages, Herve wrote:
> I generally use match() for that:
>
>   > v <- c("a", "b", "b", "a")
>   > match(v, v)
>   [1] 1 2 2 1

Yes, this is perfect. Thanks to you (and the private answer I received that suggested the same).

Duncan Murdoch