On 13/11/2018 12:35 AM, Pages, Herve wrote:
> Hi,
>
> On 11/12/18 17:08, Duncan Murdoch wrote:
>> The duplicated() function gives TRUE if an item in a vector (or row in
>> a matrix, etc.) is a duplicate of an earlier item.  But what I would
>> like to know is which item does it duplicate?
>>
>> For example,
>>
>> v <- c("a", "b", "b", "a")
>> duplicated(v)
>>
>> returns
>>
>> [1] FALSE FALSE  TRUE  TRUE
>>
>> What I want is a fast way to calculate
>>
>>  [1] NA NA 2 1
>>
>> or (equally useful to me)
>>
>>  [1] 1 2 2 1
>>
>> The result should have the property that if result[i] == j, then v[i]
>> == v[j], at least for i != j.
>>
>> Does this already exist somewhere, or is it easy to write?
>
> I generally use match() for that:
>
>  > v <- c("a", "b", "b", "a")
>
>  > match(v, v)
>
> [1] 1 2 2 1

Yes, this is perfect.  Thanks to you (and the private answer I received
that suggested the same).

Duncan Murdoch
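For readers who want the NA-filled form from the original question rather than the 1 2 2 1 form, here is a minimal sketch that builds on match(v, v). The variable names `first` and `dup_of` are illustrative only and do not come from the thread.

```r
v <- c("a", "b", "b", "a")

# Index of the first occurrence of each element (the match() answer).
first <- match(v, v)

# Variant with NA wherever the item is not a duplicate of an earlier item.
dup_of <- first
dup_of[!duplicated(v)] <- NA

first   # 1 2 2 1
dup_of  # NA NA 2 1
```

Both results satisfy the stated property: wherever result[i] == j, v[i] == v[j].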
You also asked about doing this for the rows of a matrix.  unique() gives
the unique rows but match operates on a per element, not per row,
basis.  You can use split, which operates on rows of a matrix, to help.

> m <- cbind( A=c(i=5,ii=5,iii=5,iv=4,v=4,vi=4), B=c(2,3,2,2,2,2) )
> unique(m)
   A B
i  5 2
ii 5 3
iv 4 2
> match(m, unique(m)) # bad
 [1] 1 1 1 3 3 3 4 5 4 4 4 4
> asRows <- function(x) split(x, seq_len(NROW(x))) # convert to list of rows
> match(asRows(m), unique(asRows(m)))
[1] 1 2 1 3 3 3

For data.frames unique works on rows but match works on columns, and
converting to a list of rows does not quite work, because unique looks at
the row names.  A modification of asRows works around that:

> d <- data.frame(m)
> unique(d)
   A B
i  5 2
ii 5 3
iv 4 2
> match(d, unique(d))
[1] NA NA
> asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
> match(asRows(d), unique(asRows(d)))
[1] 1 2 1 3 3 3

Is this the sort of issue that Hadley's vectors package is addressing?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Nov 13, 2018 at 2:15 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Yes, this is perfect.  Thanks to you (and the private answer I received
> that suggested the same).
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
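Note that match(asRows(m), unique(asRows(m))) returns positions within the set of unique rows, not positions in the original matrix. If the goal is the row-wise analogue of match(v, v), one possible sketch is to match the list of rows against itself. The name `first_row` is illustrative; `asRows` is the helper from the message above.

```r
m <- cbind(A = c(i = 5, ii = 5, iii = 5, iv = 4, v = 4, vi = 4),
           B = c(2, 3, 2, 2, 2, 2))

# Helper from the thread: split() recycles the row index over the matrix
# elements (stored column by column), giving one vector per row.
asRows <- function(x) split(x, seq_len(NROW(x)))

rows <- asRows(m)

# For each row, the index of the first identical row in m itself.
first_row <- match(rows, rows)
first_row

# Check the property from the original question, row by row:
# row i of m equals row first_row[i] of m.
all(m[first_row, , drop = FALSE] == m)
```

The help page for match() warns that matching on lists can be slow, so this is probably best kept to matrices of moderate size.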
On 13/11/2018 12:58 PM, William Dunlap wrote:
> You also asked about doing this for the rows of a matrix.  unique() gives
> the unique rows but match operates on a per element, not per row,
> basis.  You can use split, which operates on rows of a matrix, to help.
>
>  > m <- cbind( A=c(i=5,ii=5,iii=5,iv=4,v=4,vi=4), B=c(2,3,2,2,2,2) )
>  > unique(m)
>     A B
> i  5 2
> ii 5 3
> iv 4 2
>  > match(m, unique(m)) # bad
>   [1] 1 1 1 3 3 3 4 5 4 4 4 4
>  > asRows <- function(x) split(x, seq_len(NROW(x))) # convert to list of rows
>  > match(asRows(m), unique(asRows(m)))
> [1] 1 2 1 3 3 3
>
> For data.frames unique works on rows but match works on columns, and
> converting to a list of rows does not quite work, because unique looks at
> the row names.  A modification of asRows works around that:
>
>  > d <- data.frame(m)
>  > unique(d)
>     A B
> i  5 2
> ii 5 3
> iv 4 2
>  > match(d, unique(d))
> [1] NA NA
>  > asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
>  > match(asRows(d), unique(asRows(d)))
> [1] 1 2 1 3 3 3
>

Thanks!  That's very nice.

> Is this the sort of issue that Hadley's vectors package is addressing?

I don't know; hopefully someone else will respond...

Duncan Murdoch
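A different technique from the one in the thread, offered only as a sketch for the data.frame case: paste each row into a single string key and match the keys against themselves. The helper `row_key` is hypothetical. It assumes that the character form of each column identifies its value (paste() keeps about 15 significant digits for doubles) and that the "\r" separator does not occur in the data.

```r
d <- data.frame(A = c(5, 5, 5, 4, 4, 4), B = c(2, 3, 2, 2, 2, 2))

# Hypothetical helper: one string key per row. unname() keeps a column
# that happens to be called "sep" or "collapse" from being taken as an
# argument of paste().
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

key <- row_key(d)

# For each row, the index of the first row of d with the same key.
first_row <- match(key, key)
first_row
```

Because the keys are plain character vectors, match() runs on atomic data here rather than on a list, which is the main reason one might prefer this to the list-of-rows approach on larger inputs.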