Hi everybody.

I want to identify not only the duplicate number but also the original number that has been duplicated.

Example:

    x = c(1,2,3,4,4,5,6,7,8,9)
    y = duplicated(x)
    rbind(x,y)

gives:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    x    1    2    3    4    4    5    6    7    8     9
    y    0    0    0    0    1    0    0    0    0     0

i.e. the second 4 [,5] is a duplicate.

What I want is the first and second 4, i.e. [,4] and [,5], to be TRUE:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    x    1    2    3    4    4    5    6    7    8     9
    y    0    0    0    1    1    0    0    0    0     0

I assume it can be done by sorting the vector and then checking whether the next or the previous entry matches, using identical(). I am just unsure how to write such a loop, the logic of which (I think) is as follows:

sort x
for every value of x, check if the next value is identical and return TRUE (or 1) if it is and FALSE (or 0) if it is not
AND
check if the previous value is identical and return TRUE (or 1) if it is and FALSE (or 0) if it is not

Am I thinking correctly, and can someone help me write such a function?

regards
Christiaan
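For what it is worth, the sort-and-compare logic described above can be written without an explicit loop. The following is only a minimal sketch: the function name flag_repeats_sorted is made up for illustration, it uses the vectorised == rather than identical(), and it assumes x contains no NA values.

```r
# Hypothetical helper (not from the thread): flag every element of x that has
# at least one equal partner, using the sort-and-compare-neighbours idea.
# Assumes x has no NA values.
flag_repeats_sorted <- function(x) {
  n <- length(x)
  if (n < 2) return(rep(FALSE, n))
  ord <- order(x)                            # permutation that sorts x
  s <- x[ord]                                # sorted copy of x
  same_as_next <- c(s[-n] == s[-1], FALSE)   # equal to the following entry?
  same_as_prev <- c(FALSE, s[-1] == s[-n])   # equal to the preceding entry?
  flagged <- logical(n)
  flagged[ord] <- same_as_next | same_as_prev  # map back to original positions
  flagged
}

x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)
flag_repeats_sorted(x)
# FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
```

Because the flags are written back through ord, the result lines up with the original (unsorted) positions of x, so the input does not have to be sorted beforehand.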
On Thu, May 14, 2009 at 2:16 PM, christiaan pauw <cjpauw at gmail.com> wrote:
> What I want is the first and second 4. i.e [,4] and [,5] to be TRUE
>
>   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> x    1    2    3    4    4    5    6    7    8     9
> y    0    0    0    1    1    0    0    0    0     0

How about

    rbind(x, duplicated(x) | duplicated(x, fromLast=TRUE))

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    x    1    2    3    4    4    5    6    7    8     9
         0    0    0    1    1    0    0    0    0     0

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
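To see why scanning in both directions is needed, it may help to look at an unsorted vector with a value that occurs three times. This is only an illustrative sketch: the wrapper name all_duplicated and the vector z are made up here, while duplicated() and its fromLast argument are base R.

```r
# Hypothetical wrapper: TRUE for every element whose value occurs more than once.
all_duplicated <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

z <- c(7, 3, 7, 5, 3, 7)
duplicated(z)                   # FALSE FALSE  TRUE FALSE  TRUE  TRUE (misses first occurrences)
duplicated(z, fromLast = TRUE)  #  TRUE  TRUE  TRUE FALSE FALSE FALSE (misses last occurrences)
all_duplicated(z)               #  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

The forward scan never flags the first occurrence and the backward scan never flags the last one, so OR-ing the two covers every member of a repeated group.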
Try this

    x%in%x[which(y)]

From your example

    > x=c(1,2,3,4,4,5,6,7,8,9)
    > y=duplicated(x)
    > rbind(x,y)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    x    1    2    3    4    4    5    6    7    8     9
    y    0    0    0    0    1    0    0    0    0     0
    > which(y)
    [1] 5
    > x[which(y)]
    [1] 4
    > x%in%x[which(y)]
     [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Andrej

--
Andrej Blejec
National Institute of Biology
Vecna pot 111 POB 141
SI-1000 Ljubljana
SLOVENIA
e-mail: andrej.blejec at nib.si
URL: http://ablejec.nib.si
tel: + 386 (0)59 232 789
fax: + 386 1 241 29 80
--------------------------
Organizer of
Applied Statistics 2009 conference
http://conferences.nib.si/AS2009

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of christiaan pauw
> Sent: Thursday, May 14, 2009 8:17 AM
> To: r-help at r-project.org
> Subject: [R] Duplicates and duplicated
>
> I want to identify not only duplicate number but also the original
> number that has been duplicated.
The operator %in% is very good! And it can be made simpler, like this:

    x %in% x[duplicated(x)]
     [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

On Thu, May 14, 2009 at 4:43 PM, Andrej Blejec <Andrej.Blejec at nib.si> wrote:
> Try this
>
> x%in%x[which(y)]
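As a small follow-up to the %in% idea, the logical vector can also be used to pull out the positions and values of the repeated entries. This is just a sketch; the names dup_values and is_repeated are made up for illustration, and unique() is only there so each repeated value is listed once.

```r
x <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)

dup_values  <- unique(x[duplicated(x)])  # values that occur more than once
is_repeated <- x %in% dup_values         # TRUE for every copy of those values

dup_values           # 4
which(is_repeated)   # 4 5
x[is_repeated]       # 4 4
```

Since %in% matches by value rather than by position, this works whether or not the repeated entries sit next to each other.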
Noting that:

    > ave(x, x, FUN = length) > 1
     [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

try this:

    > rbind(x, dup = ave(x, x, FUN = length) > 1)
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    x      1    2    3    4    4    5    6    7    8     9
    dup    0    0    0    1    1    0    0    0    0     0

On Thu, May 14, 2009 at 2:16 AM, christiaan pauw <cjpauw at gmail.com> wrote:
> What I want is the first and second 4. i.e [,4] and [,5] to be TRUE
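The ave() approach works by replacing each element with the size of its group, so the intermediate counts can be inspected directly. Below is a minimal sketch on a made-up character vector; passing seq_along(x) as the value vector is an assumption of this sketch (not something suggested in the thread) so that the counts stay numeric whatever the type of x. It also assumes x has no NA values.

```r
x <- c("b", "a", "c", "a", "b", "a")

# Size of the group each element belongs to; x itself is the grouping variable.
group_size <- ave(seq_along(x), x, FUN = length)
group_size        # 2 3 1 3 2 3
group_size > 1    # TRUE TRUE FALSE TRUE TRUE TRUE

# Cross-check against the two-direction duplicated() approach from earlier in the thread.
identical(group_size > 1, duplicated(x) | duplicated(x, fromLast = TRUE))
# TRUE
```

A side benefit of this form is that the counts themselves are available, for example to flag only values that occur three or more times with group_size >= 3.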