I'm trying to find duplicate values in a column of a data frame. For example, data frame (a) below has two 3's. I would like to mark the value in each row as either not a duplicate of the one before it (0) or a duplicate (1), as in data frame (b). In SPSS, I would simply compare each value to its "lagged" value, but I can't figure out how to do this in R.

Can someone point me in the right direction?

Thanks

a <- data.frame(col1 = c(1, 2, 3, 3, 4))
b <- data.frame(col1 = c(1, 2, 3, 3, 4), duplicate = c(0, 0, 0, 1, 0))
duplicate <- ifelse(c(0, a$col[-length(a$col)]) == c(a$col), 1, 0)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff
> Sent: Wednesday, July 25, 2012 3:06 PM
> To: r-help at r-project.org
> Subject: [R] Simple question on finding duplicates
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Minor correction:

duplicate <- ifelse(c(0, a$col[-length(a$col)]) == a$col, 1, 0)

-------
David

> -----Original Message-----
> From: David L Carlson [mailto:dcarlson at tamu.edu]
> Sent: Wednesday, July 25, 2012 3:23 PM
> To: 'Jeff'; 'r-help at r-project.org'
> Subject: RE: [R] Simple question on finding duplicates
>
> duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0)
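Two caveats on the lagged comparison above: `a$col` works only because `$` partially matches `col1` on a data frame, and padding the lagged vector with 0 flags the first row as a duplicate whenever the column starts with 0. The following is a minimal sketch that pads with NA and uses the full column name instead. The helper name `flag_repeat` is hypothetical and not from the thread, and the sketch assumes a plain numeric or character vector.

```r
# Hypothetical helper (not from the thread): 1 if a value equals the one
# immediately before it, 0 otherwise.
flag_repeat <- function(x) {
  n <- length(x)
  if (n == 0) return(integer(0))
  lagged <- c(NA, x[-n])          # shift down by one; the first row has no predecessor
  dup <- x == lagged              # NA wherever either side is NA
  as.integer(!is.na(dup) & dup)   # treat NA comparisons as "not a duplicate"
}

a <- data.frame(col1 = c(1, 2, 3, 3, 4))
a$duplicate <- flag_repeat(a$col1)
a$duplicate                       # 0 0 0 1 0

# A leading 0 is not flagged, unlike with 0 padding
flag_repeat(c(0, 0, 1))           # 0 1 0
```

Treating NA comparisons as 0 is a design choice made here for illustration; returning NA for those rows would be equally defensible.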
Hi,

Try this:

a <- data.frame(col1 = c(1, 2, 3, 3, 4))
a <- within(a, duplicate <- c(0, ifelse(diff(a$col1) == 0, 1, 0)))
a
  col1 duplicate
1    1         0
2    2         0
3    3         0
4    3         1
5    4         0

A.K.
duplicate <- c(0, diff(a[, "col1"]) == 0)

Peter Ehlers
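The diff() approach needs a numeric column, since diff() subtracts neighbouring values. As a rough sketch under that observation, the same "equal to the previous value" test can be written with head() and tail() so that it also works on character data. It is shown next to base R's duplicated(), which answers a different question: whether a value has appeared anywhere earlier. The vector `x` and the names `prev_dup` and `any_dup` are made up for illustration, and `x` is assumed to be non-empty.

```r
# Illustrative data (not from the thread); assumed non-empty
x <- c("a", "b", "b", "a")

# 1 if equal to the value immediately before, 0 otherwise.
# Compares elements 2..n against elements 1..(n-1); the first row gets 0.
prev_dup <- c(0L, as.integer(tail(x, -1) == head(x, -1)))
prev_dup    # 0 0 1 0

# duplicated() flags any earlier occurrence, not only the adjacent one
any_dup <- as.integer(duplicated(x))
any_dup     # 0 0 1 1
```

For the example in the original question both give 0 0 0 1 0, because the repeated 3's are adjacent; they differ only when a value recurs after a gap.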