Hi all,

I have a dataset similar to the following:

Name  Date       Value
A     1/01/2000  4
A     2/01/2000  4
A     3/01/2000  5
A     4/01/2000  4
A     5/01/2000  1
B     6/01/2000  2
B     7/01/2000  1
B     8/01/2000  1

I would like R to remove duplicates based on columns 1 and 3 only. In addition, I would like R to remove duplicates based on the rows directly above and below only. For example, for A, I would like to remove row 2 only and keep rows 1, 3 and 4.

I have tried unique() and replicated(), but without much success. I have also tried dataset<-c(1,diff(dataset)!=0), but I don't know how to apply it to this multi-column situation.

Any help would be greatly appreciated.

Thanks in advance,
Chris

--
View this message in context: http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2023065.html
Sent from the R help mailing list archive at Nabble.com.
Hi

r-help-bounces at r-project.org wrote on 23.04.2010 04:05:00:

> I would like R to remove duplicates based on column 1 and 3 only. In
> addition, I would like R to remove duplicates based on the underlying and
> overlying row only. For example, for A, I would like to remove row 2 only
> and keep row 1, 3 and 4.

Hm. Strange. You want to keep lines 1, 3 and 4 for A. What about line 5? Why do you want to keep lines 1 and 4, which both have A and 4 in those columns?

    test <- read.table("clipboard", header = TRUE)
    test[!duplicated(paste(test[, 1], test[, 3])), ]

      Name      Date Value
    1    A 1/01/2000     4
    3    A 3/01/2000     5
    5    A 5/01/2000     1
    6    B 6/01/2000     2
    7    B 7/01/2000     1

This gives you unique values; however, I am not sure if it is what you want.

Regards
Petr
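For readers who want to reproduce this without copying the table to the clipboard, here is a minimal sketch of the same paste-and-duplicated() idea. Reading the data from an inline string with read.table(text = ...) is an assumption made for portability, and the names txt, key and result are illustrative only; they do not appear in the thread.

```r
# Sample data from the original post, read from an inline string
# instead of the clipboard so the example runs on any platform.
txt <- "Name Date Value
A 1/01/2000 4
A 2/01/2000 4
A 3/01/2000 5
A 4/01/2000 4
A 5/01/2000 1
B 6/01/2000 2
B 7/01/2000 1
B 8/01/2000 1"
test <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Build one key per row from columns 1 and 3 (Name and Value), then drop
# every row whose key has already appeared anywhere earlier in the data.
key <- paste(test[, 1], test[, 3])
result <- test[!duplicated(key), ]

print(result)  # rows 1, 3, 5, 6 and 7 remain
```

Note that duplicated() compares each row against all earlier rows, not only the adjacent one, which is why row 4 (A, 4) is dropped along with row 2.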
On Fri, Apr 23, 2010 at 4:05 AM, chrisli1223 <chrisli at austwaterenv.com.au> wrote:

> I would like R to remove duplicates based on column 1 and 3 only. In
> addition, I would like R to remove duplicates based on the underlying and
> overlying row only. For example, for A, I would like to remove row 2 only
> and keep row 1, 3 and 4.

Hi,

This code is a bit ugly, but it works. Hope it helps.

/Gustaf

    library(zoo)
    test <- read.table("clipboard", header = TRUE)
    # One key per row, built from Name and Value
    test$code <- paste(test$Name, test$Value, sep = "")
    # For each window of 3 rows, flag the middle row if it matches either neighbour
    drop.ndx <- rollapply(zoo(test$code), 3, function(x) (x[2] %in% c(x[1], x[3])))
    # The first and last rows have no full window, so they are always kept
    drop.ndx <- c(FALSE, drop.ndx, FALSE)
    test[!drop.ndx, ]

--
Gustaf Rydevik, M.Sci.
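The adjacent-row comparison can also be sketched in base R, extending the c(1, diff(dataset) != 0) idea from the question to two columns. This is an illustrative alternative, not the code posted above: it keeps the first row of each run of identical (Name, Value) pairs, so on the sample data it keeps row 7 rather than row 8. The names same_as_prev and adjacent_unique are hypothetical, and the sketch assumes the data frame has at least one row.

```r
# Sample data from the original post, built directly as a data frame.
test <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

n <- nrow(test)

# TRUE where a row has the same Name and Value as the row directly above it.
# The first row has no predecessor, so it is never flagged.
same_as_prev <- c(FALSE,
                  test$Name[-1] == test$Name[-n] &
                  test$Value[-1] == test$Value[-n])

# Keep only rows that differ from the row directly above.
adjacent_unique <- test[!same_as_prev, ]

print(adjacent_unique)  # rows 1, 3, 4, 5, 6 and 7 remain
```

Here row 4 (A, 4) survives because its immediate predecessor is (A, 5), which matches the "keep row 1, 3 and 4" example in the question.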
Try this:

    DF[!duplicated(DF[-2]), ]

On Thu, Apr 22, 2010 at 10:05 PM, chrisli1223 <chrisli at austwaterenv.com.au> wrote:

> I would like R to remove duplicates based on column 1 and 3 only.
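A self-contained sketch of this one-liner on the sample data may help show what it does. DF[-2] drops the second column (Date), so duplicated() compares rows on Name and Value only. The data frame construction and the names deduped and same_flags are assumptions added for illustration.

```r
# Sample data from the original post, built directly as a data frame.
DF <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Flag rows whose (Name, Value) pair already occurred in an earlier row,
# ignoring the Date column, and keep the rest.
deduped <- DF[!duplicated(DF[-2]), ]

# On this data the flags match the pasted-key approach shown earlier in the thread.
same_flags <- all(duplicated(DF[-2]) == duplicated(paste(DF$Name, DF$Value)))

print(deduped)  # rows 1, 3, 5, 6 and 7 remain
```

As with the pasted-key version, the comparison is against all earlier rows, so row 4 is removed as well as row 2.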
Thank you Petr, Gustaf and Gabor. Your help is much appreciated.

I have tried:

    dataset[!duplicated(dataset[,-2]),]

and it solves my problem.

Thanks,
Chris