I have a data set with 3 variables: V1, V2, and V3. If 2 data points have the same values on both V1 and V2, I want to delete the one with the smaller V3 value. For example, in the data below, I want to delete the first observation. How can I do that? Thanks in advance!

V1 V2 V3
 3  3  1
 3  3  4

--
View this message in context: http://www.nabble.com/How-to-delete-a-duplicate-observation-tf4437033.html#a12659033
Sent from the R help mailing list archive at Nabble.com.

nuyaying wrote:
> I have a data set with 3 variables V1, V2, V3. If there are 2 data points
> have the same values on both V1 and V2, I want to delete one of them which
> has smaller V3 value. i.e., in the data below, I want to delete
> the first observation. How can I do that ? Thanks in advance!
>
> V1 V2 V3
> 3 3 1
> 3 3 4
>

Tricky one... I think something like this should work:

    ## V3 values grouped by each (V1, V2) combination
    l <- split(d$V3, list(d$V1, d$V2))
    ## for a group of exactly 2, flag the row with the smaller V3 as FALSE;
    ## groups of any other size are kept whole
    ixl <- lapply(l, function(x) {
        if ((n <- length(x)) == 2)
            seq_len(n) != which.min(x)
        else
            rep(TRUE, n)
    })
    ## put the flags back into the original row order
    ix <- unsplit(ixl, list(d$V1, d$V2))
    d[ix, ]

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

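The split/unsplit idea above builds a logical index, one flag per row. As a rough sketch of the same idea for groups of any size, one could compare each V3 with the maximum of its (V1, V2) group using ave(). The sample data frame d and the names grp, keep and d_max are made up here for illustration, and rows that tie for the group maximum would all be kept, which may or may not be what is wanted.

```r
## Illustrative sample data (same values as used elsewhere in the thread)
d <- data.frame(V1 = c(3, 3, 3, 3, 3),
                V2 = c(3, 3, 3, 2, 2),
                V3 = c(2, 4, 1, 1, 5))

## One grouping factor per observed (V1, V2) combination;
## drop = TRUE avoids empty groups for combinations that never occur
grp <- interaction(d$V1, d$V2, drop = TRUE)

## TRUE where a row's V3 equals the largest V3 in its group
keep <- d$V3 == ave(d$V3, grp, FUN = max)

## Rows are selected in place, so the original row order is preserved
d_max <- d[keep, ]
d_max
```

Because nothing is sorted, no reordering step is needed afterwards.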
nuyaying said the following on 9/13/2007 9:50 AM:
>
> I have a data set with 3 variables V1, V2, V3. If there are 2 data points
> have the same values on both V1 and V2, I want to delete one of them which
> has smaller V3 value. i.e., in the data below, I want to delete
> the first observation. How can I do that ? Thanks in advance!
>
> V1 V2 V3
> 3 3 1
> 3 3 4
>

How about:

    ## some sample data
    d <- read.table(textConnection("V1 V2 V3
    3 3 2
    3 3 4
    3 3 1
    3 2 1
    3 2 5"), header = TRUE)

    ## the code
    d <- d[rev(do.call("order", d)), ]
    d <- d[!duplicated(d[1:2]), ]
    d

HTH,

--sundar

How about (assuming the data is in the data frame my.df):

> my.df2 <- my.df[order(my.df$V3, decreasing=TRUE),]
> my.df3 <- my.df2[ !duplicated( my.df2[,c('V1','V2')] ), ]

If the order of the rows matters, then we will need to add a couple of steps to reorder.

You did not say what to do if 3 or more points matched; this approach takes the single largest V3 value from all rows matching on V1 and V2.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
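
The reordering steps mentioned above are not spelled out in the thread. One possible sketch, assuming my.df still has its default row names (the original row numbers), is to sort the de-duplicated result by those row names. The sample data and the name my.df4 are invented here for illustration.

```r
## Illustrative sample data with default row names 1..5
my.df <- data.frame(V1 = c(3, 3, 3, 3, 3),
                    V2 = c(3, 3, 3, 2, 2),
                    V3 = c(2, 4, 1, 1, 5))

## Sort by V3, largest first, so the row to keep comes first in each group
my.df2 <- my.df[order(my.df$V3, decreasing = TRUE), ]

## Drop every later row that repeats an earlier (V1, V2) pair
my.df3 <- my.df2[!duplicated(my.df2[, c("V1", "V2")]), ]

## Restore the original order; this relies on the row names still being
## the original row numbers (an assumption, not stated in the thread)
my.df4 <- my.df3[order(as.integer(rownames(my.df3))), ]
my.df4
```

If my.df has custom row names, adding an explicit row-number column before sorting and ordering on that column afterwards would serve the same purpose.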