gcam
2009-Dec-17 03:55 UTC
[R] Remove duplicates from a data frame but with some special requirements
Hi all.

I have a data frame with multiple columns/variables. The first variable is a major sample name for which there are some sub-samples. Currently I use the following command to remove the duplicates:

    Samps_working <- Samps[-c(which(duplicated(Samps$ESR_Ref_edit))), ]

This removes all of the duplicated sample rows.

However, I just realised that this always keeps the first observation of each duplicated set and drops the rest, whatever they contain. I wish to retain any rows that have the code "Y" in another variable, Samps$Loaded. I'm at a bit of a loss as to how best to approach this problem.

Just to reiterate: I want to remove all duplicate lines based on sample name, but when choosing which lines to remove, preference should be given to those that do not include a "Y" in the Loaded column.
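To make the problem concrete, here is a minimal sketch of what the current command does. The data frame below is hypothetical: the column names follow the post, but the values are invented for illustration.

```r
# Hypothetical toy data; column names follow the post, values are invented.
Samps <- data.frame(
  ESR_Ref_edit = c("A", "A", "B", "B", "C"),
  Loaded       = c("N", "Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each sample name,
# so negating it keeps only the first row of every sample.
Samps_working <- Samps[!duplicated(Samps$ESR_Ref_edit), ]
print(Samps_working)
```

In this toy case the "Y" row for sample A is dropped because it is not the first row for that sample. The sketch uses logical indexing rather than `-which(...)`; the two agree when duplicates exist, but `-which(...)` selects zero rows when there are no duplicates at all, so the logical form is the safer habit.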
Gray Calhoun
2009-Dec-17 05:08 UTC
[R] Remove duplicates from a data frame but with some special requirements
Hi,

Try:

    subset(Samps, !duplicated(Samps$ESR_Ref_edit) | Samps$Loaded == "Y")

I'd need a specific example to be sure that this is exactly what you want (i.e., you specify the input and the desired output), but indexing with a logical vector is probably going to be the solution.

Best,
Gray

--
Gray Calhoun
Assistant Professor of Economics
Iowa State University
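Since no sample input and output were posted, the following is only a sketch on hypothetical data (column names from the thread, values invented). It compares the logical-index suggestion with one possible alternative, which is not from the thread: sort so that "Y" rows come first within each sample, then drop duplicates. The alternative assumes the goal is exactly one row per sample, and it returns rows sorted by sample name rather than in their original order.

```r
# Hypothetical toy data; column names follow the thread, values are invented.
Samps <- data.frame(
  ESR_Ref_edit = c("A", "A", "B", "B", "C", "D", "D"),
  Loaded       = c("N", "Y", "Y", "N", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Logical-index approach: keeps the first row of each sample
# plus every row marked "Y", so a sample can appear more than once.
keep_any_y <- subset(Samps, !duplicated(ESR_Ref_edit) | Loaded == "Y")

# Alternative sketch: order by sample name, with "Y" rows ahead of the rest
# (FALSE sorts before TRUE), then keep the first row per sample.
ord <- order(Samps$ESR_Ref_edit, Samps$Loaded != "Y")
Samps_sorted <- Samps[ord, ]
one_per_sample <- Samps_sorted[!duplicated(Samps_sorted$ESR_Ref_edit), ]

print(keep_any_y)
print(one_per_sample)
```

On this toy data the first approach retains both rows for samples A and D, while the second keeps a single row per sample and prefers the "Y" row where one exists. Which behaviour is wanted depends on the desired output, which the original question leaves open.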