I'm trying to identify and remove rows in a data frame that are duplicated only on particular columns within it (i.e. not on all columns). The "unique" function looks for uniqueness across all columns of a data frame. Identifying unique rows based only on specific columns of interest returns only those columns, not all of the columns in the original frame.

I tried this, then added an identifier column to the truncated data frame, then tried merging it with the original data frame and selecting only those rows containing the identifier. But this did not work no matter how the arguments were altered: all records were returned instead of only the unique ones.

Completely stumped--any help appreciated. Thanks.

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740

On Tue, May 11, 2010 at 9:07 PM, Jim Bouldin <jrbouldin at ucdavis.edu> wrote:
>
> I'm trying to identify and remove rows in a data frame that are duplicated
> only on particular columns within it (i.e. not on all columns).

This is probably the cleanest way:

    dat <- data.frame(x = c(1, 2, 3), y = c(1, 1, 3))
    subset(dat, !duplicated(y))

See this thread (among others) for some other options:
http://finzi.psych.upenn.edu/Rhelp10/2010-January/224658.html
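
The same idea should extend to duplicates defined by more than one column, since duplicated() also accepts a data frame and compares whole rows of whatever columns it is given. The sketch below is only an illustration: the data and the column names (site, year, value) are made up, not taken from the original question. It also hints at why the merge approach returned every record: merging on the key columns matches every original row that shares those key values, duplicates included.

```r
# Hypothetical example data; the column names are illustrative only
dat <- data.frame(
  site  = c("A", "A", "B", "B", "C"),
  year  = c(2001, 2001, 2001, 2002, 2001),
  value = c(10, 12, 7, 8, 5)
)

# Columns that define what counts as a duplicate
key_cols <- c("site", "year")

# duplicated() on a data frame flags each row whose key combination
# has already appeared in an earlier row
is_dup <- duplicated(dat[, key_cols])

# Keep the first row of each site/year combination, with all columns intact
dedup <- dat[!is_dup, ]

# The rows that were dropped, in case they need inspecting
dropped <- dat[is_dup, ]
```

To keep the last occurrence of each combination rather than the first, duplicated() takes a fromLast = TRUE argument.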