Sam Albers
2011-May-17 00:14 UTC
[R] Subsetting depth profiles based on maximum depth by group with plyr
Hello,

Apologies for a similar earlier post. I didn't include enough details in that one.

I am having a little trouble subsetting some data based on a grouping variable. I am using an instrument that does depth profiles of a water column. The instrument records on the way down as well as on the way up. Thanks to an off-list reply, I can subset the data so that the data collected at the maximum depth and on the way UP the water column are kept, and the data collected on the way DOWN through the water column are discarded. This is illustrated by the following:

    dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
    dat1[ seq_len(nrow(dat1)) >= which.max(dat1$depth), ]

However, I have a data frame where I would like to perform this subset for several groups. My data.frame looks like the following:

    dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
    dat1$group <- "A"
    dat2 <- data.frame(var=100*(0:10), depth=c(1:5,7,5:1))
    dat2$group <- "B"
    dat <- rbind(dat1,dat2)

I thought I might be able to use the plyr package to do this, but for some reason the following gives me almost the opposite of what I was hoping for:

    library(plyr)
    ddply(dat, .(group), function(.df) {
      .df[seq_len(nrow(.df) >= which.max(.df$depth)),]
    })

Can anyone recommend a way to subset based on a grouping variable?

Thanks in advance.

Sam
Peter Ehlers
2011-May-17 12:57 UTC
[R] Subsetting depth profiles based on maximum depth by group with plyr
On 2011-05-16 17:14, Sam Albers wrote:
> Hello,
>
> Apologies for a similar earlier post. I didn't include enough details in
> that one.
>
> I am having a little trouble subsetting some data based on a grouping
> variable. I am using an instrument that does depth profiles of a water
> column. The instrument records on the way down as well as the way up. So
> thanks to an off-list reply I can subset the data so that all data collected
> at the maximum depth and those collected on the way UP the water column are
> used and the data collected on the way DOWN through the water column are
> discarded. This is illustrated by the following:
>
>     dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
>     dat1[ seq_len(nrow(dat1)) >= which.max(dat1$depth), ]
>
> However, I have data frame where I would like to perform this subset for
> several groups. My data.frame looks like the following:
>
>     dat1 <- data.frame(var=100*(0:10), depth=c(1:5,5,5:1))
>     dat1$group <- "A"
>     dat2 <- data.frame(var=100*(0:10), depth=c(1:5,7,5:1))
>     dat2$group <- "B"
>     dat <- rbind(dat1,dat2)
>
> I thought I might be able to use the plyr package to do this but for some
> reason the following gives me almost the opposite of what I was hoping for:
>
>     library(plyr)
>     ddply(dat, .(group), function(.df) {
>       .df[seq_len(nrow(.df) >= which.max(.df$depth)),]
>     })
>
> Can anyone recommend a way to subset based on a grouping variable
> preferably?

I think that you just have a misplaced parenthesis:

    .df[seq_len(nrow(.df) >= which.max(.df$depth)),]
-->
    .df[seq_len(nrow(.df)) >= which.max(.df$depth),]

(Thanks for providing a simple reproducible example.)

Peter Ehlers
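
For anyone reading this thread later, here is a minimal sketch of the fix applied to the example data from the original post. The helper name keep_upcast is made up for illustration and does not come from either post. With the parenthesis misplaced, the comparison is evaluated first and its single TRUE is passed to seq_len(), so the index collapses to 1 and only the first row of each group comes back. The base R split/lapply version is included as an assumption-free cross-check; the plyr call runs only if plyr is installed.

```r
# Example data from the original post
dat1 <- data.frame(var = 100 * (0:10), depth = c(1:5, 5, 5:1), group = "A")
dat2 <- data.frame(var = 100 * (0:10), depth = c(1:5, 7, 5:1), group = "B")
dat <- rbind(dat1, dat2)

# Hypothetical helper: keep rows from the first occurrence of the
# maximum depth onward (the maximum itself plus the way back up)
keep_upcast <- function(.df) {
  .df[seq_len(nrow(.df)) >= which.max(.df$depth), ]
}

# What the misplaced parenthesis computes: the comparison yields a single
# TRUE, and seq_len(TRUE) is just the index 1
buggy_index <- seq_len(nrow(dat1) >= which.max(dat1$depth))

# Base R equivalent: split by group, subset each piece, bind back together
res_base <- do.call(rbind, lapply(split(dat, dat$group), keep_upcast))
rownames(res_base) <- NULL

# plyr version with the parenthesis corrected (skipped if plyr is missing)
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::ddply(dat, "group", keep_upcast)
}
```

On this data the result should have 7 rows for group A (which.max picks the first of the three readings at depth 5) and 6 rows for group B. Note that which.max returns the first maximum only, so repeated readings at the maximum depth are all kept.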