Hello R users,

This is more of a convenience question that I hope others might find useful if there is a better answer. I work with large datasets that require multiple parsing stages for different analyses. For example, compare group 3 vs. group 4. A more complicated comparison would be time B in group 3 of group L with time B in group 4 of group L. I normally subset each group with the following type of code.

data=read(...)

#L v D
L=data[LvD %in% c("L"),]
D=data[LvD %in% c("D"),]

#Groups 3 and 4 within L and D
group3L=L[group %in% c("3"),]
group4L=L[group %in% c("3"),]

group3D=D[group %in% c("3"),]
group4D=D[group %in% c("3"),]

#Times B, S45, FR2, FR8
you get the idea

Is there a more efficient way to subset groups? Thanks for any insight.

Regards,
Charles
Hi,

You can also use grepl() to subset:

LD<-paste0(rep(rep(c(3,4),each=4),2),c(rep("L",8),rep("D",8)))
set.seed(1)
dat1<-data.frame(LD=LD,value=sample(1:15,16,replace=TRUE))
dat2<-within(dat1,{LD<-as.character(LD)})
dat2[grepl(".*L",dat2$LD),] # subset all L values
dat2[grepl(".*D",dat2$LD),] # subset all D values
dat2[grepl("3D",dat2$LD),] # subset group 3 within D
dat2[grepl("4D",dat2$LD),] # subset group 4 within D

A.K.

----- Original Message -----
From: Charles Determan Jr <deter088 at umn.edu>
To: r-help at r-project.org
Sent: Friday, September 28, 2012 2:59 PM
Subject: [R] Better way of Grouping?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
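One caveat with the grepl() patterns above is that an unanchored pattern matches anywhere in the label. The sketch below illustrates this on a small vector; the label "13L" is invented here purely to show the effect and does not come from the original data.

```r
# Labels in the style of the example above; "13L" is a made-up label
# added only to show why anchoring can matter.
LD <- c("3L", "4L", "3D", "4D", "13L")

# Unanchored: matches any label containing "3L", including "13L".
loose <- grepl("3L", LD)

# Anchored with ^ and $: matches only the exact label "3L".
exact <- grepl("^3L$", LD)

# Anchored at the start only: group 3 in either L or D.
group3 <- grepl("^3", LD)

LD[loose]   # labels picked up by the loose pattern
LD[exact]   # labels picked up by the exact pattern
LD[group3]  # all group 3 labels
```

If the real labels can never contain one another, the unanchored patterns are enough; anchoring is simply a cheap safeguard.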
You have not specified the objective function you are trying to optimize with your term "efficient", or what you do with all of these subsets once you have them.

For notational simplification and completeness of coverage (not necessarily computational speedup) you might want to look at tapply, or ddply/dlply from the plyr package. If you build lists of subsets you can index into them according to grouping value. You can use expand.grid to build all combinations of grouping values to use as indexes into those lists of subsets.

To reiterate, you have not indicated what you want to do with these subsets, so there could be special-purpose functions that do what you want. As always, reproducible code leads to reproducible answers. :)

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
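The list-of-subsets idea above might look something like the following in base R. This is only a sketch on toy data: the column names LvD, group and time follow the question, while the id and value columns and all the numbers are invented for illustration.

```r
# Toy data; 'id' and 'value' are hypothetical columns, not from the question.
dat <- expand.grid(LvD = c("L", "D"),
                   group = c("3", "4"),
                   time = c("B", "S45", "FR2", "FR8"),
                   id = 1:2,
                   stringsAsFactors = FALSE)
dat$value <- seq_len(nrow(dat))

# One list of subsets, named "LvD.group" (for example "L.3" or "D.4").
subsets <- split(dat, list(dat$LvD, dat$group))

# Index into the list by grouping value instead of creating
# separate objects such as group3L, group4L, ...
group3L <- subsets[["L.3"]]

# expand.grid enumerates every combination of grouping values;
# pasting them reproduces the names that split() assigned above.
keys <- expand.grid(LvD = c("L", "D"), group = c("3", "4"),
                    stringsAsFactors = FALSE)
keys$key <- paste(keys$LvD, keys$group, sep = ".")
keys$n <- vapply(keys$key, function(k) nrow(subsets[[k]]), integer(1))

# tapply summarises by group without materialising any subsets.
means <- tapply(dat$value, list(dat$LvD, dat$group), mean)
```

Whether the list or the tapply summary is the better fit depends on what is done with the subsets afterwards, which the original post leaves open.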
On Sep 28, 2012, at 11:59 AM, Charles Determan Jr wrote:

> Hello R users,
>
> This is more of a convenience question that I hope others might find useful
> if there is a better answer. I work with large datasets that requires
> multiple parsing stages for different analysis. For example, compare group
> 3 vs. group 4. A more complicated comparison would be time B in group 3 of
> group L with B in group 4 of group L. I normally subset each group with
> the following type of code.
>
> data=read(...)
>
> #L v D
> L=data[LvD %in% c("L"),]
> D=data[LvD %in% c("D"),]
>
> #Groups 3 and 4 within L and D
> group3L=L[group %in% c("3"),]
> group4L=L[group %in% c("3"),]

Assume you meant to have a "4" there.

>
> group3D=D[group %in% c("3"),]
> group4D=D[group %in% c("3"),]

Ditto. Only makes sense with a "4".

The usual way is to use:

lapply( split(data, interaction(data$LvD, data$group)), function(subdf) { <do something with subdf> } )

That way you do not end up littering your workspace with subsidiary subsets of your main data object.

>
> #Times B, S45, FR2, FR8
> you get the idea
>
>
> Is there a more efficient way to subset groups? Thanks for any insight.
>

--
David Winsemius, MD
Alameda, CA, USA
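As a rough, runnable illustration of the split()/lapply() idiom above, the sketch below uses a toy data frame. The columns LvD, group and time mirror the question; the id and value columns, the numbers, and the choice of mean() as the per-subset function are assumptions made only for the example.

```r
# Toy data; 'id' and 'value' are hypothetical columns, not from the question.
dat <- expand.grid(LvD = c("L", "D"),
                   group = c("3", "4"),
                   time = c("B", "S45"),
                   id = 1:3,
                   stringsAsFactors = FALSE)
dat$value <- seq_len(nrow(dat))

# Split once on the LvD x group interaction, then apply a function per piece.
pieces <- split(dat, interaction(dat$LvD, dat$group))
piece_means <- lapply(pieces, function(subdf) mean(subdf$value))

# A three-way split reaches the "time B in group 3 of L" level directly.
by_time <- split(dat, interaction(dat$LvD, dat$group, dat$time))
b3L <- by_time[["L.3.B"]]
b4L <- by_time[["L.4.B"]]

# Example comparison: difference in mean value at time B, group 4 vs. group 3, within L.
diff_B <- mean(b4L$value) - mean(b3L$value)
```

The list elements are named by joining the factor levels with ".", so every comparison can be addressed by name without creating a separate workspace object per subset.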