thr3ads.net - R help - [R] Better way of Grouping? [Sep 2012]

Charles Determan Jr

2012-Sep-28 18:59 UTC

[R] Better way of Grouping?

Hello R users,

This is more of a convenience question that I hope others might find useful
if there is a better answer.  I work with large datasets that require
multiple parsing stages for different analyses.  For example, compare group
3 vs. group 4.  A more complicated comparison would be time B in group 3 of
group L with B in group 4 of group L.  I normally subset each group with
the following type of code.

data=read(...)

#L v D
L=data[LvD %in% c("L"),]
D=data[LvD %in% c("D"),]

#Groups 3 and 4 within L and D
group3L=L[group %in% c("3"),]
group4L=L[group %in% c("3"),]

group3D=D[group %in% c("3"),]
group4D=D[group %in% c("3"),]

#Times B, S45, FR2, FR8
you get the idea


Is there a more efficient way to subset groups?  Thanks for any insight.

Regards,
Charles


arun

2012-Sep-28 20:31 UTC

Hi,
You can also use grepl() to subset:


LD<-paste0(rep(rep(c(3,4),each=4),2),c(rep("L",8),rep("D",8)))
set.seed(1)
dat1<-data.frame(LD=LD,value=sample(1:15,16,replace=TRUE))
dat2<-within(dat1,{LD<-as.character(LD)})
dat2[grepl(".*L",dat2$LD),] # subset all L values
dat2[grepl(".*D",dat2$LD),] # subset all D values
dat2[grepl("3D",dat2$LD),]
dat2[grepl("4D",dat2$LD),]
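
The leading ".*" in the patterns above is not needed, because grepl() already matches anywhere in the string. A minimal sketch, assuming the same key layout (a group digit followed by L or D), that anchors each pattern to the part of the key it is meant to match and uses plain equality where the whole key is known:

```r
# Same toy data as above: keys "3L", "4L", "3D", "4D", four rows each.
LD <- paste0(rep(rep(c(3, 4), each = 4), 2), c(rep("L", 8), rep("D", 8)))
set.seed(1)
dat2 <- data.frame(LD = LD, value = sample(1:15, 16, replace = TRUE),
                   stringsAsFactors = FALSE)

all_L   <- dat2[grepl("L$", dat2$LD), ]   # keys ending in L
group3  <- dat2[grepl("^3", dat2$LD), ]   # keys starting with 3
group3D <- dat2[dat2$LD == "3D", ]        # exact key: no regex needed
```

Anchoring matters more once the keys get longer (for example, if a time label were appended), since an unanchored pattern could then match in an unintended position.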


A.K.





Jeff Newmiller

2012-Sep-28 21:25 UTC

You have not specified the objective function you are trying to optimize with
your term "efficient", or what you do with all of these subsets once
you have them.

For notational simplification and completeness of coverage (not necessarily
computational speedup) you might want to look at "tapply" or
ddply/dlply from the plyr package. If you build lists of subsets you can index
into them according to grouping value. You can use expand.grid to build all
permutations of grouping values to use as indexes into those lists of subsets.
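
A minimal sketch of these suggestions in base R, not a definitive recipe. The toy data is hypothetical: the column names LvD and group follow the original post, while `value` is an assumed measurement column.

```r
# Hypothetical toy data: 2 levels of LvD x 2 groups, four rows per cell.
set.seed(1)
dat <- data.frame(
  LvD   = rep(c("L", "D"), each = 8),
  group = rep(rep(c("3", "4"), each = 4), 2),
  value = rnorm(16),
  stringsAsFactors = FALSE
)

# tapply: one summary per LvD x group cell, returned as a matrix.
cell_means <- tapply(dat$value, list(dat$LvD, dat$group), mean)

# A named list of subsets, one per combination; names look like "L.3".
subsets <- split(dat, list(dat$LvD, dat$group))

# expand.grid enumerates every combination of grouping values;
# pasting the columns together reproduces the list names.
combos <- expand.grid(LvD = c("L", "D"), group = c("3", "4"),
                      stringsAsFactors = FALSE)
keys <- paste(combos$LvD, combos$group, sep = ".")

# Index by name instead of keeping a separate object per subset.
group3L <- subsets[["L.3"]]
```

Looping over `keys` (or over `names(subsets)`) then visits every subset without writing one assignment per group.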

To reiterate, you have not indicated what you want to do with these subsets, so
there could be special-purpose functions that do what you want.  As always,
reproducible code leads to reproducible answers. :)


David Winsemius

2012-Sep-28 22:09 UTC

On Sep 28, 2012, at 11:59 AM, Charles Determan Jr wrote:
> Hello R users,
> 
> This is more of a convenience question that I hope others might find useful
> if there is a better answer.  I work with large datasets that requires
> multiple parsing stages for different analysis.  For example, compare group
> 3 vs. group 4.  A more complicated comparison would be time B in group 3 of
> group L with B in group 4 of group L.  I normally subset each group with
> the following type of code.
> 
> data=read(...)
> 
> #L v D
> L=data[LvD %in% c("L"),]
> D=data[LvD %in% c("D"),]
> 
> #Groups 3 and 4 within L and D
> group3L=L[group %in% c("3"),]
> group4L=L[group %in% c("3"),]
Assume you meant to have a "4" there.
> 
> group3D=D[group %in% c("3"),]
> group4D=D[group %in% c("3"),]
Ditto. Only makes sense with a "4".



The usual way is to use:

lapply( split(data, interaction(data$LvD, data$group)) ,
         function(subdf) {<do something with subdf>} )

That way you do not end up littering your workspace with subsidiary subsets of
your main data object.
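
A runnable sketch of this split/lapply pattern on hypothetical toy data. The column names LvD and group and the time labels B and S45 come from the original post; the `time`, `rep` and `value` columns and the mean() summary are assumptions standing in for whatever the real analysis does.

```r
# Hypothetical toy data: every LvD x group x time cell has 3 replicates.
set.seed(1)
dat <- expand.grid(LvD = c("L", "D"), group = c("3", "4"),
                   time = c("B", "S45"), rep = 1:3,
                   stringsAsFactors = FALSE)
dat$value <- rnorm(nrow(dat))

# One data frame per combination; names look like "L.3.B".
# drop = TRUE discards combinations that do not occur in the data.
pieces <- split(dat, interaction(dat$LvD, dat$group, dat$time, drop = TRUE))

# Apply the same function to every subset.
cell_means <- lapply(pieces, function(subdf) mean(subdf$value))

# The comparison from the question: time B, group 3 vs. group 4, within L.
diff_B_in_L <- cell_means[["L.3.B"]] - cell_means[["L.4.B"]]
```

Adding another grouping column to the interaction() call extends the same code to deeper nesting without creating any new top-level objects.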


David Winsemius, MD
Alameda, CA, USA
