Hi, I have a data.frame with a categorical variable, and I would
like to look at the distribution of the levels of this variable
within groups defined by two other variables.
As an example:
x <- data.frame(obs=sample(c('low', 'high'), 100, replace=TRUE),
                grp1=sample(1:10, 100, replace=TRUE),
                grp2=runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)
Thus, for each combination of levels in cut.grp1 and cut.grp2, I'd
like to obtain the distribution of levels obs. I know I can loop over
each pair of levels in cut.grp1 and cut.grp2, but is there a more
elegant way to achieve this?
--
Rajarshi Guha
NIH Chemical Genomics Center
Rajarshi -
It's not clear to me what you mean by "the distribution of
levels obs." Does
as.data.frame(table(x$obs, cut.grp1, cut.grp2))
give you something like what you want?
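
For readers following along, here is a minimal sketch of what that call produces, assuming the x, cut.grp1 and cut.grp2 from the original example. The set.seed() call, the dimension names (obs, grp1, grp2) and the prop.table() step are additions for illustration and are not part of the original posts.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

x <- data.frame(obs  = sample(c('low', 'high'), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# Three-way contingency table: obs level x grp1 bin x grp2 bin
counts <- table(obs = x$obs, grp1 = cut.grp1, grp2 = cut.grp2)

# Long format: one row per (obs, grp1 bin, grp2 bin) with a Freq column
counts.df <- as.data.frame(counts)
head(counts.df)

# Proportion of each obs level within every (grp1, grp2) cell;
# a cell with no observations would give NaN here
props <- prop.table(counts, margin = c(2, 3))
```

The long data.frame is convenient for further filtering or plotting, while the props array is closer to a "distribution" in the sense of relative frequencies per cell.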
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Mon, 31 Jan 2011, Rajarshi Guha wrote:
Indeed, tapply is what I needed. To clarify Phil's question, what I
needed was
tapply(x$obs, list(cut.grp1, cut.grp2), function(z) table(z))
On Mon, Jan 31, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> ?tapply is the basic R function for this. There are many other packages
> (e.g. plyr) and functions (e.g. ave) that simplify and streamline this for
> more complicated applications.
>
> -- Bert
>
> On Mon, Jan 31, 2011 at 1:43 PM, Rajarshi Guha <rajarshi.guha at gmail.com>
> wrote:
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml
--
Rajarshi Guha
NIH Chemical Genomics Center
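
A minimal sketch of the tapply approach, assuming the obs column is the vector being tabulated (tapply expects a vector whose length matches the grouping factors, so the whole data.frame cannot be passed in this form). The set.seed() call and the explicit factor() conversion are additions for illustration, not part of the original posts.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

x <- data.frame(obs  = sample(c('low', 'high'), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# Use a factor so every per-cell table reports both levels,
# including levels with a zero count in that cell
obs <- factor(x$obs, levels = c('low', 'high'))

# One table of obs levels per (grp1 bin, grp2 bin) combination
by.cell <- tapply(obs, list(cut.grp1, cut.grp2), table)

# by.cell is a 3 x 3 matrix of mode list; pull out a single cell with [[ ]]
by.cell[[1, 1]]
```

Each element of by.cell holds the counts for one combination of bins, and a combination with no observations is stored as NULL. The same counts are available in one array from table(obs, cut.grp1, cut.grp2), which may be easier to work with when the per-cell results do not need to stay as separate objects.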