Hi, I have a data.frame that has a categorical variable, for which I
would like to look at the distribution of levels of this variable,
based on a grouping of two other variables.

As an example:

x <- data.frame(obs=sample(c('low', 'high'), 100, replace=TRUE),
                grp1=sample(1:10, 100, replace=TRUE),
                grp2=runif(100))

cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

Thus, for each combination of levels in cut.grp1 and cut.grp2, I'd
like to obtain the distribution of levels obs. I know I can loop over
each pair of levels in cut.grp1 and cut.grp2, but is there a more
elegant way to achieve this?

--
Rajarshi Guha
NIH Chemical Genomics Center

Rajarshi -
   It's not clear to me what you mean by "the distribution of
levels obs". Does

   as.data.frame(table(x$obs, cut.grp1, cut.grp2))

give you something like what you want?

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu

On Mon, 31 Jan 2011, Rajarshi Guha wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
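
For readers following the thread, here is a minimal sketch of what the table() suggestion produces on the example data. The set.seed() call and the names counts, counts.long and props are assumptions added for illustration and are not part of the original posts.

```r
# Example data from the original post.
set.seed(1)  # assumption: seed added only so the result is repeatable
x <- data.frame(obs  = sample(c("low", "high"), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# Three-way contingency table: obs x cut.grp1 x cut.grp2.
counts <- table(obs = x$obs, grp1 = cut.grp1, grp2 = cut.grp2)

# Long format: one row per (obs, grp1, grp2) combination, count in Freq.
counts.long <- as.data.frame(counts)

# Proportion of 'low'/'high' within each (grp1, grp2) cell.
# An empty cell, if any, yields NaN here.
props <- prop.table(counts, margin = c(2, 3))

# Flat display of the counts with one row per (grp1, grp2) cell.
ftable(counts, row.vars = c("grp1", "grp2"))
```

The long format is convenient for further processing, while prop.table() turns the raw counts into a within-cell distribution, which may be closer to what the question asks for.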

Indeed, tapply is what I needed. To clarify Phil's question, what I
needed was

   tapply(x$obs, list(cut.grp1, cut.grp2), function(z) table(z))

On Mon, Jan 31, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> ?tapply is the basic R function for this. There are many other packages
> (e.g. plyr) and functions (e.g. ave) that simplify and streamline this for
> more complicated applications.
>
> -- Bert
>
> On Mon, Jan 31, 2011 at 1:43 PM, Rajarshi Guha <rajarshi.guha at gmail.com>
> wrote:
>> [...]
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml

--
Rajarshi Guha
NIH Chemical Genomics Center
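
The tapply approach can be sketched as follows, using base R only. tapply expects a vector of the same length as the grouping factors, so the sketch passes the obs column rather than the whole data frame. The set.seed() call and the names dist and total are assumptions added for illustration.

```r
# Example data from the original post.
set.seed(1)  # assumption: seed added only so the result is repeatable
x <- data.frame(obs  = sample(c("low", "high"), 100, replace = TRUE),
                grp1 = sample(1:10, 100, replace = TRUE),
                grp2 = runif(100))
cut.grp1 <- cut(x$grp1, 3)
cut.grp2 <- cut(x$grp2, 3)

# One table of obs counts per combination of levels. Because table()
# returns more than one value, tapply gives back a 3 x 3 matrix whose
# entries are list elements, not plain numbers.
dist <- tapply(x$obs, list(cut.grp1, cut.grp2), table)

dim(dist)     # one entry per (cut.grp1, cut.grp2) pair
dist[[1, 1]]  # counts of obs for the first level of each grouping

# Sanity check: the per-cell counts add up to the number of rows.
total <- sum(unlist(dist))
```

A combination of levels with no observations would appear as NULL in the result, so code that walks over dist should allow for that.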