Stefan Björk
2009-Jan-22 11:17 UTC
[R] Frequency and summary statistics table with different variables and categories
Hello helpers,

This is probably quite simple, but I'm stuck.

I want to create a summary statistics table with frequencies and summary statistics for a large number of variables. The problem here is that (1) there are different classes of categories (sex, type of substance abuse, and type of treatment) that overlap, and (2) the data for different variables should be presented in different ways -- sometimes with relative frequencies, other times with mean values.

The table would finally look something like:

            All       Male    Female  Alcohol  Drug  ...
Age         (mean)    (mean)  ...
Sex         (% male)  (freq)  (freq)  ...
Alcohol CS  (mean)    (mean)  ...
...         ...

The data is in a data frame with quite a lot of columns (variables), and each row represents a single case.

I have found out that part of this can be done with tapply, for example tapply(age, sex, mean), joined with tapply(age, abuse, mean). But how do I handle frequencies? Or is there an even simpler way?

/S
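For reference, the tapply approach described in the question can be sketched as follows. The data frame d and its values are made up for illustration; only the column names age, sex and abuse come from the post. It builds one row of the desired table (the mean of age overall, by sex, and by type of abuse) by concatenating the tapply results.

```r
# Hypothetical example data; the real data frame has many more columns.
d <- data.frame(
  age   = c(30, 40, 50, 60),
  sex   = factor(c("Male", "Female", "Male", "Female")),
  abuse = factor(c("Alcohol", "Alcohol", "Drug", "Drug"))
)

# One table row: overall mean, then group means for each classification.
# c() drops the array dimensions of the tapply results but keeps the names.
age_row <- c(
  All = mean(d$age),
  tapply(d$age, d$sex, mean),
  tapply(d$age, d$abuse, mean)
)

print(age_row)
```

Rows built this way for several variables could then be stacked with rbind(), provided every row uses the same grouping factors in the same order.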
ronggui
2009-Jan-22 11:56 UTC
[R] Frequency and summary statistics table with different variables and categories
The % male is simply the mean if you code male = 1 and female = 0, and it is more informative than the absolute frequency. So you may want to have a look at the doBy package, especially the summaryBy function.

All the best

--
HUANG Ronggui, Wincent
Tel: (00852) 3442 3832
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Homepage: http://ronggui.huang.googlepages.com/
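The 0/1 coding idea can be sketched in base R as below. This uses tapply rather than doBy's summaryBy so that it runs without extra packages; the data frame d and its values are invented for illustration. Because the mean of a 0/1 indicator is the proportion of ones, the same function that produces the means of age also produces the % male.

```r
# Hypothetical example data.
d <- data.frame(
  sex   = c("Male", "Female", "Male", "Male"),
  abuse = c("Alcohol", "Alcohol", "Drug", "Drug")
)

# Code male = 1, female = 0; the mean of this indicator is the share of males.
male <- as.numeric(d$sex == "Male")

pct_male_all      <- 100 * mean(male)                   # overall % male
pct_male_by_abuse <- 100 * tapply(male, d$abuse, mean)  # % male per abuse type

print(pct_male_all)
print(pct_male_by_abuse)
```

With the indicator stored as an ordinary numeric column, it can be summarised alongside age and the other numeric variables by whatever grouped-mean function is used for them.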
David Winsemius
2009-Jan-22 13:32 UTC
[R] Frequency and summary statistics table with different variables and categories
One of the various tabulation functions would seem to be the most appropriate for getting frequency summaries:

?table
?xtabs
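A minimal sketch of how table and xtabs produce the frequency part of such a summary, again with a made-up data frame d. prop.table is added here to turn the counts into relative frequencies; it is not mentioned in the reply above.

```r
# Hypothetical example data.
d <- data.frame(
  sex   = c("Male", "Female", "Male", "Male"),
  abuse = c("Alcohol", "Alcohol", "Drug", "Drug")
)

# Absolute frequencies: sex in rows, abuse type in columns.
counts <- table(d$sex, d$abuse)

# The same cross-tabulation through the formula interface.
counts2 <- xtabs(~ sex + abuse, data = d)

# Relative frequencies within each abuse type (each column sums to 1).
props <- prop.table(counts, margin = 2)

print(counts)
print(props)
```

The formula interface of xtabs can be convenient when many variables are tabulated from the same data frame, since the column names are looked up in data.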