I can think of several ways to brute-force hard-code what I want, but I imagine there is a command or two that can be easily combined to do this:

I have a data frame with about 23000 observations. The first variable is the group to which the observation belongs (about 500 different groups). The second variable is a response for each observation that is a 1, 2, 3, 4 or 5. I want to be able to calculate the percentage of each group that chose each response. For example, I want to know what percentage of group 1 (which may have a value of 34456) chose response 1, and so on.

Here is some code I wrote that generates a data frame like the one I have.

pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)

example.data <- data.frame(groupVar,responseVar)

Is there a fast way to calculate these percentages beyond writing loops to manually count the responses for each of the groups?

Thanks,

EG
on 05/23/2008 09:51 AM Economics Guy wrote:
> Is there a fast way to calculate these percentages beyond writing loops to
> manually count the responses for each of the groups?

Using:

  table(example.data)

will give you a cross tabulation of the counts of your responseVar by each groupVar.

  prop.table(table(example.data), 1)

will give you the row-wise proportions (0 - 1) of the counts of responseVar for each groupVar.

If you want percentages (0 - 100):

  prop.table(table(example.data), 1) * 100

See ?table and ?prop.table for more information.

HTH,

Marc Schwartz
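To make the cross-tabulation approach concrete, here is a minimal sketch on a small toy data frame. The data frame `toy` and its values are made up for illustration and are not the poster's data; only table() and prop.table() come from the reply above.

```r
# Toy data (hypothetical values, chosen so the percentages are easy to check)
toy <- data.frame(
  groupVar    = c(101, 101, 101, 101, 202, 202, 303, 303, 303, 303),
  responseVar = c(1,   1,   2,   5,   3,   3,   1,   2,   4,   4)
)

# Cross tabulation: one row per group, one column per response value
counts <- table(toy)

# margin = 1 divides each row by its row total; * 100 turns it into percentages
pct <- prop.table(counts, 1) * 100

round(pct, 1)

# Sanity check: the percentages within each group should add up to 100
rowSums(pct)
```

Because the table is built over the whole data frame at once, a response that a group never chose shows up as 0 in that group's row rather than being dropped.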
tapply(example.data$responseVar, example.data$groupVar, function(x){prop.table(table(x))})

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Economics Guy
Sent: Friday, May 23, 2008 9:52 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Percentages for categorical data by group

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
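For readers trying the tapply() suggestion, the following sketch shows what the result looks like on a tiny example. The data frame `toy` is hypothetical; the tapply() call itself is the one from the message above.

```r
# Toy data (hypothetical values): two groups with four observations each
toy <- data.frame(
  groupVar    = c(101, 101, 101, 101, 202, 202, 202, 202),
  responseVar = c(1,   2,   2,   3,   1,   1,   2,   3)
)

# One table of proportions per group; tapply() returns them as a
# list-like array with one element per group
res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(x)))

names(res)     # the group IDs, as character strings
res[["101"]]   # proportions of each response within group 101
```

Each element only contains the response values that actually occur in that group, which matters when the per-group results are later stacked into a matrix.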
I appreciate all the help.

The trouble is that in my real data set, not every group has an observation for every response. As a result, some of the "rows" returned from prop.table() are shorter than others, so I get:

Warning message:
In function (..., deparse.level = 1) :
  number of columns of result is not a multiple of vector length (arg 8)

Is there a way to tell rbind() or do.call() to treat missing values as zero, or to make prop.table() include the zero proportions?

On Fri, May 23, 2008 at 1:59 PM, Phil Spector <spector@stat.berkeley.edu> wrote:
> EG -
>    Thanks for the reproducible example!
>
> When I run your code, and check the class of the result from tapply(),
> I see that it is an "array", and using dim(), I see it's an array
> of length 500. How big is each element?
>
>    table(sapply(res,length))
>
>      5
>    500
>
> So each piece is the same length. That means we could
> make a 500x5 matrix as follows:
>
>    do.call(rbind,res)
>
>                                        - Phil Spector
>                                          Statistical Computing Facility
>                                          Department of Statistics
>                                          UC Berkeley
>                                          spector@stat.berkeley.edu
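One possible way around the unequal lengths, sketched here under the assumption that the only valid responses are 1 through 5: convert each group's responses to a factor with all five levels before tabulating, so table() reports a zero count for responses that group never chose. The data frame `toy` and the name `all_levels` are made up for this illustration.

```r
# Toy data (hypothetical values): no group uses all five responses
toy <- data.frame(
  groupVar    = c(101, 101, 101, 202, 202, 303),
  responseVar = c(1,   2,   5,   3,   3,   4)
)

# Assumption: the possible responses are exactly 1..5
all_levels <- 1:5

# factor(..., levels = all_levels) forces table() to keep every level,
# so each group's result has length 5 even when some counts are zero
res <- tapply(toy$responseVar, toy$groupVar, function(x) {
  prop.table(table(factor(x, levels = all_levels)))
})

table(sapply(res, length))   # all elements should now have the same length

# Stack the per-group vectors into a groups-by-responses matrix of percentages
pct <- do.call(rbind, res) * 100
pct
```

The cross-tabulation route, prop.table(table(example.data), 1), should avoid the problem as well, since a single two-way table already contains a zero cell for any response a group did not choose, provided that response appears somewhere in the data.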